Don't let MCP eat your Agent's Context.

Published on March 24, 2026 · 9 min read · 35 views

Organizational Decision Record

199 tools, 40K tokens of schema per round, until we cut 90% with a five-layer architecture.
Service → SDK → CLI → MCP → Skill, so that the AI agent does the most with the fewest tokens.

Pain Points

What's our problem?#

We did not think up five layers first and then build them. The architecture evolved naturally out of three real pain points.

Token Burst#

Playwright MCP consumes a large number of tokens per call. Even when it is not in use, its tool schema still occupies the context window, so every round pays the bill.

After replacing it with the Playwright CLI:

-98% Token usage

MCP Process Explosion#

Every time a Claude Code session is opened, every MCP server process is spawned again, so the process count multiplies with the number of sessions.

            3 sessions × 23 MCP servers

            = 69 resident processes

            // RAM and CPU usage grow linearly

Too many tools to choose from#

When MCP servers expose too many tools, the LLM's tool-selection accuracy drops.

Industry best practice: ≤ 10 tools per MCP server.

8 tools × 200 tokens/schema = 1,600 tokens/turn (whether they are used or not)

Data

Token Cost Comparison#

Token overhead per round for different call types (excluding actual I/O)

MCP (8 tools): 1,600 tokens of schema overhead per round

CLI via Bash: zero schema overhead per round

50 rounds of dialogue: 80K tokens of cumulative waste on unused schemas

Cloudflare Code Mode: only 2 tools for the whole API
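
The per-round and cumulative figures follow from simple multiplication. A minimal sketch of that arithmetic, using the article's own numbers (8 tools, roughly 200 tokens per schema, 50 rounds); the function name is ours and purely illustrative:

```python
def schema_overhead(tools: int, tokens_per_schema: int, rounds: int = 1) -> int:
    """Tokens spent on tool schemas alone, whether or not any tool is called."""
    return tools * tokens_per_schema * rounds


per_round = schema_overhead(tools=8, tokens_per_schema=200)
cumulative = schema_overhead(tools=8, tokens_per_schema=200, rounds=50)

print(per_round)    # 1600
print(cumulative)   # 80000
```

The overhead is linear in both the tool count and the number of rounds, which is why trimming either one pays off immediately.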

Organizational structure

What do the five layers do?#

Each layer serves a different consumer, and each one depends on the layers below it.

Skill

Intent Routing Layer - determines whether to go to the CLI or MCP based on the type of operation, and is the entry point for AI workflows

🤖 AI Scheduler / Automated Processes

MCP

Write operations + typed schema returns: the LLM needs a JSON schema to validate its inputs

🤖 Claude Code (tool call)

CLI

Common interface for read operations - zero schema overhead, anything that can run a shell will work!

👤🤖📋 People / AI / Cron / Script

SDK

The only HTTP boundary package - both the CLI and MCP import the SDK and don't touch HTTP.

🐍 Python Program / Notebook

Service

The business logic itself: FastAPI routes + business logic + DB

⚙️ Inter-module internal call

Core principle: bug fixes propagate automatically. Fix the SDK once and every upper-layer consumer picks up the fix.

        // Dependency direction (upper layers import lower layers)

        Skill ──invoke──► CLI (read) / MCP (write)

        MCP  ──import──► SDK ──HTTP──► Service

        CLI  ──import──► SDK ──HTTP──► Service

        // The SDK is the only place that touches HTTP.

        // Change the SDK = synchronized update for every upper-layer consumer
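
To make the dependency direction concrete, here is a minimal sketch of the layering. Everything in it is hypothetical: the `NotesSDK` class, the `/notes` paths, and the `fake_service` transport are stand-ins we made up so the example runs without a network. It is not our actual code, only the shape of it.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical transport: (method, path, payload) -> response dict.
Transport = Callable[[str, str, dict], dict]


@dataclass
class NotesSDK:
    """The only layer that talks HTTP (here via an injected stand-in transport)."""
    transport: Transport

    def search(self, query: str) -> dict:
        return self.transport("GET", "/notes", {"q": query})

    def create(self, title: str) -> dict:
        return self.transport("POST", "/notes", {"title": title})


def cli_search(sdk: NotesSDK, query: str) -> str:
    """CLI layer: read operation, plain-text output, no schema."""
    return "\n".join(sdk.search(query)["items"])


def mcp_create(sdk: NotesSDK, title: str) -> dict:
    """MCP layer: write operation with a structured return."""
    return sdk.create(title)


def skill(sdk: NotesSDK, operation: str, arg: str):
    """Skill layer: route reads to the CLI and writes to MCP."""
    if operation == "search":
        return cli_search(sdk, arg)
    if operation == "create":
        return mcp_create(sdk, arg)
    raise ValueError(f"unknown operation: {operation}")


calls = []


def fake_service(method: str, path: str, payload: dict) -> dict:
    """Stand-in for the Service layer; records every call it receives."""
    calls.append((method, path))
    if method == "GET":
        return {"items": [f"note about {payload['q']}"]}
    return {"id": 1, "title": payload["title"]}


sdk = NotesSDK(transport=fake_service)
print(skill(sdk, "search", "mcp"))     # note about mcp
print(skill(sdk, "create", "hello"))   # {'id': 1, 'title': 'hello'}
```

Neither `cli_search` nor `mcp_create` knows anything about HTTP, so a fix inside `NotesSDK` reaches both without touching them.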

Consumer Matrix

Why can't I have one less layer?#

Consumer	Best access layer	Rationale
Other core modules	Service	Call services.py directly in the same process, zero overhead
Python script / notebook	SDK	Programmatic access, complete error handling
People (terminal)	CLI	Discoverable via `--help`, composable with pipes
AI coding agent (Claude Code / Codex CLI / Gemini CLI)	CLI (read) / MCP (write)	Reads via Bash = zero schema; writes require typed input
Scheduling system	CLI	Shell commands are the common language of schedulers
AI workflow	Skill	High-level intent routing that automatically selects CLI or MCP

"CLI is a zero overhead universal interface - anything that can run a shell will work.
MCP can only be used by clients that support the MCP protocol."

Evolutionary Process

How did we get here?#

It's not pre-designed - it's evolved step-by-step from pain points.

Discovery Period - Playwright MCP

First Lesson of Token Burst#

Playwright MCP's tool schema is extremely large, and it is injected into the context every round whether or not the tools are used.

Takeaway: if the CLI can do it, don't use MCP.

Extension - MCP Process Explosion

Multiplying Disasters in Multiple Sessions#

As the number of MCP servers grows to 23, each additional Claude Code session spawns another full set of server processes.

Takeaway: reducing the number of MCP servers and tools is necessary.

Validation Period - Cloudflare Code Mode

The industry's most extreme tool compression#

Cloudflare MCP's Code Mode exposes only 2 tools: search() and execute(). All other operations are written as code in the sandbox in real time. This shows that capability does not grow with the number of tools; a server can expose fewer tools and still do more.

Takeaway: we don't go that extreme, but the same principle applies: the fewer tools, the better.

Maturity - battle-tested

The new module completes all five levels#

Backend → SDK → CLI → MCP → Skill

Takeaway: five layers are not a burden; they pay compound interest.

Sandbox Pattern

Disposable Tools - Sandbox Patching#

Not every operation needs a permanent tool. Borrowing from Cloudflare Code Mode, batch operations are written on the fly in the sandbox and do not occupy an MCP tool slot.

Permanent Tool (5-layer cover)#

High-frequency, stable, discoverability-required operations

search - Daily high-frequency use
create - Requires typed schema validation
render - Complex Parameter Combinations

✓ Worth the cost of maintenance ✓ With --help ✓ Testable

Disposable Tool (Sandbox)#

Low-frequency, one-time, or exploratory batch operations

Batch scanning + aggregation (3+ API calls)
One-time Data Conversion
Exploratory scripts (use once and discard)

✓ Zero schema overhead ✓ No tool slot ✓ Write and run immediately

        // Cloudflare's extreme approach vs. the pragmatic one

        Cloudflare:  2 permanent tools + every operation written in the sandbox in real time

        Pragmatic:   ≤10 permanent tools/server + sandbox for low-frequency operations

        // Common principle: permanent tools are a scarce resource; use the sandbox when an operation isn't worth the slot.

Judgment criteria: is an operation worth a permanent tool?
→ Used ≥ 3 times per week + complex parameter combinations + needs discoverability → full five-layer coverage
→ Otherwise → write it in the sandbox on the spot, use it once, and discard it.
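
The criteria can be written down as a small predicate. This is only a sketch under one assumption of ours: that the "+" signs mean all three conditions must hold together. The `Operation` type and `placement` function are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    uses_per_week: int
    complex_parameters: bool
    needs_discoverability: bool


def placement(op: Operation) -> str:
    """Apply the judgment criteria; all three conditions must hold (our reading)."""
    worth_a_slot = (
        op.uses_per_week >= 3
        and op.complex_parameters
        and op.needs_discoverability
    )
    return "permanent (five layers)" if worth_a_slot else "sandbox (disposable)"


print(placement(Operation("render", 20, True, True)))            # permanent (five layers)
print(placement(Operation("one-off migration", 1, True, False)))  # sandbox (disposable)
```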

Comparison

If you don't use five layers?#

Alternative	Problem	Consequence
Service only	HTTP calls only	AI agents, cron jobs, and CLI users all have to write their own HTTP clients
Service + MCP	Token overhead + process explosion	23 servers × 3 sessions = 69 processes; schemas eat tokens every round
Service + CLI	Write operations lack a typed schema	The LLM passes wrong parameters with no input validation
MCP only	CLI users excluded	Codex CLI, Gemini CLI, and cron are all shut out
Each layer implements HTTP independently	No unified SDK	Ghost-parameter anti-patterns abound

Practical Examples

Control Tool injection yourself with the Agent SDK#

If you build your own client with the Agent SDK, you have full control at the application layer over which tool schemas go into the LLM context, with no need to wait for platform support. Speakeasy's Dynamic Toolsets take this approach and report a 96% token reduction.

        # Pseudocode sketch: mcp_server, bm25_filter, query, and client are placeholders

        all_tools = mcp_server.tools_list()        # all 199 tools arrive (protocol level)

        relevant = bm25_filter(all_tools, query)   # filter down to about 5 yourself

        response = client.messages.create(

            tools=relevant,                        # send only those 5 to the LLM

        )

No filtering (default behavior)#

All 199 tool schemas are stuffed into tools[]

~40K tokens/round

After BM25 filtering#

Only the 5-10 most relevant tools are sent each time.

~2K tokens/round

Key distinction: at the MCP protocol layer, tools/list returns all tools, and that can't be changed.
But your client decides what goes into the API's tools[], and that part is totally under your control.
An off-the-shelf client such as Claude Code stuffs everything in for you. Build your own client, and you decide what gets sent.
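
For readers who want to see what a `bm25_filter` step might look like, here is a minimal, dependency-free sketch of Okapi BM25 ranking over tool names and descriptions. It is our own illustration, not Speakeasy's or MCPProxy's implementation; the tool list, the tokenizer, and the `k1`/`b` defaults are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_filter(tools: list[dict], query: str, top_k: int = 5,
                k1: float = 1.5, b: float = 0.75) -> list[dict]:
    """Return up to top_k tools whose name + description best match the query."""
    docs = [tokenize(f"{t['name']} {t['description']}") for t in tools]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many tool texts each term appears.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    def score(doc: list[str]) -> float:
        tf = Counter(doc)
        total = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            total += idf * tf[term] * (k1 + 1) / norm
        return total

    scored = [(score(d), t) for d, t in zip(docs, tools)]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop tools that share no term with the query at all.
    return [tool for s, tool in ranked[:top_k] if s > 0]


# Hypothetical tool list standing in for what tools/list would return.
all_tools = [
    {"name": "browser_click", "description": "Click an element on the page"},
    {"name": "browser_screenshot", "description": "Take a screenshot of the page"},
    {"name": "notes_create", "description": "Create a new note"},
    {"name": "notes_search", "description": "Search notes by keyword"},
]

relevant = bm25_filter(all_tools, "search my notes", top_k=2)
print([t["name"] for t in relevant])   # ['notes_search', 'notes_create']
```

Only the entries in `relevant` would then be passed as `tools=` to the model, while the full list stays on the client side.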

cost analysis

The actual cost is very low#

SDK#

Inherit from BaseClient, and each method can be wrapped in a single line. Error handling, authentication, and base paths are all DRY.
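
A minimal sketch of what "one line per method" can look like. The article only names `BaseClient`; the constructor arguments, the `request` helper, `SDKError`, `NotesClient`, and the injected transport are all assumptions of ours, used so the example runs without a real HTTP library.

```python
from typing import Any, Callable

# Hypothetical transport signature: (method, url, headers, params) -> parsed response.
Transport = Callable[[str, str, dict, dict], Any]


class SDKError(Exception):
    """Single error type raised by every SDK method."""


class BaseClient:
    """Shared plumbing: base path, authentication, and error handling."""

    def __init__(self, base_url: str, token: str, transport: Transport):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Authorization": f"Bearer {token}"}
        self.transport = transport

    def request(self, method: str, path: str, **params: Any) -> Any:
        try:
            return self.transport(method, f"{self.base_url}{path}", self.headers, params)
        except Exception as exc:  # normalise every failure into SDKError
            raise SDKError(f"{method} {path} failed: {exc}") from exc


class NotesClient(BaseClient):
    """Each endpoint is a one-line wrapper around request()."""

    def search(self, q: str, limit: int = 20):
        return self.request("GET", "/notes", q=q, limit=limit)

    def create(self, title: str, body: str = ""):
        return self.request("POST", "/notes", title=title, body=body)


def fake_transport(method, url, headers, params):
    """Stand-in for a real HTTP call; echoes what it was asked to do."""
    return {"method": method, "url": url, "params": params}


client = NotesClient("https://example.invalid/api/", "secret", fake_transport)
print(client.search("mcp"))
```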

CLI#

Command wrappers that import the SDK. Signatures are unified, and pagination shares a common parser.
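
One way to share pagination flags across commands is argparse's `parents=` mechanism. The sketch below assumes a hypothetical `notes` CLI with `search` and `list` commands; the flag names are ours.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Options shared by every listing command live in one parent parser.
    paging = argparse.ArgumentParser(add_help=False)
    paging.add_argument("--limit", type=int, default=20, help="items per page")
    paging.add_argument("--offset", type=int, default=0, help="items to skip")

    parser = argparse.ArgumentParser(prog="notes", description="Hypothetical notes CLI")
    commands = parser.add_subparsers(dest="command", required=True)

    search = commands.add_parser("search", parents=[paging], help="search notes")
    search.add_argument("query")

    commands.add_parser("list", parents=[paging], help="list all notes")
    return parser


args = build_parser().parse_args(["search", "mcp", "--limit", "5"])
print(args.command, args.query, args.limit, args.offset)   # search mcp 5 0
```

In a real CLI, each command handler would pass these parsed values straight to the matching SDK method, so the CLI itself never touches HTTP.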

MCP#

SDK wrapper + tool definitions. Only write operations and structured returns are included; read operations are not needed here.
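
To illustrate why writes benefit from a typed schema, here is a sketch of a tool definition in the shape MCP uses (`name`, `description`, `inputSchema`) together with a deliberately tiny validator. The `notes_create` tool and its fields are hypothetical, and the validator covers only a small subset of JSON Schema; a real server would rely on a proper schema library.

```python
# A tool definition in the shape MCP servers return from tools/list.
# The tool itself ("notes_create") and its fields are hypothetical.
CREATE_NOTE_TOOL = {
    "name": "notes_create",
    "description": "Create a new note",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "pinned": {"type": "boolean"},
        },
        "required": ["title"],
    },
}

_JSON_TYPES = {"string": str, "boolean": bool}


def validate(arguments: dict, schema: dict) -> list[str]:
    """Check required keys, unknown keys, and primitive types.

    Rejecting unknown keys is stricter than JSON Schema's default,
    which allows additional properties.
    """
    errors = [f"missing: {key}" for key in schema.get("required", []) if key not in arguments]
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unknown: {key}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append(f"wrong type: {key}")
    return errors


schema = CREATE_NOTE_TOOL["inputSchema"]
print(validate({"title": "hello", "pinned": True}, schema))   # []
print(validate({"titel": "hello"}, schema))                   # ['missing: title', 'unknown: titel']
```

A misspelled or invented parameter is caught before the write reaches the Service, which is the kind of check a plain shell command does not give you for free.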

"Marginal costs diminish, and composability is a long-term benefit - but so are maintenance costs."

⚠ The cost of this approach#

Three packages to maintain#

The SDK, CLI, and MCP all wrap the same logic, so when the API changes, all three layers have to change too. With a small team this is manageable, but as the team grows we have to rely on automated tests for coverage.

Debugging paths become longer#

When a bug occurs, we first have to work out which layer it lives in: a Service logic error, a parameter the SDK dropped, a CLI parsing error, or an MCP schema that was never updated. The more layers there are, the slower the troubleshooting.

Not all modules are worth it.#

Small internal utilities don't need five layers. Only core modules used by multiple consumers are worth the investment. Applying the pattern everywhere is a burden.

External Verification

Winds of Change in the OpenCLI Community#

This is not just us shouting about our own direction: the entire open-source community is moving toward the CLI.

OpenCLI converts web and desktop applications into standardized CLIs; CLI-Anything aims to make all software operable by agents via CLI. More and more developers are asking "is there a CLI" instead of "is there an API".

The reason is straightforward: the whole industry is entering the agentic CLI era. When every AI coding tool can run a shell, the CLI is naturally the largest common denominator across tools.

Before#

Mainstream practice: Agent + MCP

Every tool goes through MCP, schemas are stuffed into the context, and the more tools you have, the slower things get.

Now#

Evolved practice: Agent + Skill (calling CLI / MCP)

Skill on-demand routing - CLI zero overhead for read operations, MCP for write operations

It's not the CLI that's revived - it's the AI era that's rediscovering the value of the CLI.

Conclusion

199 Tools, One Set of Logic#

From the first lesson of the token burst to validation by the community's direction, every step was forced by a pain point.
The five layers are not bureaucracy. People, scripts, and AI each take what they need. Change one place, and everything stays in sync.

        // Decision chain

        Pain Point 1: Playwright MCP token burst

          → CLI saves 98% of tokens

        Pain Point 2: multiple Claude Code sessions → processes multiply proportionally

          → mcpproxy aggregation + CLI offloads read operations

        Pain Point 3: too many tools for the LLM to choose from

          → ≤10 tools/server + disposable sandbox tools

        Verification 1: Cloudflare covers an entire API with only 2 tools

          → Same principle, but we are more pragmatic

        Verification 2: the OpenCLI community is going CLI-first

          → In the AI era, the CLI is the largest common denominator across tools

        // Summary

        Service → SDK → CLI → MCP → Skill

        199 Tools, One Logic, Three Consumers

Take this away.

Copy to your AI Agent#

Paste the following paragraph to your AI coding agent, so that it can help you evaluate the existing architecture and plan the direction of transformation.

Help me evaluate the current AI tool integration methods in the project, to see if there's room to optimize token expenditure.

Background: MCP tool schemas are injected into the context in every round, consuming tokens whether they are used or not. When the number of tools is large, the schema alone can consume tens of thousands of tokens

References

Extended Reading#

These are the resources that we have actually made reference to and borrowed in the course of our exploration.

Article	Why it matters
Code Mode: Give Agents an Entire API in 1,000 Tokens	Cloudflare covered the whole API with only 2 tools and cut token cost by 81%, supporting the principle of "the fewer the tools, the better".
Your MCP Server Is Eating Your Context Window	Apideck analyzes how MCP schemas eat up context.
Advanced Tool Use - Anthropic Engineering	Anthropic officially acknowledged the problem of tool schemas occupying context and launched the Tool Search mechanism, achieving an 85% context reduction.
How We Reduced Token Usage by 100x: Dynamic Toolsets	Speakeasy's Dynamic Toolsets approach, with a 96% reduction in input tokens.
MCPProxy - MCP Gateway	Open-source MCP aggregation gateway with BM25-indexed dynamic tool discovery; supports hundreds of servers and thousands of tools.
Welcome to the Agentic CLI Era	The New Stack reports on the industry trend of AI coding tools shifting to the CLI.
OpenCLI - Universal CLI Hub	A universal CLI hub for AI agents that converts websites and desktop applications into standardized CLIs.
CLI-Anything - Making ALL Software Agent-Native	HKU research on making all software operable by AI agents via CLI.
