
Don't let MCP eat your Agent's Context.

Architecture Decision Record


199 tools, 40K tokens of schema per round, until we cut 90% of that with a five-layer architecture.
Service → SDK → CLI → MCP → Skill, so that the AI agent does the most with the fewest tokens.

Pain Points

What's our problem?#

We didn't think up five layers and then build them. The design evolved naturally out of three real pain points.

1

Token Burst#

Playwright MCP consumes a large number of tokens per call. Even when it is not in use, its tool schema still occupies the context window, so you pay for it every round.

After replacing it with the Playwright CLI:

-98% token usage

2

MCP Process Explosion#

Every time a Claude Code session is opened, every MCP server process is started again, so the process count multiplies with the number of sessions.

3 sessions × 23 MCP servers
= 69 resident processes
// RAM and CPU usage grow linearly

3

Too Many Tools to Choose From#

When MCP servers expose too many tools, the LLM's tool-selection accuracy drops.

Industry best practice: ≤ 10 tools per MCP server.

8 tools × 200 tokens/schema = 1,600 tokens/turn (whether or not the tools are used)
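
The arithmetic above can be written out as a minimal sketch. The 200 tokens per schema figure is the article's rough average, and `schema_overhead` is a hypothetical helper name used only for illustration.

```python
def schema_overhead(tools: int, tokens_per_schema: int = 200, rounds: int = 1) -> int:
    """Tokens spent on tool schemas alone, whether or not any tool is called."""
    return tools * tokens_per_schema * rounds


per_turn = schema_overhead(tools=8)                 # one MCP server with 8 tools
over_session = schema_overhead(tools=8, rounds=50)  # the same server over 50 rounds

print(per_turn)      # 1600
print(over_session)  # 80000
```

The same function gives 39,800 tokens per round for 199 tools, which is where the roughly 40K figure comes from.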

Data

Token Cost Comparison#

Schema token overhead per round for different call types (excluding actual I/O):

  • MCP (8 tools): 1,600 tokens of schema overhead per round
  • CLI via Bash: 0 tokens of schema overhead per round
  • 50 rounds of dialogue: 80K tokens cumulatively wasted on unused schemas
  • Cloudflare Code Mode: only 2 tools for the whole IDE

Architecture

What do the five levels do?#

Each layer serves a different consumer, and each layer depends on the one below it.

Skill
Intent routing layer: decides whether to go to the CLI or MCP based on the type of operation, and is the entry point for AI workflows
🤖 AI scheduler / automated processes
MCP
Write operations + typed schema returns: the LLM needs a JSON schema to validate its inputs
🤖 Claude Code (tool call)
CLI
Universal interface for read operations: zero schema overhead, and anything that can run a shell can use it
👤🤖📋 People / AI / cron / scripts
SDK
The only package that crosses the HTTP boundary: both the CLI and MCP import the SDK and never touch HTTP themselves
🐍 Python programs / notebooks
Service
Where the business logic lives: FastAPI routes + business logic + DB
⚙️ Internal calls between modules

Core principle: bug fixes propagate automatically. Fix the SDK once, and every consumer above it picks up the fix.

// Dependency direction (upper layers import lower layers)

Skill ──invoke──► CLI (read) / MCP (write)
MCP   ──import──► SDK ──HTTP──► Service
CLI   ──import──► SDK ──HTTP──► Service

// The SDK is the only place that touches HTTP.
// Changing the SDK = a synchronized update for all upper-layer consumers
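
A minimal sketch of that dependency direction, under stated assumptions: `NotesClient`, the `/notes` paths, and the `fake_transport` stand-in are all hypothetical and exist only so the example runs without a server. The point is that the CLI and MCP functions call the SDK and never build a request themselves.

```python
import json
from typing import Callable

# A transport takes (method, path, payload) and returns the decoded response.
Transport = Callable[[str, str, dict], dict]


def fake_transport(method: str, path: str, payload: dict) -> dict:
    """Stand-in for real HTTP so the sketch runs offline (assumption)."""
    return {"method": method, "path": path, "payload": payload}


class NotesClient:
    """SDK layer: the only code that knows HTTP methods and paths."""

    def __init__(self, transport: Transport = fake_transport) -> None:
        self._send = transport

    def search(self, query: str) -> dict:
        return self._send("GET", "/notes/search", {"q": query})

    def create(self, title: str) -> dict:
        return self._send("POST", "/notes", {"title": title})


def cli_search(client: NotesClient, argv: list[str]) -> str:
    """CLI layer (read): join the arguments, call the SDK, return JSON text."""
    return json.dumps(client.search(" ".join(argv)))


def mcp_create(client: NotesClient, arguments: dict) -> dict:
    """MCP layer (write): validate the typed input, then call the SDK."""
    if not isinstance(arguments.get("title"), str):
        raise ValueError("title must be a string")
    return client.create(arguments["title"])


client = NotesClient()
print(cli_search(client, ["token", "budget"]))
print(mcp_create(client, {"title": "ADR-5"}))
```

Because both wrappers go through `NotesClient`, a fix to a path or parameter name in the SDK reaches the CLI and MCP layers without touching either of them.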
Consumer Matrix

Why can't we drop a layer?#

Consumer | Best access layer | Rationale
Other core modules | Service | Call services.py directly in the same process, with zero overhead
Python scripts / notebooks | SDK | Programmatic access, complete error handling
People (terminal) | CLI | Discoverable through --help, composable with pipes
AI coding agents (Claude Code / Codex CLI / Gemini CLI) | CLI (read) / MCP (write) | Reads through Bash = zero schema; writes require typed input
Scheduling systems | CLI | Shell commands are the common language of schedulers
AI workflows | Skill | High-level intent routing that automatically selects the CLI or MCP

"CLI is a zero overhead universal interface - anything that can run a shell will work.
MCP can only be used by clients that support the MCP protocol."
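
The read/write split for AI agents can be sketched as a small router. This is an illustration under assumptions, not the actual Skill implementation: the operation names and the `route` function are hypothetical, and the only rule applied is "reads go to the CLI, writes go to MCP".

```python
from enum import Enum


class Layer(Enum):
    CLI = "cli"
    MCP = "mcp"


# Hypothetical operation names; only the read/write split matters here.
READ_OPERATIONS = {"search", "list", "get"}
WRITE_OPERATIONS = {"create", "update", "delete"}


def route(operation: str) -> Layer:
    """Pick the access layer for an AI agent: reads via CLI, writes via MCP."""
    if operation in READ_OPERATIONS:
        return Layer.CLI  # runs through Bash, so no schema sits in the context
    if operation in WRITE_OPERATIONS:
        return Layer.MCP  # the typed schema validates the LLM's input
    raise ValueError(f"unknown operation: {operation}")


print(route("search"))  # Layer.CLI
print(route("create"))  # Layer.MCP
```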

Evolutionary Process

How did we get here?#

It wasn't designed up front. It evolved step by step from pain points.

Discovery Period - Playwright MCP

First Lesson of Token Burst#

Playwright MCP's tool schema is extremely large, and it is injected into the context every round, whether or not the tools are used.

Takeaway: If you can use a CLI, don't use MCP.

Extension - MCP Process Explosion

Multiplying Disasters in Multiple Sessions#

As the number of MCP servers grows to 23, each additional Claude Code session starts another full set of server processes.

Takeaway: Reducing the number of MCP servers and tools is necessary.

Validation Period - Cloudflare Code Mode

The industry's most extreme tool compression#

Cloudflare's Code Mode MCP exposes only 2 tools: search() and execute(). All other operations are written as code on the fly in the sandbox. This shows that a small number of tools does not limit capability; if anything, the relationship is inverse.

Takeaway: We don't go to that extreme, but the same principle applies: the fewer tools, the better.

Maturity - battle-tested

The new module completes all five layers#

Backend → SDK → CLI → MCP → Skill

Takeaway: Five layers are not a burden; they are compound interest.

Sandbox Pattern

Disposable Tools - Sandbox Patching#

Not every operation needs a permanent tool. Borrowing from Cloudflare Code Mode, batch operations are written on the fly in the sandbox and do not occupy an MCP tool slot.

Permanent Tools (full five-layer coverage)#

High-frequency, stable, discoverability-required operations

  • search - Daily high-frequency use
  • create - Requires typed schema validation
  • render - Complex Parameter Combinations

✓ Worth the cost of maintenance ✓ With --help ✓ Testable

Disposable Tool (Sandbox)#

Low-frequency, one-time, or exploratory batch operations

  • Batch scanning + aggregation (3+ API calls)
  • One-time Data Conversion
  • Exploratory scripts (use once, then discard)

✓ Zero schema overhead ✓ No tool slot ✓ Write and use immediately

// Cloudflare's extreme approach vs. the pragmatic approach

Cloudflare: 2 permanent tools + every operation written on the fly in the sandbox
Pragmatic:  ≤10 permanent tools/server + sandbox for low-frequency operations

// Common principle: permanent tools are a scarce resource; use the sandbox when an operation isn't worth a slot.

Judgment criteria: Is an operation worth a permanent tool?
→ Used ≥ 3 times per week + complex combination of parameters + requires discoverability → five-layer coverage
→ Otherwise → write it in the sandbox on the spot, use it, and discard it.
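
The criteria can be expressed as a short check. This sketch assumes the three conditions are meant to hold together (the "+" is read as a logical AND); the function names are hypothetical.

```python
def needs_permanent_tool(uses_per_week: int,
                         complex_parameters: bool,
                         needs_discoverability: bool) -> bool:
    """True when an operation earns a permanent, five-layer tool."""
    return uses_per_week >= 3 and complex_parameters and needs_discoverability


def placement(uses_per_week: int,
              complex_parameters: bool,
              needs_discoverability: bool) -> str:
    """Name where the operation should live under the criteria above."""
    if needs_permanent_tool(uses_per_week, complex_parameters, needs_discoverability):
        return "permanent tool (five layers)"
    return "sandbox script (write, use, discard)"


print(placement(10, True, True))   # permanent tool (five layers)
print(placement(1, False, False))  # sandbox script (write, use, discard)
```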

Comparison

What if you don't use five layers?#

Alternative | Problem | Consequence
Service only | HTTP calls only | AI agents, cron, and CLI users all have to write their own HTTP clients
Service + MCP | Token overhead + process explosion | 23 servers × 3 sessions = 69 processes; schemas eat tokens every round
Service + CLI | Write operations lack a typed schema | The LLM passes wrong parameters, with no input validation
Everything through MCP | CLI users are excluded | Codex CLI, Gemini CLI, and cron are all unable to access it
Each layer implements HTTP independently | No unified SDK | Ghost-parameter anti-patterns abound

Practical Examples

Control tool injection yourself with the Agent SDK#

If you build your own client with the Agent SDK, you have full control at the application layer over which tool schemas go into the LLM context, with no need to wait for platform support. Speakeasy's Dynamic Toolsets use this approach to achieve a 96% token reduction.

# Pseudocode sketch (mcp_server, bm25_filter, query, and client are placeholders)

all_tools = mcp_server.tools_list()        # all 199 tools arrive (protocol level)
relevant = bm25_filter(all_tools, query)   # you filter them down to about 5 yourself
response = client.messages.create(
    tools=relevant,                        # send only those 5 to the LLM
)
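
For readers who want something runnable, here is one possible shape for the `bm25_filter` placeholder, written with the standard library only. It is a sketch under assumptions: tools are plain dicts with `name` and `description` keys, the scoring is textbook BM25 with the non-negative IDF variant, and the three sample tools are invented. It is not Speakeasy's or MCPProxy's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_filter(tools: list[dict], query: str, top_k: int = 5,
                k1: float = 1.5, b: float = 0.75) -> list[dict]:
    """Return up to top_k tools whose name + description best match the query."""
    if not tools:
        return []
    docs = [tokenize(f"{t['name']} {t['description']}") for t in tools]
    avg_len = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    # Document frequency: how many tool texts contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    def score(doc: list[str]) -> float:
        tf = Counter(doc)
        total = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            total += idf * tf[term] * (k1 + 1) / norm
        return total

    scored = [(score(doc), tool) for tool, doc in zip(tools, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop tools that share no term with the query at all.
    return [tool for s, tool in scored[:top_k] if s > 0]


tools = [
    {"name": "create_invoice", "description": "Create a new invoice for a customer"},
    {"name": "search_invoices", "description": "Search invoices by customer or date"},
    {"name": "render_chart", "description": "Render a chart from a data series"},
]
relevant = bm25_filter(tools, "search invoices for a customer", top_k=2)
print([t["name"] for t in relevant])
```

The filtered list is what a self-built client would pass as the tools argument of its model call; everything else returned by the server stays out of the context.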

No filtering (default behavior)#

All 199 tool schemas are stuffed into tools[]

~40K tokens/round

After BM25 filtering#

Only the 5-10 most relevant tools are sent each time.

~2K tokens/round

Key distinction: at the MCP protocol layer, tools/list returns all tools, and this can't be changed.
But your client decides what to put into the API's tools[], and that is totally under your control.
An off-the-shelf client such as Claude Code does all the stuffing for you. Build your own client, and you decide.

Cost Analysis

The actual cost is very low#

SDK#

Inherit from BaseClient and wrap each endpoint in roughly one line per method. Error handling, authentication, and base paths are all DRY.

CLI#

Command wrappers that import the SDK. Signatures are unified, and paging shares a common parser.

MCP#

SDK wrappers + tool definitions. Only write operations with structured returns are included; read operations are not needed.
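
As one illustration of "paging with a common parser" in the CLI layer, the sketch below uses argparse's `parents` mechanism so the paging flags are defined once and inherited by every subcommand. The `notes` program name, the subcommands, and the flag names are assumptions for the example, not the article's actual CLI.

```python
import argparse

# Paging flags are defined once and inherited by every subcommand.
paging = argparse.ArgumentParser(add_help=False)
paging.add_argument("--limit", type=int, default=20)
paging.add_argument("--offset", type=int, default=0)


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical 'notes' CLI whose subcommands share paging flags."""
    parser = argparse.ArgumentParser(prog="notes")
    commands = parser.add_subparsers(dest="command", required=True)
    search = commands.add_parser("search", parents=[paging], help="search notes")
    search.add_argument("query")
    commands.add_parser("list", parents=[paging], help="list notes")
    return parser


args = build_parser().parse_args(["search", "token budget", "--limit", "5"])
print(args.command, args.query, args.limit, args.offset)  # search token budget 5 0
```

Each handler would then pass `args.limit` and `args.offset` straight to the matching SDK method, which keeps the CLI layer thin.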

"Marginal costs diminish, and composability is a long-term benefit - but so are maintenance costs."

⚠ The cost of this approach#

Three extra wrappers to maintain#

The SDK, CLI, and MCP all wrap the same logic, so when the API changes, all three layers have to change with it. A small team can keep this under control; a larger one has to rely on automated tests to cover it.

Debugging paths become longer#

When a bug occurs, we need to determine which layer it lives in: a service logic error, a missing SDK parameter, a CLI parsing error, or a stale MCP schema. The more layers there are, the slower the troubleshooting.

Not all modules are worth it.#

Small internal utilities don't need five layers. Only core modules that are used by multiple consumers are worth the investment. Applying the pattern everywhere is a burden.

External Verification

Winds of Change in the OpenCLI Community#

This isn't a direction we're shouting about on our own: the entire open source community is moving towards the CLI.

OpenCLI converts web and desktop applications into standardized CLIs; CLI-Anything aims to make all software operable by agents via CLI. More and more developers are asking "is there a CLI" instead of "is there an API".

The reason is straightforward: the whole industry is entering the Agentic CLI era. When every AI coding tool can run a shell, the CLI is naturally the greatest common denominator across tools.

Before#

Mainstream practice: Agent + MCP

All tools go through MCP, schemas are stuffed into the context, and the more tools you have, the slower it gets.

Now#

Evolved practice: Agent + Skill (calling CLI / MCP)

Skill routes on demand: zero-overhead CLI for read operations, MCP for write operations

It's not the CLI that's revived - it's the AI era that's rediscovering the value of the CLI.

Conclusion

199 Tools, One Set of Logic#

From the first lesson of the token burst to validation by the direction of the community, every step was forced by a pain point.
The five layers are not bureaucracy. People, scripts, and AI each take what they need. Change one place, and everything stays in sync.

// Decision Chain

Pain Point 1: Playwright MCP token burst
  → CLI saves 98% of tokens

Pain Point 2: Multiple Claude Code sessions → processes multiply proportionally
  → mcpproxy aggregation + CLI offloads read operations

Pain Point 3: Too many tools for the LLM to choose from
  → ≤10 tools/server + sandbox for throwaway operations

Verification 1: Cloudflare runs an entire IDE with only 2 tools
  → Same principle, but we are more pragmatic

Verification 2: The OpenCLI community moves to CLI-first
  → In the AI era, the CLI is the greatest common denominator across tools

// Summary
Service → SDK → CLI → MCP → Skill
199 Tools, One Logic, Three Consumers

Take this away.

Copy to your AI Agent#

Paste the following paragraph into your AI coding agent so that it can help you evaluate your existing architecture and plan the direction of change.

Help me evaluate the current AI tool integration methods in the project, to see if there's room to optimize token expenditure. Background: MCP tool schemas are injected into the context in every round, consuming tokens whether they are used or not. When the number of tools is large, the schema alone can consume tens of thousands of tokens

References

Further Reading#

These are the resources that we have actually made reference to and borrowed in the course of our exploration.

Article | Why it matters
Code Mode: Give Agents an Entire API in 1,000 Tokens | Cloudflare tested running the whole IDE with only 2 tools and reduced token spend by 81%, supporting the principle "the fewer the tools, the better".
Your MCP Server Is Eating Your Context Window | Apideck analyzes how MCP schemas eat up context.
Advanced Tool Use - Anthropic Engineering | Anthropic officially acknowledged the problem of tool schemas occupying context and launched a Tool Search mechanism that achieves an 85% context reduction.
How We Reduced Token Usage by 100x: Dynamic Toolsets | Speakeasy's Dynamic Toolsets solution, with a 96% input token reduction.
MCPProxy - MCP Gateway | Open source MCP aggregation gateway with BM25-indexed dynamic tool discovery; supports hundreds of servers and thousands of tools.
Welcome to the Agentic CLI Era | The New Stack reports on the industry trend of AI coding tools shifting to the CLI.
OpenCLI - Universal CLI Hub | Universal CLI hub for AI agents, converting websites and desktop applications into standardized CLIs.
CLI-Anything - Making ALL Software Agent-Native | HKU research on enabling all software to be operated by AI agents via CLI.