Don't Let MCP Eat Your Agent's Context
199 tools, 40K tokens of schema per round, until we cut 90% with a five-tier architecture.
Service → SDK → CLI → MCP → Skill, so that the AI agent does the most with the fewest tokens.
What's our problem?#
We did not think up five layers and then implement them. The architecture evolved naturally out of three real pain points.
Token Burst#
Playwright MCP consumes a large number of tokens per call. Even when it is not in use, its tool schema still occupies the context window, so you pay for it on every round.
After replacing it with the Playwright CLI:
MCP Process Explosion#
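Every time a Claude Code session is opened, the full set of MCP server processes is spawned again, so the process count grows in proportion to the number of sessions.
23 servers × 3 sessions = 69 resident processes
// RAM and CPU usage grow linearly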
Too Many Tools to Choose From#
When an MCP server exposes too many tools, the LLM's tool-selection accuracy drops.
Industry best practice: ≤ 10 tools per MCP server.
8 tools × 200 tokens/schema = 1,600 tokens/turn (whether or not the tools are used)
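The arithmetic above is easy to reproduce. This is a minimal sketch that assumes a flat cost per schema; the function names and the 20-turn session length are illustrative and not taken from our measurements.

```python
def schema_tokens_per_turn(tool_count: int, tokens_per_schema: int) -> int:
    """Tokens spent on tool schemas in one turn, whether or not a tool is called."""
    return tool_count * tokens_per_schema


def cumulative_schema_tokens(tool_count: int, tokens_per_schema: int, turns: int) -> int:
    """Schema tokens paid over a session: the per-turn cost repeats every turn."""
    return schema_tokens_per_turn(tool_count, tokens_per_schema) * turns


per_turn = schema_tokens_per_turn(tool_count=8, tokens_per_schema=200)
print(per_turn)  # 1600

# Hypothetical 20-turn session: the same 1,600 tokens are re-sent on every turn.
print(cumulative_schema_tokens(8, 200, turns=20))  # 32000
```

The point of the second function is that the schema cost is not a one-time fee: it scales with the length of the conversation even if no tool is ever called.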
Token Cost Comparison#
Token overhead per round for different call types (excluding actual I/O)
Chart: schema cost per round (tokens) by call type. Annotations: cumulative waste from unused schemas; only 2 tools for the whole IDE.
What do the five levels do?#
Each tier serves a different consumer, and dependencies run from the top tier down to the bottom one.
Core principle: bug fixes propagate automatically. Fix the SDK once and every tier above it picks up the change.
Skill ──invoke──► CLI (read) / MCP (write)
MCP ──import──► SDK ──HTTP──► Service
CLI ──import──► SDK ──HTTP──► Service
// The SDK is the only place that touches HTTP.
// Change the SDK = every upper-tier consumer is updated in sync
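The dependency direction in the diagram can be sketched in a few lines. Everything here is hypothetical (the `ItemsSDK` class, the `/items` path, the `fake_service` stand-in); it only illustrates that the CLI and MCP tiers call the SDK and never build requests themselves. In a real setup the transport would be an HTTP client instead of a local function.

```python
from typing import Callable, Dict

# Hypothetical transport signature: (method, path, payload) -> response dict.
Transport = Callable[[str, str, Dict], Dict]


class ItemsSDK:
    """SDK tier: the only code that knows about paths and payloads."""

    def __init__(self, transport: Transport) -> None:
        self._send = transport

    def search(self, query: str) -> Dict:
        return self._send("GET", "/items", {"q": query})

    def create(self, name: str) -> Dict:
        return self._send("POST", "/items", {"name": name})


def cli_search(sdk: ItemsSDK, query: str) -> str:
    """CLI tier (read path): calls the SDK and returns plain text for a shell."""
    result = sdk.search(query)
    return "\n".join(result["items"])


def mcp_create(sdk: ItemsSDK, arguments: Dict) -> Dict:
    """MCP tier (write path): checks typed input, then delegates to the SDK."""
    if not isinstance(arguments.get("name"), str):
        raise ValueError("'name' must be a string")
    return sdk.create(arguments["name"])


def fake_service(method: str, path: str, payload: Dict) -> Dict:
    """Stand-in for the Service tier so the sketch runs without a network."""
    if method == "GET":
        return {"items": [f"match for {payload['q']}"]}
    return {"created": payload["name"]}


sdk = ItemsSDK(fake_service)
print(cli_search(sdk, "report"))          # match for report
print(mcp_create(sdk, {"name": "demo"}))  # {'created': 'demo'}
```

Because both wrappers go through `ItemsSDK`, a fix to `search` or `create` reaches the CLI and MCP tiers without touching either of them.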
Why can't I have one less layer?#
| Consumer | Best Access Layer | Rationale |
|---|---|---|
| Other core modules | Service | Call services.py directly in the same process, with zero overhead |
| Python script / notebook | SDK | Programmatic access with complete error handling |
| People (terminal) | CLI | Discoverable via --help, composable with pipes |
| AI coding agent (Claude Code / Codex CLI / Gemini CLI) | CLI (read) / MCP (write) | Reads via Bash cost zero schema; writes require typed input |
| Scheduling system | CLI | Shell commands are the common language of schedulers |
| AI workflow | Skill | High-level routing that automatically selects CLI or MCP |
"CLI is a zero overhead universal interface - anything that can run a shell will work.
MCP can only be used by clients that support the MCP protocol."
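The routing rule from the table (reads through the CLI, writes through MCP) can be sketched as a small dispatcher. The operation names, the `mytool` command, and the `Route` type are assumptions made for illustration; a real skill would carry its own list of operations.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical operation names.
READ_OPERATIONS = {"search", "list", "get"}
WRITE_OPERATIONS = {"create", "update", "delete"}


@dataclass
class Route:
    channel: str                        # "cli" or "mcp"
    invocation: Union[List[str], Dict]  # argv for the CLI, or a tool call for MCP


def route(operation: str, arguments: Dict[str, str]) -> Route:
    """Send reads to the CLI (no schema in context) and writes to MCP (typed input)."""
    if operation in READ_OPERATIONS:
        argv: List[str] = ["mytool", operation]
        for key, value in sorted(arguments.items()):
            argv += [f"--{key}", value]
        return Route(channel="cli", invocation=argv)
    if operation in WRITE_OPERATIONS:
        return Route(channel="mcp", invocation={"name": operation, "arguments": arguments})
    raise ValueError(f"unknown operation: {operation}")


print(route("search", {"query": "invoice"}).invocation)
# ['mytool', 'search', '--query', 'invoice']
print(route("create", {"title": "Q3 report"}).channel)
# mcp
```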
How did we get here?#
None of this was designed in advance. It evolved step by step from pain points.
First Lesson of Token Burst#
Playwright MCP's tool schema is extremely large, injecting
Takeaway: if the CLI can do it, don't use MCP.
Multiplying Disasters in Multiple Sessions#
As the number of MCP servers grows to 23, each additional Claude
Takeaway: reducing the number of MCP servers and tools is a necessity.
The industry's most extreme tool compression#
Cloudflare's Code Mode MCP exposes only 2 tools: search() and execute(). All other operations are written on the fly in the sandbox. This shows that capability does not scale with the number of tools.
Takeaway: we don't go that far, but the same principle applies: the fewer tools, the better.
The new module completes all five levels#
Backend → SDK → CLI → MCP → Skill
Takeaway: five layers are not a burden; they pay compound interest.
Disposable Tools - Sandbox Patching#
Not every operation needs a permanent tool. Following Cloudflare's Code Mode, batch operations are written on the fly in the sandbox and do not occupy an MCP tool slot.
Permanent Tools (Five-Layer Coverage)#
High-frequency, stable operations that need discoverability
- search: daily high-frequency use
- create: requires typed schema validation
- render: complex parameter combinations
✓ Worth the cost of maintenance ✓ With --help ✓ Testable
Disposable Tool (Sandbox)#
Low-frequency, one-time, or exploratory batch operations
- Batch scanning + aggregation (3+ API calls)
- One-time Data Conversion
- Exploratory scripts (use once, then discard)
✓ Zero schema overhead ✓ No tool slot ✓ Write and use immediately
Cloudflare: 2 permanent tools + sandbox code written on the fly for every operation
Pragmatic approach: ≤ 10 permanent tools/server + sandbox for low-frequency operations
// Common principle: permanent tools are a scarce resource; use the sandbox when an operation is not worth a slot.
Decision criteria: is an operation worth a permanent tool?
→ Used ≥ 3 times per week + complex parameter combinations + needs discoverability → five-layer coverage
→ Otherwise → write it on the fly in the sandbox, use once, then discard.
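The decision rule above can be written down as a small check. This sketch assumes the three criteria are combined with AND, which is one reading of the "+" in the rule; the `Operation` type and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    uses_per_week: int
    complex_parameters: bool
    needs_discoverability: bool


def deserves_permanent_tool(op: Operation, min_uses_per_week: int = 3) -> bool:
    """True only when all three criteria hold (assumed to be combined with AND)."""
    return (
        op.uses_per_week >= min_uses_per_week
        and op.complex_parameters
        and op.needs_discoverability
    )


def placement(op: Operation) -> str:
    """Map an operation to one of the two homes described in this section."""
    if deserves_permanent_tool(op):
        return "permanent tool (five-layer coverage)"
    return "sandbox script (use once, then discard)"


print(placement(Operation("render", 10, True, True)))
# permanent tool (five-layer coverage)
print(placement(Operation("one-time data conversion", 1, False, False)))
# sandbox script (use once, then discard)
```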
What if you don't use five layers?#
| Alternative | Problem | Consequence |
|---|---|---|
| Service only | HTTP calls only | AI agents, cron jobs, and CLI users all have to write their own HTTP clients |
| Service + MCP | Token overhead + process explosion | 23 servers × 3 sessions = 69 processes; schemas eat tokens every round |
| Service + CLI | Write operations lack a typed schema | The LLM passes wrong parameters; no input validation |
| Everything through MCP | CLI users are excluded | Codex CLI, Gemini CLI, and cron cannot access it |
| Each layer implements HTTP independently | No unified SDK | Ghost-parameter anti-patterns spread |
Control tool injection yourself with the Agent SDK#
If you build your own client with the Agent SDK, you have full control at the application layer over which tool schemas go into the LLM context, with no need to wait for platform support. Speakeasy's Dynamic Toolsets use this approach to achieve a 96% token reduction.
all_tools = mcp_server.tools_list()       # 199 received (protocol level)
relevant = bm25_filter(all_tools, query)  # filtered down to 5 by your own code
response = client.messages.create(
    tools=relevant,                       # send only 5 to the LLM
)
After BM25 filtering#
Only the 5-10 most relevant tools are sent each time.
~2K tokens/round
Key distinction: at the MCP protocol layer, tools/list returns every tool, and that cannot be changed.
But your client decides what goes into the API's tools[] array, and that is entirely under your control.
With an off-the-shelf client such as Claude Code, the client stuffs every schema in for you. Build your own client, and the choice is yours.
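For readers who want to see what a `bm25_filter` could look like, here is a minimal standard-library sketch of Okapi BM25 ranking over tool names and descriptions. It is not Speakeasy's or MCPProxy's implementation; the tool dictionaries, the tokenizer, and the default `k1` and `b` values are assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_filter(tools: List[Dict], query: str, top_k: int = 5,
                k1: float = 1.5, b: float = 0.75) -> List[Dict]:
    """Return up to top_k tools whose name + description best match the query."""
    docs = [tokenize(f"{t['name']} {t['description']}") for t in tools]
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: how many tool descriptions contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    scored = []
    for tool, doc in zip(tools, docs):
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scored.append((score, tool))

    # Best match first; tools that share no term with the query are dropped.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for score, tool in scored if score > 0][:top_k]


all_tools = [
    {"name": "browser_click", "description": "Click an element on the page"},
    {"name": "browser_screenshot", "description": "Take a screenshot of the current page"},
    {"name": "file_create", "description": "Create a new file with the given content"},
]
relevant = bm25_filter(all_tools, "screenshot", top_k=2)
print([t["name"] for t in relevant])  # ['browser_screenshot']
```

Only the filtered list would then be placed in the request's tools[] array. Lexical ranking like this misses synonyms, so a production filter may need a fallback for queries that match nothing.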
The actual cost is quite low#
SDK#
Inherit from BaseClient, and each method can be wrapped in a single line. Error handling, authentication, and base paths are all DRY.
CLI#
Command wrappers that import the SDK. Unified signatures, with pagination handled by a shared parser.
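One way to get a shared paging parser is argparse's `parents` mechanism. The sketch below is an assumption about how such a CLI could be laid out, not our actual code; the `mytool` name, the subcommands, and the `--limit` / `--offset` flags are hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Shared paging flags, defined once and inherited by every subcommand.
    paging = argparse.ArgumentParser(add_help=False)
    paging.add_argument("--limit", type=int, default=20, help="maximum rows to return")
    paging.add_argument("--offset", type=int, default=0, help="rows to skip")

    parser = argparse.ArgumentParser(prog="mytool", description="Hypothetical CLI tier")
    commands = parser.add_subparsers(dest="command", required=True)

    search = commands.add_parser("search", parents=[paging], help="search items")
    search.add_argument("query")

    commands.add_parser("list", parents=[paging], help="list all items")
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real command would call the SDK here and print the result.
    return args


args = main(["search", "invoice", "--limit", "5"])
print(args.command, args.query, args.limit, args.offset)  # search invoice 5 0
```

Every new subcommand picks up the same paging flags and the same `--help` text by listing `paging` in `parents`, which keeps signatures uniform across commands.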
MCP#
An SDK wrapper plus tool definitions. Only write operations with structured returns are included; read operations are not needed.
"Marginal costs diminish, and composability is a long-term benefit - but so are maintenance costs."
⚠ The cost of this approach#
Three wrappers to maintain#
The SDK, CLI, and MCP all wrap the same logic, so when the API changes, all three layers have to change with it. A small team can keep this under control; as the team grows, automated tests have to cover it.
Debugging paths become longer#
When a bug occurs, we need to determine which layer it lives in: a service logic error, a missing SDK parameter, a CLI parsing error, or a stale MCP schema. The more layers there are, the slower troubleshooting becomes.
Not all modules are worth it.#
Small internal utilities don't need five layers. Only core modules that are used by multiple consumers are worth the investment. Applying the pattern everywhere is itself a burden.
Winds of Change in the OpenCLI Community#
This is not a direction we alone are shouting about: the entire open source community is moving toward the CLI.
OpenCLI converts websites and desktop applications into standardized CLIs; CLI-Anything aims to make all software operable by agents via the CLI. More and more developers are asking "is there a CLI?" instead of "is there an API?".
The reason is straightforward: the whole industry is entering the Agentic CLI era. When every AI coding tool can run a shell, the CLI is naturally the greatest common denominator across tools.
Before#
Mainstream practice: Agent + MCP
All tools go through MCP, schemas are stuffed into the context, and the more tools you have, the slower it gets.
Now#
Evolved practice: Agent + Skill (calling CLI / MCP)
Skills route on demand: zero-overhead CLI for read operations, MCP for write operations
The CLI has not been revived; the AI era is rediscovering its value.
199 Tools, One Set of Logic#
From the first lesson of the token explosion to validation from the community's direction, every step was forced by a pain point.
The five tiers are not bureaucracy. People, scripts, and AI each take what they need. Change one place, and everything stays in sync.
Pain point 1: Playwright MCP token explosion
→ CLI saves 98% of tokens
Pain point 2: multiple Claude Code sessions → processes multiply proportionally
→ mcpproxy aggregation + CLI offloads read operations
Pain point 3: too many tools → the LLM picks the wrong one
→ ≤ 10 tools/server + disposable sandbox scripts
Validation 1: Cloudflare runs an entire IDE with only 2 tools
→ Same principle; we are more pragmatic
Validation 2: the OpenCLI community is moving to CLI-first
→ In the AI era, the CLI is the greatest common denominator across tools
// Summary
Service → SDK → CLI → MCP → Skill
199 Tools, One Logic, Three Consumers
Copy to your AI Agent#
Paste the following paragraph to your AI coding agent, so that it can help you evaluate the existing architecture and plan the direction of transformation.
Further Reading#
These are the resources we actually consulted and borrowed from during our exploration.
| Article | Why it matters |
|---|---|
| Code Mode: Give Agents an Entire API in 1,000 Tokens | Cloudflare tested running the whole IDE with only 2 tools and reduced token cost by 81%. This supports the principle of "the fewer tools, the better". |
| Your MCP Server Is Eating Your Context Window | Apideck analyzes how MCP schemas eat up the context window. |
| Advanced Tool Use - Anthropic Engineering | Anthropic officially acknowledged the problem of tool schemas occupying context and introduced the Tool Search mechanism, achieving an 85% context reduction. |
| How We Reduced Token Usage by 100x: Dynamic Toolsets | Speakeasy's Dynamic Toolsets approach, achieving a 96% input token reduction. |
| MCPProxy - MCP Gateway | Open source MCP aggregation gateway with BM25-indexed dynamic tool discovery; supports hundreds of servers and thousands of tools. |
| Welcome to the Agentic CLI Era | The New Stack reports on the industry trend of AI coding tools shifting to the CLI. |
| OpenCLI - Universal CLI Hub | Universal CLI hub for AI agents, converting websites and desktop applications into standardized CLIs. |
| CLI-Anything - Making ALL Software Agent-Native | HKU research: enabling all software to be operated by AI agents via the CLI. |