Think token reduction is only about lowering costs by 30%?
Your AI agent sees only the tools relevant to the current user task and intent. ✅ BM25 ranking by default ✅ No API keys required ✅ Works transparently
Clear Your Tools is a reverse proxy for coding agents such as Claude Code and Codex CLI.
The problem:
- Context Rot: Model accuracy and reasoning degrade as input length grows, even when task difficulty stays the same. Removing irrelevant content consistently improves results.
- Context Dilution: On the NoLiMa benchmark, 11 of 12 models fell below 50% of their short-context baseline at just 32K tokens.
- Context Bloat: Even frontier models lose recall and reasoning quality as context grows into the 32K-200K token range, and every extra token adds API cost, because the full history is resent to the provider on each turn.
- Local inference: Smaller inputs reduce memory pressure and speed up generation on self-hosted models.
The proxy sits between the agent and upstream LLM providers (Anthropic-compatible APIs on OpenRouter, Novita, DeepInfra, and others), intercepts each request, and shrinks the tool payload before forwarding it upstream. It can be easily adapted to other agent harnesses.
Examples of how to run these agents with the proxy can be found in the ./examples/agents directory.
Large MCP catalogs can add tens of thousands of tokens of tool-schema overhead on every turn. Clear Your Tools removes irrelevant tools and trims irrelevant optional parameters while always keeping required fields for tools that stay in the request.
Agent (Claude Code, etc.)
        │
        ▼
Clear Your Tools proxy ──► extract user query from messages
        │                  decompose each tool schema
        │                  score / filter with BM25 (default), rerank, or LLM pruning
        │                  recompose pruned tool list
        ▼
Upstream provider (OpenRouter, Anthropic, Novita, …)
On each intercepted request the proxy:
- Extracts the user query from the conversation (latest user turn, with message cleanup).
- Decomposes tool schemas into a catalog of chunks: each tool root keeps required properties; optional properties are split into separate searchable units.
- Runs the pruning pipeline configured in config.yaml. Out of the box the default is bm25 (local, no API keys). After cyt setup, choose between rerank (optionally followed by llm).
- Recomposes surviving tools: required properties always remain; only optional properties that look relevant to the query are merged back in.
- Forwards the modified request to the upstream provider with the smaller tools array.
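
The decompose and recompose steps above can be sketched in plain Python. This is only an illustration, not the project's Rust core: the chunk ID format (`tool` and `tool.param`), the function names `decompose` and `recompose`, and the sample `TOOLS` catalog are assumptions made for the example. The `input_schema` key follows the Anthropic-style tool definition.

```python
from copy import deepcopy

# Hypothetical two-tool catalog used only for this illustration.
TOOLS = [
    {
        "name": "fetch_markdown",
        "description": "Fetch a webpage as Markdown",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to fetch"},
                "max_length": {"type": "integer", "description": "Maximum characters"},
                "headers": {"type": "object", "description": "Extra request headers"},
            },
            "required": ["url"],
        },
    },
    {
        "name": "read_file",
        "description": "Read a file from disk",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path"},
                "encoding": {"type": "string", "description": "Text encoding"},
            },
            "required": ["path"],
        },
    },
]


def decompose(tools):
    """Split each tool into a root chunk plus one chunk per optional property."""
    chunks = {}
    for tool in tools:
        schema = tool.get("input_schema", {})
        required = set(schema.get("required", []))
        # Root chunk: tool name, description, and the names of required properties.
        chunks[tool["name"]] = " ".join(
            [tool["name"], tool.get("description", ""), *sorted(required)]
        )
        for prop, spec in schema.get("properties", {}).items():
            if prop not in required:
                chunks[f"{tool['name']}.{prop}"] = f"{prop} {spec.get('description', '')}"
    return chunks


def recompose(tools, survivors):
    """Rebuild the tools list from surviving chunk IDs; required properties always stay."""
    kept = []
    for tool in tools:
        if tool["name"] not in survivors:
            continue  # the whole tool was pruned
        pruned = deepcopy(tool)  # never mutate the original request
        schema = pruned.get("input_schema", {})
        required = set(schema.get("required", []))
        schema["properties"] = {
            prop: spec
            for prop, spec in schema.get("properties", {}).items()
            if prop in required or f"{tool['name']}.{prop}" in survivors
        }
        kept.append(pruned)
    return kept


chunks = decompose(TOOLS)
pruned = recompose(TOOLS, {"fetch_markdown", "fetch_markdown.max_length"})
```

Here the survivor set is hard-coded; in the proxy it would come from whichever pruning stage is configured.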
Pruning pipelines
| Stage | Model (default) | When it runs | What it does |
|---|---|---|---|
| bm25 | Local BM25 index (bm25s) | Default pipeline when no remote pruner is configured; also fallback when rerank/llm fail or tool count is below their thresholds | Scores catalog chunks locally against the user query; no API keys or pruning cost. Indexes are cached under ~/.config/cyt/bm25/. |
| rerank | Qwen3-Reranker-8B (DeepInfra) | ≥ pruning.policy.minimum_tools (default 50), after cyt setup | Scores every catalog chunk against the user query; drops low-scoring tools and optional props. |
| llm | Mercury 2 or GPT-OSS-120B (OpenRouter) | ≥ pruning.policy.minimum_tools (default 50), after rerank | LLM selects which catalog chunks to keep; can remove entire tools more aggressively. |
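
To make the bm25 stage more concrete, here is a dependency-free sketch of Okapi BM25 scoring over catalog chunks. The proxy itself uses the bm25s library according to the table above; this standalone version only shows the idea. The tokenizer, the `k1` and `b` values, and the sample chunk texts are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics, so fetch_markdown -> fetch, markdown."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Return {chunk_id: score} using the Okapi BM25 formula."""
    if not chunks:
        return {}
    docs = {cid: tokenize(text) for cid, text in chunks.items()}
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: number of chunks that contain each term.
    df = Counter(term for toks in docs.values() for term in set(toks))
    query_terms = set(tokenize(query))
    scores = {}
    for cid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[cid] = score
    return scores


# Hypothetical chunk texts; real chunks come from the decomposed tool schemas.
CHUNKS = {
    "fetch_markdown": "fetch_markdown fetch webpage return markdown url",
    "fetch_markdown.max_length": "max_length maximum number characters returned",
    "read_file": "read_file read file from local disk path",
}

scores = bm25_scores("fetch the markdown of a webpage", CHUNKS)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Chunks that share no term with the query score exactly zero, which is what makes them cheap to drop; how the actual cut-off is chosen depends on the configured pruning policy.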
Tool Recommendations:
- Getting started / no setup: the default bm25 pipeline works out of the box with no remote API keys.
- 50+ tools: run cyt setup and use rerank or llm. Rerank can be pipelined into LLM as a second stage (pipeline: [rerank, llm]) for stronger tool-level filtering on large catalogs.
Pipeline & Model Recommendations: Choose your pipeline based on model cost:
- Expensive models (≥$3/M input tokens, e.g. Sonnet): Use an LLM pruner pipeline.
- Cheap models ($0.10–$1/M input tokens, e.g. Haiku, Gemini 3 Flash): Use a rerank pipeline with a low-cost model.
- Premium models (e.g. Opus): Use an LLM pruner + rerank combined pipeline.
Clear Your Tools and the cyt-indexer SDKs support Windows, macOS, and Linux.
SDK & CLI
All language bindings wrap the same Rust core: decompose tool schemas into searchable catalog chunks, then recompose tools from a survivor list. See cyt-indexer-cli.sh.
- Python SDK
- TypeScript SDK
- Rust library and CLI
Requires the uv tool; see Install uv.
From PyPI (proxy + pruners):
uv tool install 'clear-your-tools[all]'
One-command jump-start through the proxy:
cyt launch -- claude
cyt launch -- codex
# 3rd party providers
export ANTHROPIC_DEFAULT_HAIKU_MODEL="google/gemini-3.1-flash-lite"
export ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY"
cyt launch --upstream https://openrouter.ai/api -- claude --model haiku
cyt launch shares the same upstream and credential bootstrap as cyt proxy, starts the proxy
if needed, prints a manual recipe to stderr (suppress with --quiet), then execs the agent.
For Codex, cyt launch --configure -- codex writes the managed provider block to
~/.codex/config.toml; cyt launch --restore -- codex removes it.
Why inject skills?
Agents always see a skill's header, but only read the body when they decide it's relevant. If your question fits the body but not the header, the agent may miss the skill — and you end up telling it to read the file yourself.
CYT injects the matching parts of skills automatically; we call these skinny skills. See SKINNY_SKILLS.md for how it works.
If you prefer, you can use agent hooks instead; that path is separate from the proxy.
cyt hook
cyt stats
# Optional (recommended):
cyt setup
cyt stats --add
Stats are stored in ~/.config/cyt/stats.db by default.
Run the proxy (optional)
Installed CLI:
cyt proxy --upstream https://api.anthropic.com
# Or
cyt proxy --upstream https://api.openai.com
# 3rd party provider
cyt proxy --upstream https://openrouter.ai/api --upstream-kind anthropic
Canonical upstream URLs infer --upstream-kind automatically. For other providers (e.g. OpenRouter),
pass --upstream-kind explicitly.
Default listen port: 8834 (from bundled defaults.yaml or ~/.config/cyt/config.yaml).
Run the agent manually (optional)
Point the agent at the proxy (default port 8834). More examples are in ./examples/agents.
Codex (OpenAI Responses API via the proxy):
PORT=8834
codex \
-c 'model_provider="cyt"' \
-c 'model_providers.cyt.name="Clear-Your-Tools-Proxy/"' \
-c "model_providers.cyt.base_url=\"http://127.0.0.1:${PORT}/openai/v1\"" \
-c 'model_providers.cyt.wire_api="responses"'
Claude Code (Anthropic-compatible API):
PORT=8834
export ANTHROPIC_BASE_URL="http://localhost:${PORT}/anthropic"
claude
Configure the proxy (optional)
Interactive wizard (writes ~/.config/cyt/config.yaml and optionally ~/.config/cyt/.env):
cyt setup
Or edit ~/.config/cyt/config.yaml manually; see CONFIG.md.
Without cyt setup, the proxy uses the default BM25 pipeline — local pruning with no
remote API keys. Run cyt setup to configure rerank/llm pruners and full cost tracking.
Doesn't pruning burn more tokens than it saves?
The default pipeline is BM25, which runs locally on your machine and is free.
The reranker and weak LLM used for pruning are much cheaper per token than the main model
(e.g. Claude Sonnet). You may spend extra tokens on pruning, but they cost a fraction of what you
save on the main request. Set input_cost_per_token and output_cost_per_token in
~/.config/cyt/config.yaml to track savings.
Example pricing (input tokens):
| Model | Cost per 1M input tokens |
|---|---|
| Claude Sonnet 4.6 | $3.00 |
| Qwen3-Reranker-8B | $0.05 |
| GPT-OSS-120B | $0.14 |
| Inception Mercury 2 | $0.25 |
Weak models such as Mercury 2 or GPT-OSS-120B return only the IDs of the tools to keep, so their output stays extremely small. Rerankers do not charge for output tokens and are usually much cheaper than a strong LLM.
Rule of thumb: saving 1M Sonnet input tokens is still worthwhile even if pruning uses up to ~10M Mercury tokens ($2.50 versus $3.00 at the prices above), roughly a 1:10 cost ratio. The reranker has roughly a 1:60 cost ratio.
In practice, pruning usually adds modest overhead. Worst case (no tools pruned), you might pay ~$3.30 instead of $3.00. With typical pruning (40–95% of tool tokens removed), tool-schema cost drops from ~$3.00 to roughly $0.15–$1.80, plus ~$0.30 for pruning — about $0.45–$2.10 total for tool-related cost, or roughly 30–85% savings depending on policy.
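
The arithmetic behind these figures can be reproduced with a few lines of Python. This is a worked example per 1M tool-schema tokens at the Sonnet price from the table; the flat $0.30 pruning overhead is the approximate figure quoted above, and the function names are chosen for this sketch only.

```python
SONNET_PRICE = 3.00      # $ per 1M input tokens, from the pricing table
PRUNING_OVERHEAD = 0.30  # $ per 1M tool tokens, the approximate figure quoted above


def tool_cost(removed_fraction, base=SONNET_PRICE, overhead=PRUNING_OVERHEAD):
    """Cost of 1M tool-schema tokens after pruning, including the pruning overhead."""
    return base * (1.0 - removed_fraction) + overhead


def savings(removed_fraction, base=SONNET_PRICE, overhead=PRUNING_OVERHEAD):
    """Net savings on tool-related cost as a fraction of the unpruned cost."""
    return 1.0 - tool_cost(removed_fraction, base, overhead) / base


for removed in (0.0, 0.40, 0.95):
    print(f"{removed:.0%} removed -> ${tool_cost(removed):.2f}, savings {savings(removed):.0%}")
# 0% removed -> $3.30, savings -10%
# 40% removed -> $2.10, savings 30%
# 95% removed -> $0.45, savings 85%
```

The worst case shows up as a negative saving: when nothing is pruned, the overhead is pure cost.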
Why don't I see 30–85% savings on my total request?
- Those numbers apply only to the tool-schema portion of the input tokens, not to the full prompt (system message, conversation history, user message, etc.).
- Clear Your Tools prunes tools based on the user request; the rest of the request is unchanged. Codex already uses tools efficiently, so CYT saves fewer tokens there.
- By default CYT injects relevant skills, which consumes some of the savings produced by the tool pruners. To disable this, run cyt setup or set skills.enabled: false in ~/.config/cyt/config.yaml.
- If you have no MCP tools and only the agent's built-in system tools, there is less to prune: expect lower overall savings, typically around 10–20%.
How much you save overall depends on:
- How many tools you have — more MCP servers mean a larger share of the request is tool schemas. We do not recommend using CYT below 50 tools.
- Which pruning policy you use — see Pruning policies.
To estimate savings on a captured request JSON, see DEV.md.
To see statistics of actual net savings (input tokens) run:
cyt stats
With ~100 tools and prune_all, expect ~85–95% savings on tool tokens and typically ~30%+
savings on the full request. The more tools you have, the more overall savings you'll see.
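
A back-of-the-envelope way to relate the two numbers is to multiply the share of the request taken up by tool schemas by the savings on those schemas. The sketch below assumes the rest of the request is unchanged and ignores skill injection and pruning overhead; the 35% tool share is a hypothetical value, not a measurement.

```python
def overall_savings(tool_share, tool_savings):
    """Fraction of total input tokens saved when only the tool schemas shrink."""
    if not 0.0 <= tool_share <= 1.0 or not 0.0 <= tool_savings <= 1.0:
        raise ValueError("both arguments must be fractions between 0 and 1")
    return tool_share * tool_savings


# Hypothetical request: tool schemas are 35% of input tokens, 90% of them pruned.
estimate = overall_savings(tool_share=0.35, tool_savings=0.90)
print(f"estimated overall savings: {estimate:.1%}")
```

Under these assumptions, 85–95% savings on tool tokens only reach about 30% overall when tool schemas make up roughly a third of the request. For real numbers, rely on cyt stats.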
Where can I see how many tools and parameters an MCP server has?
The popular Fetch MCP server is a good example. On its Tools tab: 4 tools, each with 4 parameters (1 required, 3 optional) — 16 parameters total.
If the user asks to "fetch the Markdown of a webpage", the prune_all policy typically keeps only the
Fetch Markdown tool with its required parameter plus any optional parameters that look
relevant. Unrelated tools (e.g. Read file) are dropped entirely.
Is my provider/model supported?
CYT's pruner models (the cheap reranker and LLM that decide which tools to keep) call providers through LiteLLM. If LiteLLM supports your provider and model, you can use them in CYT.
When you run cyt setup and add a pruner model, you'll be prompted for:
- Provider: LiteLLM provider route, without a trailing slash (e.g. openai, openrouter).
- Model name: LiteLLM model string (see the provider docs).
- API key env var: the name of the environment variable that holds your key, not the key itself (e.g. OPENAI_API_KEY, OPENROUTER_API_KEY).
- domain_match: hostname from the provider's API base URL (e.g. openai.com for OpenAI, openrouter.ai for OpenRouter). Used to match outgoing requests to the right model config.
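
How domain_match could map an outgoing request to a model config is sketched below. The `PrunerModel` dataclass, its field names, the placeholder model strings, and the rule "exact host or subdomain" are all assumptions for illustration; CYT's actual matching logic may differ.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class PrunerModel:
    provider: str      # LiteLLM provider route, e.g. "openrouter"
    model: str         # LiteLLM model string (placeholder values below)
    api_key_env: str   # name of the environment variable, not the key itself
    domain_match: str  # hostname from the provider's API base URL


def match_model(url, models):
    """Return the first config whose domain_match equals the host or is a parent domain of it."""
    host = (urlsplit(url).hostname or "").lower()
    for cfg in models:
        domain = cfg.domain_match.lower()
        # Assumed rule: exact match, or a subdomain such as api.openai.com.
        if host == domain or host.endswith("." + domain):
            return cfg
    return None


MODELS = [
    PrunerModel("openai", "<model-name>", "OPENAI_API_KEY", "openai.com"),
    PrunerModel("openrouter", "<model-name>", "OPENROUTER_API_KEY", "openrouter.ai"),
]

hit = match_model("https://api.openai.com/v1/responses", MODELS)
miss = match_model("https://notopenai.com/v1/responses", MODELS)
```

Matching on a dot-prefixed suffix rather than a plain substring keeps look-alike hosts such as notopenai.com from matching openai.com.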
Claude Code reports ZlibError when using the proxy
Installing zlib (for example with npm install -g zlib or brew install zlib) does not fix this.
This usually means the proxy returned a Content-Encoding: gzip (or deflate) header with a body
that was already decompressed. Claude Code's fetch then tries to inflate plain JSON/SSE and fails.
It is not a missing zlib install on your machine or in CYT.
Fix: upgrade to a cyt build that streams upstream bytes unchanged (aiter_raw pass-through).
After upgrading, verify:
curl --raw -sS -D - -o /tmp/cyt-msg.body \
-H 'Accept-Encoding: gzip' \
... # your POST to http://127.0.0.1:8834/anthropic/v1/messages
head -c 4 /tmp/cyt-msg.body | xxd # should show 1f8b when header says gzip
Also check: ANTHROPIC_BASE_URL must use http:// for the default plain-HTTP server,
e.g. http://localhost:8834/anthropic. Using https:// against cyt proxy (without TLS/http2.serve)
causes uvicorn's Invalid HTTP request received and broken API calls.
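
The same header-versus-body check can be expressed in Python. This is a small diagnostic sketch, not part of CYT: it assumes a headers mapping with lowercase keys and only checks gzip, since raw deflate streams have no fixed magic bytes.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def encoding_matches_body(headers, body):
    """True when a gzip Content-Encoding header agrees with the raw body bytes."""
    encoding = headers.get("content-encoding", "").strip().lower()
    if encoding == "gzip":
        return body.startswith(GZIP_MAGIC)
    return True  # other encodings are not checked in this sketch


plain = b'{"ok": true}'
compressed = gzip.compress(plain)

ok = encoding_matches_body({"content-encoding": "gzip"}, compressed)
broken = encoding_matches_body({"content-encoding": "gzip"}, plain)
```

The `broken` case reproduces the failure mode described above: the header promises gzip, but the body is already plain JSON.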
Uvicorn logs Invalid HTTP request received
cyt proxy listens for HTTP/1.1 on the configured port (default 8834).
This warning almost always means a client connected with the wrong protocol:
- https://localhost:8834 while the proxy is plain HTTP → TLS handshake bytes, not HTTP
- HTTP/2 prior knowledge to uvicorn (use http2.serve + TLS certs only if you intend HTTPS)
Use http://localhost:8834/anthropic unless you have enabled Hypercorn TLS in config.
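
To see why the server rejects these connections, it helps to look at the first bytes each kind of client sends. The classifier below is an illustrative sketch, not code from CYT or uvicorn: a TLS handshake record starts with byte 0x16 followed by major version 0x03, and an HTTP/2 prior-knowledge client starts with the fixed connection preface. The function name and return labels are made up for this example.

```python
import re

HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
HTTP1_REQUEST_LINE = re.compile(rb"[A-Z]+ \S+ HTTP/1\.[01]\r\n")


def classify_first_bytes(data):
    """Guess which protocol a client is speaking from the start of the stream."""
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"  # an https:// client hitting a plain-HTTP port
    if data.startswith(HTTP2_PREFACE):
        return "http2-prior-knowledge"
    if HTTP1_REQUEST_LINE.match(data):
        return "http1"  # what a plain HTTP/1.1 listener expects
    return "unknown"


good = classify_first_bytes(b"POST /anthropic/v1/messages HTTP/1.1\r\nHost: localhost\r\n\r\n")
tls = classify_first_bytes(b"\x16\x03\x01\x02\x00\x01\x00\x01\xfc")
h2 = classify_first_bytes(HTTP2_PREFACE)
```

Only the first case parses as an HTTP/1.1 request; the other two are what an HTTP/1.1 parser reports as an invalid request.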
Should I use .env?
We strongly recommend storing API keys via cyt setup (uses the macOS Keychain cyt service through the Python keyring backend). Shell exports and ~/.config/cyt/.env also work.
cyt setup # interactive; stores keys in Keychain service "cyt"
# Optional: inspect or seed Keychain manually (service must be "cyt", not a custom name)
security find-generic-password -s "cyt" -a "__credentials__" -w
See DEV.md for checkout setup, repository layout, library usage, and configuration reference.
See LIMITATIONS.md for deployment constraints, token accounting caveats, and MCP aggregator trade-offs.
See debug/ for details on debugging pruning.
Inspiration
This project is inspired by the ideas explored in the tool-attention project, particularly around improving tool selection efficiency and reducing unnecessary tool exposure to the model.
It also aims to limit the effects of context rot by pruning irrelevant or confusing tools from the available toolset based on the current user prompt and execution context.
Reducing irrelevant tools helps decrease prompt noise, lowers cognitive load on the model, and can improve tool selection accuracy and overall agent reliability.
See LICENSE.



