Clear Your Tools

Think token reduction is only about lowering costs by 30%?

⚡ Faster local & cloud LLMs: fewer tokens, less Context Dilution.
🎯 Better results: reduce Context Rot, keep LLM focused on the task.
🧠 More context for you: less Context Bloat, less memory compaction.

Your AI agent sees only the tools relevant to the current user task and intent.

✅ BM25 ranking by default ✅ No API keys required ✅ Works transparently

Claude Code, Claude Code Desktop, Claude Cowork

Codex

Clear Your Tools is a reverse proxy for coding agents such as Claude Code and Codex CLI.

The problem:

Context Rot: Model accuracy and reasoning degrade as input length grows, even when task difficulty stays the same. Removing irrelevant content consistently improves results.
Context Dilution: On the NoLiMa benchmark, 11 of 12 models fell below 50% of their short-context baseline at just 32K tokens.
Context Bloat: Even frontier models lose recall and reasoning quality as context grows into the 32K-200K token range, and every extra token adds API cost because the full history is resent on each turn.
Local inference: Smaller inputs reduce memory pressure and speed up generation on self-hosted models.

The proxy sits between the agent and upstream LLM providers (Anthropic-compatible APIs on OpenRouter, Novita, DeepInfra, and others), intercepts each request, and shrinks the tool payload before forwarding it upstream. It can be easily adapted to other agent harnesses.

Examples of how to run these agents with the proxy can be found in the ./examples/agents directory.

Large MCP catalogs can add tens of thousands of tokens of tool-schema overhead on every turn. Clear Your Tools removes irrelevant tools and trims irrelevant optional parameters while always keeping required fields for tools that stay in the request.

How it works

Agent (Claude Code, etc.)
        │
        ▼
Clear Your Tools proxy  ──► extract user query from messages
        │                   decompose each tool schema
        │                   score / filter with BM25 (default), rerank, or LLM pruning
        │                   recompose pruned tool list
        ▼
Upstream provider (OpenRouter, Anthropic, Novita, …)

On each intercepted request the proxy:

Extracts the user query from the conversation (latest user turn, with message cleanup).
Decomposes tool schemas into a catalog of chunks: each tool root keeps required properties; optional properties are split into separate searchable units.
Runs the pruning pipeline configured in config.yaml. Out of the box the default is bm25 (local, no API keys). After cyt setup, you can switch to rerank, optionally followed by llm.
Recomposes surviving tools — required properties always remain; only optional properties that look relevant to the query are merged back in.
Forwards the modified request to the upstream provider with the smaller tools array.
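
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's Rust core: it assumes Anthropic-style messages and tools (`input_schema`, `properties`, `required`), and the function names and the `tool.property` chunk ids are hypothetical.

```python
from copy import deepcopy

def extract_user_query(messages):
    """Return the text of the latest user turn."""
    for message in reversed(messages):
        if message.get("role") != "user":
            continue
        content = message.get("content", "")
        if isinstance(content, str):
            return content.strip()
        # Content may be a list of blocks; keep only the text blocks.
        texts = [b.get("text", "") for b in content if b.get("type") == "text"]
        return " ".join(t.strip() for t in texts if t.strip())
    return ""

def decompose(tool):
    """Split a tool into a root chunk plus one chunk per optional property."""
    schema = tool.get("input_schema", {})
    props = schema.get("properties", {})
    required = set(schema.get("required", []))
    root_parts = [tool["name"], tool.get("description", "")]
    root_parts += [f"{n} {props[n].get('description', '')}" for n in props if n in required]
    chunks = [{"id": tool["name"], "text": " ".join(root_parts)}]
    for name, spec in props.items():
        if name not in required:
            chunks.append({"id": f"{tool['name']}.{name}",
                           "text": f"{name} {spec.get('description', '')}"})
    return chunks

def recompose(tools, surviving_ids):
    """Rebuild the tools array; required properties always remain."""
    kept = []
    for tool in tools:
        if tool["name"] not in surviving_ids:
            continue  # the whole tool was pruned
        pruned = deepcopy(tool)
        schema = pruned.get("input_schema", {})
        required = set(schema.get("required", []))
        schema["properties"] = {
            name: spec for name, spec in schema.get("properties", {}).items()
            if name in required or f"{tool['name']}.{name}" in surviving_ids
        }
        kept.append(pruned)
    return kept

# Illustrative data only; these tool definitions are made up for the example.
example_tools = [
    {"name": "fetch_markdown", "description": "Fetch a webpage as Markdown",
     "input_schema": {"type": "object", "required": ["url"], "properties": {
         "url": {"type": "string", "description": "Page URL"},
         "max_length": {"type": "integer", "description": "Maximum characters"},
         "raw": {"type": "boolean", "description": "Return raw HTML"}}}},
    {"name": "read_file", "description": "Read a file from disk",
     "input_schema": {"type": "object", "required": ["path"], "properties": {
         "path": {"type": "string", "description": "File path"}}}},
]
pruned_tools = recompose(example_tools, {"fetch_markdown", "fetch_markdown.max_length"})
```

Here `pruned_tools` contains only `fetch_markdown`, with `url` (required) and `max_length` (a surviving optional property); `raw` and the whole `read_file` tool are gone.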

Pruning pipelines

| Stage | Model (default) | When it runs | What it does |
| --- | --- | --- | --- |
| `bm25` | Local BM25 index (`bm25s`) | Default pipeline when no remote pruner is configured; also fallback when rerank/llm fail or tool count is below their thresholds | Scores catalog chunks locally against the user query; no API keys or pruning cost. Indexes are cached under `~/.config/cyt/bm25/`. |
| `rerank` | Qwen3-Reranker-8B (DeepInfra) | ≥ `pruning.policy.minimum_tools` (default 50), after `cyt setup` | Scores every catalog chunk against the user query; drops low-scoring tools and optional props. |
| `llm` | Mercury 2 or GPT-OSS-120B (OpenRouter) | ≥ `pruning.policy.minimum_tools` (default 50), after `rerank` | LLM selects which catalog chunks to keep; can remove entire tools more aggressively. |
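
To make the `bm25` stage concrete, here is a small pure-Python sketch of Okapi BM25 scoring over catalog chunks. It is not the project's implementation (which uses `bm25s` with a cached index); the tokenizer, the `k1`/`b` defaults, and the sample chunks are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up catalog chunks: two tool roots and one optional property.
chunks = [
    "fetch_markdown Fetch a webpage and return Markdown",
    "read_file Read a file from disk",
    "max_length Maximum number of characters to return",
]
scores = bm25_scores("fetch the Markdown of a webpage", chunks)
best_chunk = chunks[scores.index(max(scores))]
```

The `fetch_markdown` chunk scores highest for this query; a pruner would then keep chunks above some cutoff, which is a policy decision this sketch leaves out.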

Tool Recommendations:

Getting started / no setup — the default bm25 pipeline works out of the box with no remote API keys.
50+ tools — run cyt setup and use rerank or llm. Rerank can be pipelined into LLM as a second stage (pipeline: [rerank, llm]) for stronger tool-level filtering on large catalogs.

Pipeline & Model Recommendations: Choose your pipeline based on model cost:

Expensive models (≥$3/M input tokens, e.g. Sonnet): Use an LLM pruner pipeline.
Cheap models ($0.10–$1/M input tokens, e.g. Haiku, Gemini 3 Flash): Use a rerank pipeline with a low-cost model.
Premium models (e.g. Opus): Use a combined rerank + LLM pruner pipeline (pipeline: [rerank, llm]).

Supported platforms

Clear Your Tools and the cyt-indexer SDKs support Windows, macOS, and Linux.

SDK & CLI

All language bindings wrap the same Rust core: decompose tool schemas into searchable catalog chunks, then recompose tools from a survivor list. See cyt-indexer-cli.sh

clear-your-tools (PyPI)

Python SDK (import cyt) and CLI (cyt: proxy / pruners)

cyt-indexer-sdk (PyPI)

Python SDK (cyt_indexer)

cyt-indexer-sdk (npm)

TypeScript SDK

cyt-indexer (crates.io)

Rust library and CLI (build / retrieve)

Quick start

Requires the uv tool. Install uv first.

1. Install proxy

From PyPI (proxy + pruners):

uv tool install 'clear-your-tools[all]'

2. Launch an agent

One-command jump-start through the proxy:

cyt launch -- claude
cyt launch -- codex

# 3rd party providers
export ANTHROPIC_DEFAULT_HAIKU_MODEL="google/gemini-3.1-flash-lite"
export ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY"
cyt launch --upstream https://openrouter.ai/api -- claude --model haiku

cyt launch shares the same upstream and credential bootstrap as cyt proxy, starts the proxy if needed, prints a manual recipe to stderr (suppress with --quiet), then execs the agent.

For Codex, cyt launch --configure -- codex writes the managed provider block to ~/.codex/config.toml; cyt launch --restore -- codex removes it.

3. Inject relevant skills into the agent

Why inject skills?

Agents always see a skill's header, but only read the body when they decide it's relevant. If your question fits the body but not the header, the agent may miss the skill — and you end up telling it to read the file yourself.

CYT injects the matching parts of skills automatically; we call these skinny skills. See SKINNY_SKILLS.md for how it works.

If you prefer, you can use agent hooks instead; that path is separate from the proxy.

cyt hook

View pruning stats savings

cyt stats

# Optional (recommended):
cyt setup
cyt stats --add

Stats are stored in ~/.config/cyt/stats.db by default.

Run the proxy (optional)

Installed CLI:

cyt proxy --upstream https://api.anthropic.com
# Or
cyt proxy --upstream https://api.openai.com

# 3rd party provider
cyt proxy --upstream https://openrouter.ai/api --upstream-kind anthropic

For canonical upstream URLs, --upstream-kind is inferred automatically. For other providers (e.g. OpenRouter), pass --upstream-kind explicitly.
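
The inference rule can be pictured roughly as below. This is a hypothetical sketch, not the CLI's code: the host table is an assumption based on the two canonical URLs shown above, and `openai` as a kind value is assumed by analogy with `anthropic`.

```python
from urllib.parse import urlparse

# Assumed mapping from canonical hosts to upstream kinds.
CANONICAL_UPSTREAMS = {
    "api.anthropic.com": "anthropic",
    "api.openai.com": "openai",
}

def resolve_upstream_kind(upstream_url, explicit_kind=None):
    """Prefer an explicit kind; otherwise infer it from a canonical host."""
    if explicit_kind:
        return explicit_kind
    host = (urlparse(upstream_url).hostname or "").lower()
    if host in CANONICAL_UPSTREAMS:
        return CANONICAL_UPSTREAMS[host]
    raise ValueError(f"cannot infer upstream kind for {host!r}; pass it explicitly")
```

With this rule, `https://openrouter.ai/api` resolves only when a kind is supplied, which mirrors the third-party example above.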

Default listen port: 8834 (from bundled defaults.yaml or ~/.config/cyt/config.yaml).

Run the agent manually (optional)

Point the agent at the proxy (default port 8834). More examples are in ./examples/agents.

Codex (OpenAI Responses API via the proxy):

PORT=8834
codex \
    -c 'model_provider="cyt"' \
    -c 'model_providers.cyt.name="Clear-Your-Tools-Proxy/"' \
    -c "model_providers.cyt.base_url=\"http://127.0.0.1:${PORT}/openai/v1\"" \
    -c 'model_providers.cyt.wire_api="responses"'

Claude Code (Anthropic-compatible API):

PORT=8834
export ANTHROPIC_BASE_URL="http://localhost:${PORT}/anthropic"
claude

Configure the proxy (optional)

Interactive wizard (writes ~/.config/cyt/config.yaml and optionally ~/.config/cyt/.env):

cyt setup

Or edit ~/.config/cyt/config.yaml manually — see CONFIG.md.

Without cyt setup, the proxy uses the default BM25 pipeline — local pruning with no remote API keys. Run cyt setup to configure rerank/llm pruners and full cost tracking.

FAQ

Doesn't pruning burn more tokens than it saves?

The default is the BM25 algorithm, which runs locally on your computer and is free. The reranker and the weak LLM used for pruning are much cheaper per token than the main model (e.g. Claude Sonnet). You may spend extra tokens on pruning, but they cost a fraction of what you save on the main request. Set input_cost_per_token and output_cost_per_token in ~/.config/cyt/config.yaml to track savings.

Example pricing (input tokens):

| Model | Cost per 1M input tokens |
| --- | --- |
| Claude Sonnet 4.6 | $3.00 |
| Qwen3-Reranker-8B | $0.050 |
| GPT-OSS-120B | $0.14 |
| Inception Mercury 2 | $0.25 |

Weak models such as Mercury 2 or GPT-OSS-120B return only the IDs of the tools to keep, so their output stays extremely small. Rerankers do not count output tokens and are usually much cheaper than a strong LLM.

Rule of thumb: saving 1M Sonnet input tokens is still worthwhile even if pruning uses up to ~10M Mercury tokens — roughly a 1:10 cost ratio. The reranker has roughly a 1:60 cost ratio.

In practice, pruning usually adds modest overhead. Worst case (no tools pruned), you might pay ~$3.30 instead of $3.00. With typical pruning (40–95% of tool tokens removed), tool-schema cost drops from ~$3.00 to roughly $0.15–$1.80, plus ~$0.30 for pruning — about $0.45–$2.10 total for tool-related cost, or roughly 30–85% savings depending on policy.
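
The arithmetic in this answer can be reproduced with a short script. It uses only the example figures quoted above ($3.00 of tool-schema cost, about $0.30 of pruning overhead, and the listed per-million prices); the function names are illustrative.

```python
def tool_cost_after_pruning(tool_cost, removed_fraction, pruning_cost):
    """Remaining tool-schema cost plus the pruner's own cost, in dollars."""
    return tool_cost * (1 - removed_fraction) + pruning_cost

def savings_fraction(tool_cost, removed_fraction, pruning_cost):
    """Net share of the original tool-schema cost that is saved."""
    after = tool_cost_after_pruning(tool_cost, removed_fraction, pruning_cost)
    return (tool_cost - after) / tool_cost

def break_even_ratio(main_price_per_m, pruner_price_per_m):
    """Pruner tokens that cost the same as one main-model token."""
    return main_price_per_m / pruner_price_per_m

worst_case = tool_cost_after_pruning(3.00, 0.00, 0.30)   # nothing pruned
light = tool_cost_after_pruning(3.00, 0.40, 0.30)        # 40% of tool tokens removed
heavy = tool_cost_after_pruning(3.00, 0.95, 0.30)        # 95% of tool tokens removed

print(f"worst case: ${worst_case:.2f}")
print(f"typical range: ${heavy:.2f} to ${light:.2f}")
print(f"net savings: {savings_fraction(3.00, 0.40, 0.30):.0%} to "
      f"{savings_fraction(3.00, 0.95, 0.30):.0%}")
print(f"Mercury 2 break-even: 1:{break_even_ratio(3.00, 0.25):.0f}")
print(f"reranker break-even: 1:{break_even_ratio(3.00, 0.050):.0f}")
```

At the listed prices the Mercury 2 break-even works out to about 1:12, in line with the rounded 1:10 rule of thumb, and the reranker to 1:60.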

Why don't I see 30–85% savings on my total request?

Those numbers apply only to the tool-schema portion of the input tokens, not to the full prompt (system message, conversation history, user message, etc.).
Clear Your Tools prunes tools based on the user request; the rest of the request is unchanged. Codex already uses tools efficiently, so CYT saves fewer tokens there.
By default, CYT injects relevant skills, which consumes some of the savings produced by the tool pruners. To disable this, run cyt setup or set skills.enabled: false in ~/.config/cyt/config.yaml.
If you have no MCP tools and only the agent's built-in system tools, there is less to prune; expect lower overall savings, typically around 10–20%.

How much you save overall depends on:

How many tools you have — more MCP servers mean a larger share of the request is tool schemas. We do not recommend using CYT below 50 tools.
Which pruning policy you use — see Pruning policies.

To estimate savings on a captured request JSON, see DEV.md. To see statistics of actual net savings (input tokens) run:

cyt stats

With ~100 tools and prune_all, expect ~85–95% savings on tool tokens and typically ~30%+ savings on the full request. The more tools you have the more overall savings you'll see.
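
The relationship between tool-token savings and whole-request savings is a simple product: overall savings equal the share of the request taken up by tool schemas times the fraction of tool tokens removed. The sketch below is a rough estimate only; it uses serialized character length as a stand-in for tokens, which is an assumption, and the helper names are hypothetical (DEV.md describes the supported way to measure a captured request).

```python
import json

def tool_share_by_chars(request):
    """Approximate share of a request occupied by tool schemas.

    Uses serialized length as a crude proxy for token count.
    """
    total = len(json.dumps(request))
    tools = len(json.dumps(request.get("tools", [])))
    return tools / total if total else 0.0

def overall_savings(tool_share, tool_savings):
    """Fraction of the whole request saved when only tool schemas shrink."""
    if not (0.0 <= tool_share <= 1.0 and 0.0 <= tool_savings <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return tool_share * tool_savings

# Assumed figures: tool schemas are 35% of the request, 90% of them are pruned.
estimate = overall_savings(0.35, 0.90)
print(f"estimated overall savings: {estimate:.1%}")
```

Under these assumed figures the estimate is about 31.5%, which is consistent with the "~30%+" quoted above for large catalogs.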

Where can I see how many tools and parameters an MCP server has?

The popular Fetch MCP server is a good example. On its Tools tab: 4 tools, each with 4 parameters (1 required, 3 optional) — 16 parameters total.

If the user asks to "fetch the Markdown of a webpage", the prune_all policy typically keeps only the Fetch Markdown tool with its required parameter plus any optional parameters that look relevant. Unrelated tools (e.g. Read file) are dropped entirely.

Is my provider/model supported?

CYT's pruner models (the cheap reranker and LLM that decide which tools to keep) call providers through LiteLLM. If LiteLLM supports your provider and model, you can use them in CYT.

When you run cyt setup and add a pruner model, you'll be prompted for:

Provider — LiteLLM provider route, without a trailing slash (e.g. openai, openrouter).
Model name — LiteLLM model string (see the provider docs).
API key env var — the name of the environment variable that holds your key, not the key itself (e.g. OPENAI_API_KEY, OPENROUTER_API_KEY).
domain_match — hostname from the provider's API base URL (e.g. openai.com for OpenAI, openrouter.ai for OpenRouter). Used to match outgoing requests to the right model config.
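
As an illustration of how `domain_match` could tie an outgoing request to a model config, here is a hypothetical sketch. The dictionary keys other than `domain_match`, the model strings, and the suffix-matching rule are all assumptions made for the example, not the proxy's actual config schema or matching logic.

```python
from urllib.parse import urlparse

# Illustrative entries only; key names and model strings are assumptions.
model_configs = [
    {"provider": "openai", "model": "gpt-oss-120b",
     "api_key_env": "OPENAI_API_KEY", "domain_match": "openai.com"},
    {"provider": "openrouter", "model": "openai/gpt-oss-120b",
     "api_key_env": "OPENROUTER_API_KEY", "domain_match": "openrouter.ai"},
]

def match_model_config(request_url, configs):
    """Return the first config whose domain_match equals or is a parent of the host."""
    host = (urlparse(request_url).hostname or "").lower()
    for config in configs:
        domain = config["domain_match"].lower()
        if host == domain or host.endswith("." + domain):
            return config
    return None

matched = match_model_config("https://api.openai.com/v1/responses", model_configs)
```

Matching on a dot-prefixed suffix rather than a plain substring keeps a host such as `notopenai.com` from being treated as `openai.com`.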

Claude Code reports ZlibError when using the proxy

Installing zlib (for example with npm install -g zlib or brew install zlib) does not resolve this error.

This usually means the proxy returned a Content-Encoding: gzip (or deflate) header with a body that was already decompressed. Claude Code’s fetch then tries to inflate plain JSON/SSE and fails. It is not a missing zlib install on your machine or in CYT.

Fix: upgrade to a cyt build that streams upstream bytes unchanged (aiter_raw pass-through). After upgrading, verify:

curl --raw -sS -D - -o /tmp/cyt-msg.body \
  -H 'Accept-Encoding: gzip' \
  ... # your POST to http://127.0.0.1:8834/anthropic/v1/messages
head -c 4 /tmp/cyt-msg.body | xxd   # should show 1f8b when header says gzip
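
The same check can be scripted. The sketch below compares a Content-Encoding value with the first bytes of a raw body, relying on the fact that gzip streams start with the magic bytes `1f 8b`; the function name is hypothetical, and encodings other than gzip are left unchecked.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"

def encoding_matches_body(content_encoding, body):
    """Return True/False for gzip or identity responses, None when not checked."""
    encoding = (content_encoding or "").strip().lower()
    body_is_gzip = body[:2] == GZIP_MAGIC
    if encoding == "gzip":
        return body_is_gzip          # header promises gzip, body must be gzip
    if encoding in ("", "identity"):
        return not body_is_gzip      # no encoding declared, body must be plain
    return None                      # deflate, br, etc. are not inspected here

compressed = gzip.compress(b'{"ok": true}')
healthy = encoding_matches_body("gzip", compressed)
broken = encoding_matches_body("gzip", b'{"ok": true}')
```

`healthy` is True, while `broken` is False: that second case (a gzip header on an already-decompressed body) is the mismatch that makes the client's inflate step fail. To check a saved response, read `/tmp/cyt-msg.body` in binary mode and pass the header value from the curl output.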

Also check: ANTHROPIC_BASE_URL must use http:// for the default plain-HTTP server, e.g. http://localhost:8834/anthropic. Using https:// against cyt proxy (without TLS/http2.serve) causes uvicorn’s Invalid HTTP request received and broken API calls.

Uvicorn logs Invalid HTTP request received

cyt proxy listens for HTTP/1.1 on the configured port (default 8834). This warning almost always means a client connected with the wrong protocol:

https://localhost:8834 while the proxy is plain HTTP → TLS handshake bytes, not HTTP
HTTP/2 prior knowledge to uvicorn (use http2.serve + TLS certs only if you intend HTTPS)

Use http://localhost:8834/anthropic unless you have enabled Hypercorn TLS in config.
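
To see why a plain-HTTP server rejects these connections, it helps to look at the first bytes each kind of client sends. The sketch below is a diagnostic illustration, not part of cyt: a TLS handshake record begins with byte `0x16` followed by a `0x03` version byte, and an HTTP/2 prior-knowledge client opens with the fixed `PRI * HTTP/2.0` preface, neither of which parses as an HTTP/1.1 request line.

```python
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
HTTP1_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ",
                 b"HEAD ", b"OPTIONS ", b"PATCH ")

def classify_first_bytes(data):
    """Guess the protocol a client is speaking from its first bytes."""
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"                    # TLS handshake record (https:// client)
    if data.startswith(b"PRI * HTTP/2.0"):
        return "http2-prior-knowledge"  # HTTP/2 connection preface
    if data.startswith(HTTP1_METHODS):
        return "http1"                  # ordinary HTTP/1.1 request line
    return "unknown"

tls_hello = bytes([0x16, 0x03, 0x01, 0x02, 0x00])
plain_request = b"POST /anthropic/v1/messages HTTP/1.1\r\n"
```

Only the `http1` case is something the default plain-HTTP listener can parse; the other two produce the warning described above.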

Should I use .env?

We strongly recommend storing API keys via cyt setup (uses the macOS Keychain cyt service through the Python keyring backend). Shell exports and ~/.config/cyt/.env also work.

cyt setup   # interactive; stores keys in Keychain service "cyt"

# Optional: inspect or seed Keychain manually (service must be "cyt", not a custom name)
security find-generic-password -s "cyt" -a "__credentials__" -w

Development

See DEV.md for checkout setup, repository layout, library usage, and configuration reference.

Limitations

See LIMITATIONS.md for deployment constraints, token accounting caveats, and MCP aggregator trade-offs.

Debug

See details to debug pruning in debug/.

License

See LICENSE.

Inspiration

This project is inspired by the ideas explored in the tool-attention project, particularly around improving tool selection efficiency and reducing unnecessary tool exposure to the model.

It also aims to limit the effects of context rot by pruning irrelevant or confusing tools from the available toolset based on the current user prompt and execution context.

Reducing irrelevant tools helps decrease prompt noise, lowers cognitive load on the model, and can improve tool selection accuracy and overall agent reliability.

