A token-native interlingua for LLM-to-LLM communication. More compressed AND more precise than any human language, measured in real tokenizer tokens, not characters.
Canonical home: https://tokenese.org/ Spec: spec.md v0.3 (current). Grammar: GRAMMAR-v0.3.md. Vision: INTENT.md Assistant guide: assistant-guide.txt (GuideCheck human-verifiable-assistant-guide profile 0.6.0, Level 4): a bounded, approval-gated guide for an assistant to install Tokenese and reproduce the audit. Verify before acting at https://guidecheck.org/verify
LLM-to-LLM communication defaults to verbose human prose, which wastes tokens and loses precision; Tokenese gives agents a token-native interlingua whose lexicon is admitted only when each symbol survives a reproducible cross-tokenizer audit.
Teams building multi-agent systems who want machine-to-machine messages that are more compressed and more precise than natural language, with every vocabulary symbol verified by a reproducible tokenizer audit rather than asserted.
LLMs conforming to human language is like watching film in black and white. Human languages carry overhead shaped by human constraints: serial speech, social hedging, redundancy against noisy air. Tokenese brings color to machine-to-machine communication: richer, more informative exchanges compressed into a smaller, token-native format.
- Token-space only. Plain text crosses the wire; each party tokenizes independently. No embeddings, no shared latents, no vendor lock.
- Tokenizer-audited lexicon. A symbol enters the vocabulary only if it costs 1 token, worst case, in every certified tokenizer (currently OpenAI o200k_base + Anthropic). Audit scripts included; claims are reproducible.
- Compression from structure, not glyphs. Fixed field grammar, controlled vocabulary, in-band symbol table for repeated referents. Empirical finding: common English words are already optimal tokens; exotic Unicode usually is not.
- Self-repairing.
??misparse signal and a plain-English escape hatch are mandatory.
A measured example is pending. The previous illustrative example was removed on 2026-06-18: token-counting on the certified tokenizers contradicted its compression claim (the Tokenese form was larger than the English, not smaller). A replacement will ship only with reproducible token counts on every certified tokenizer, measured against terse English rather than verbose prose. See the spec "Example exchange" section and the changelog for the finding.
- Translator + scorer: tools/translator/ - base Tokenese->English translator (originally built for Turnfile) plus the deterministic TKAB per-pair scorer for the W1+L1 mini-pilot.
- CLI:
tokenese-check --pair fixture.json --prettyafterpip install -e tools/translator. - MCP server:
python -m tokenese_translator.mcp_serverexposes parse / validate / validate_framesets / to-english / check-pair / score-pair tools. - Frameset registry: framesets.json - report-only typed slot signatures for common ops. The registry feeds structural telemetry without changing parser acceptance or checker outcomes.
- Conformance: the checker reports mismatches (R5.3); it never generates or repairs. See CONFORMANCE.md.
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
.venv/bin/python audit_symbols.py
ANTHROPIC_API_KEY=... .venv/bin/python audit_anthropic.py
Grammar v0.3 current. Release v0.3.9 is a patch tooling release: the seven-column tokenizer audit is complete (the Gemma column is now native Gemma 4 E4B, the on-device PAICE production generator). The base translator, golden fixtures, TKAB deterministic scorer, grammar-v0.3 features, MCP smoke tests, compression/hypothesis evals, N2 static package report, and report-only frameset registry pass 156/156 tests; the repo-root security and Gemma 4 audit-surface tests add 7 more (163 total). The deterministic N2 receiver static floor now passes the 0.75 threshold after retiring the stale S1 semantic-neighborhood operator form. A cross-surface portable skill ships at skills/tokenese/. The hosted assistant guide is anchored at GuideCheck Level 4 via a DNS TXT record at _assistant-guide.tokenese.org, with a daily drift-detection CI job. The validating A/B experiment between Claude and Codex remains the open downstream measurement; see tools/translator/tkab/AUDIT_CARD.md.
Contributions welcome; every change passes the admission criteria in INTENT.md. See CONTRIBUTING.md. The short version: claims must be measured, not asserted.
Code: MIT. Specification text: CC BY 4.0. See LICENSE and LICENSE-SPEC.
Coding agents should read AGENTS.md first.
Last updated: 2026-06-17