Tokenese

A token-native interlingua for LLM-to-LLM communication. More compressed AND more precise than any human language, measured in real tokenizer tokens, not characters.

Canonical home: https://tokenese.org/
Spec: spec.md v0.3 (current).
Grammar: GRAMMAR-v0.3.md.
Vision: INTENT.md.
Assistant guide: assistant-guide.txt (GuideCheck human-verifiable-assistant-guide profile 0.6.0, Level 4), a bounded, approval-gated guide for an assistant to install Tokenese and reproduce the audit. Verify before acting at https://guidecheck.org/verify

Canonical URL

https://tokenese.org/

What problem it solves

LLM-to-LLM communication defaults to verbose human prose, which wastes tokens and loses precision; Tokenese gives agents a token-native interlingua whose lexicon is admitted only when each symbol survives a reproducible cross-tokenizer audit.

Who this is for

Teams building multi-agent systems who want machine-to-machine messages that are more compressed and more precise than natural language, with every vocabulary symbol verified by a reproducible tokenizer audit rather than asserted.

Why

LLMs conforming to human language is like watching film in black and white. Human languages carry overhead shaped by human constraints: serial speech, social hedging, redundancy against noisy air. Tokenese brings color to machine-to-machine communication: richer, more informative exchanges compressed into a smaller, token-native format.

How it works

Token-space only. Plain text crosses the wire; each party tokenizes independently. No embeddings, no shared latents, no vendor lock.
Tokenizer-audited lexicon. A symbol enters the vocabulary only if it costs 1 token, worst case, in every certified tokenizer (currently OpenAI o200k_base + Anthropic). Audit scripts included; claims are reproducible.
Compression from structure, not glyphs. Fixed field grammar, controlled vocabulary, in-band symbol table for repeated referents. Empirical finding: common English words are already optimal tokens; exotic Unicode usually is not.
Self-repairing. Both the ?? misparse signal and a plain-English escape hatch are mandatory.
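
The admission rule for the tokenizer-audited lexicon can be sketched as a filter over candidate symbols. This is a minimal illustration, not the logic of the repository's audit scripts: the names `make_toy_counter`, `worst_case_cost`, and `admit` are hypothetical, the counters are toy stand-ins rather than real tokenizers, and reading "worst case" as "with and without a leading space" is an assumption.

```python
from typing import Callable, Dict, Iterable, List

# A token counter maps text to the number of tokens it costs in one tokenizer.
TokenCounter = Callable[[str], int]


def make_toy_counter(vocab: Iterable[str]) -> TokenCounter:
    """Stand-in tokenizer: known strings cost 1 token, anything else 1 per character."""
    known = set(vocab)

    def count(text: str) -> int:
        return 1 if text in known else len(text)

    return count


def worst_case_cost(symbol: str, counters: Dict[str, TokenCounter]) -> int:
    """Highest cost of the symbol across every tokenizer and spacing variant."""
    # Assumption: "worst case" covers the bare symbol and a leading-space form.
    variants = (symbol, " " + symbol)
    return max(count(v) for count in counters.values() for v in variants)


def admit(candidates: Iterable[str], counters: Dict[str, TokenCounter]) -> List[str]:
    """Keep only symbols whose worst-case cost is exactly 1 token."""
    return [s for s in candidates if worst_case_cost(s, counters) == 1]


# Hypothetical certified set: two toy tokenizers with slightly different vocabularies.
counters = {
    "toy_a": make_toy_counter(["ok", " ok", "→", " →"]),
    "toy_b": make_toy_counter(["ok", " ok", "→"]),  # " →" is not a single token here
}
admitted = admit(["ok", "→"], counters)
print(admitted)  # ['ok']
```

In this toy setup the arrow is rejected because one tokenizer splits its leading-space form, while the common English word passes everywhere. Swapping the toy counters for real tokenizer calls would leave the filter unchanged.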

Quick taste

A measured example is pending. The previous illustrative example was removed on 2026-06-18: token-counting on the certified tokenizers contradicted its compression claim (the Tokenese form was larger than the English, not smaller). A replacement will ship only with reproducible token counts on every certified tokenizer, measured against terse English rather than verbose prose. See the spec "Example exchange" section and the changelog for the finding.
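
The gate described above (a compression claim counts only if it holds on every certified tokenizer, against a terse baseline) could be checked along these lines. This is a sketch under stated assumptions: `compression_report` and `claim_holds` are hypothetical names, the two counters are crude stand-ins for real tokenizers, and the input strings are placeholders, not Tokenese.

```python
from typing import Callable, Dict

TokenCounter = Callable[[str], int]


def compression_report(
    candidate: str, baseline: str, counters: Dict[str, TokenCounter]
) -> Dict[str, Dict[str, int]]:
    """Token counts for the candidate form and the terse baseline, per tokenizer."""
    return {
        name: {"candidate": count(candidate), "baseline": count(baseline)}
        for name, count in counters.items()
    }


def claim_holds(report: Dict[str, Dict[str, int]]) -> bool:
    """The claim holds only if the candidate is strictly smaller on every tokenizer."""
    return all(row["candidate"] < row["baseline"] for row in report.values())


# Stand-in counters (assumptions): one counts words, one counts 4-character chunks.
counters = {
    "toy_words": lambda text: len(text.split()),
    "toy_chars": lambda text: (len(text) + 3) // 4,
}

# Placeholder strings: shorter in characters, but more whitespace-separated pieces.
report = compression_report("a b c d", "send report", counters)
print(report)
print(claim_holds(report))  # False
```

The placeholder candidate is smaller under the character-chunk counter but larger under the word counter, so the claim is rejected. Requiring the comparison to hold on every tokenizer is what guards against a form that only looks compressed.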

Tools

Translator + scorer: tools/translator/ - base Tokenese->English translator (originally built for Turnfile) plus the deterministic TKAB per-pair scorer for the W1+L1 mini-pilot.
CLI: run pip install -e tools/translator, then tokenese-check --pair fixture.json --pretty.
MCP server: python -m tokenese_translator.mcp_server exposes parse / validate / validate_framesets / to-english / check-pair / score-pair tools.
Frameset registry: framesets.json - report-only typed slot signatures for common ops. The registry feeds structural telemetry without changing parser acceptance or checker outcomes.
Conformance: the checker reports mismatches (R5.3); it never generates or repairs. See CONFORMANCE.md.

Reproduce the audit

python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
.venv/bin/python audit_symbols.py
ANTHROPIC_API_KEY=... .venv/bin/python audit_anthropic.py

Status

Grammar v0.3 current. Release v0.3.9 is a patch tooling release: the seven-column tokenizer audit is complete (the Gemma column is now native Gemma 4 E4B, the on-device PAICE production generator).
The base translator, golden fixtures, TKAB deterministic scorer, grammar-v0.3 features, MCP smoke tests, compression/hypothesis evals, N2 static package report, and report-only frameset registry pass 156/156 tests; the repo-root security and Gemma 4 audit-surface tests add 7 more (163 total).
The deterministic N2 receiver static floor now passes the 0.75 threshold after retiring the stale S1 semantic-neighborhood operator form.
A cross-surface portable skill ships at skills/tokenese/.
The hosted assistant guide is anchored at GuideCheck Level 4 via a DNS TXT record at _assistant-guide.tokenese.org, with a daily drift-detection CI job.
The validating A/B experiment between Claude and Codex remains the open downstream measurement; see tools/translator/tkab/AUDIT_CARD.md.

Contributing

Contributions are welcome; every change must pass the admission criteria in INTENT.md. See CONTRIBUTING.md. The short version: claims must be measured, not asserted.

License

Code: MIT. Specification text: CC BY 4.0. See LICENSE and LICENSE-SPEC.

For agents

Coding agents should read AGENTS.md first.

Last updated: 2026-06-17
