Hi — we came across Off Grid via a cold email from Saiganesh and immediately recognized the overlap with our work.
Who we are: RFI-IRFOS, a small research team building the Ternary Intelligence Stack — a full-layer research and inference platform built on ternary computation:
- albert. — ternary MoE language model, trained from scratch
- @sparseskip — patent-pending sparse inference (skips zero-weight expert activations at runtime; 83 tok/s on modest hardware)
- ternlang — ternary programming language and runtime
- TernStudio — IDE built for ternary-native development
- MCP infrastructure — live endpoint on Smithery + Fly.io, auth, KPI pipeline
All MIT-licensed. github.com/rfi-irfos | ternlang.com
Three concrete angles we'd like to explore:
1. @sparseskip — sparse inference for your pipeline
We have a patent-pending technique that skips zero-weight expert activations at inference time. Ternary weights ({-1, 0, +1}) have a very high zero-weight rate by design, so the gains are especially large on ternary models. We're hitting 83 tok/s on modest hardware in our benchmarks. On mobile CPUs where every cycle matters, this could meaningfully improve your tok/s numbers. Happy to discuss how it could fit into the llama.rn / llama.cpp layer.
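For a rough sense of why zeros matter, here is a toy sketch of zero-skipping in a ternary matrix-vector product. It is a generic illustration and not the @sparseskip method itself, which works at the level of expert activations and is not described here. The function names and the sample weights are hypothetical.

```python
def dense_matvec(weights, x):
    """Reference path: multiply every weight, zeros included."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ternary_skip_matvec(weights, x):
    """Zero-skipping path for weights in {-1, 0, +1}.

    Zero weights are skipped entirely; nonzero weights need only an
    add or a subtract, never a multiply.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 0:
                continue  # no work for zero weights
            acc += xi if w == 1 else -xi
        out.append(acc)
    return out


def zero_fraction(weights):
    """Share of weights that are exactly zero, i.e. work that can be skipped."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0)
    return zeros / total


# Hypothetical 3x4 ternary weight matrix and activation vector.
W = [
    [1, 0, -1, 0],
    [0, 0, 0, 1],
    [-1, 1, 0, 0],
]
x = [3, -1, 4, 2]

skipped = ternary_skip_matvec(W, x)
reference = dense_matvec(W, x)
```

Both paths return the same result; the skipping path simply does no work for the zero entries, so the saving grows with the zero fraction of the weights.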
2. albert. as a model in your browser
albert. will export to GGUF. A ternary MoE at 4–8GB would be the first model of its kind in a mobile app. Quality per size is the whole point of ternary quantization, and that fits your 4GB device constraint story well.
3. MCP tooling — we have a head start
We noticed MCP server support is on your Pro roadmap. We have a live MCP endpoint (published on Smithery, running on Fly.io) and have been building that infrastructure for a while. If you're building the client side and we have the server side, this is a natural handoff.
We use Claude Code, move fast, and are genuinely excited about where this could go — a fully offline ternary LLM app with a complete tool ecosystem is not a small thing. Not proposing anything formal — just opening the conversation.