Collaboration proposal: @sparseskip inference optimization + ternary MoE model + MCP tooling #376

@simeon-kepp

Hi — we came across Off Grid via a cold email from Saiganesh and immediately recognized the overlap with our work.

Who we are: RFI-IRFOS, a small research team building the Ternary Intelligence Stack, a full-stack research and inference platform built on ternary computation:

  • albert. — ternary MoE language model, trained from scratch
  • @sparseskip — patent-pending sparse inference (skips zero-weight expert activations at runtime; 83 tok/s on modest hardware)
  • ternlang — ternary programming language and runtime
  • TernStudio — IDE built for ternary-native development
  • MCP infrastructure — live endpoint on Smithery + Fly.io, auth, KPI pipeline

All MIT-licensed. github.com/rfi-irfos | ternlang.com

Three concrete angles we'd like to explore:

1. @sparseskip — sparse inference for your pipeline

We have a patent-pending technique that skips zero-weight expert activations at inference time. Ternary weights ({-1, 0, +1}) have a very high zero-weight rate by design, so the gains are especially large on ternary models. We're hitting 83 tok/s on modest hardware in our benchmarks. On mobile CPUs where every cycle matters, this could meaningfully improve your tok/s numbers. Happy to discuss how it could fit into the llama.rn / llama.cpp layer.
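To make the general idea concrete, here is a minimal sketch of zero-skipping on a ternary weight matrix. It is an illustration only and not the actual @sparseskip implementation: the function names (`pack_ternary_row`, `sparse_ternary_matvec`, `dense_matvec`) and the toy data are hypothetical, and it works at the level of individual weights in one matrix rather than expert routing in an MoE. Each row is stored as two index lists, so zero weights are never visited and the remaining work is plain additions and subtractions.

```python
from typing import List, Tuple

# Hypothetical packed form for this sketch: (indices of +1, indices of -1).
PackedRow = Tuple[List[int], List[int]]


def pack_ternary_row(row: List[int]) -> PackedRow:
    """Keep only the positions of non-zero weights; zeros are dropped."""
    plus = [j for j, w in enumerate(row) if w == 1]
    minus = [j for j, w in enumerate(row) if w == -1]
    return plus, minus


def sparse_ternary_matvec(packed: List[PackedRow], x: List[float]) -> List[float]:
    """Matrix-vector product that never touches zero weights.

    With weights in {-1, 0, +1}, no multiplications are needed:
    add x[j] where the weight is +1, subtract it where the weight is -1.
    """
    out = []
    for plus, minus in packed:
        acc = 0.0
        for j in plus:
            acc += x[j]
        for j in minus:
            acc -= x[j]
        out.append(acc)
    return out


def dense_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Reference implementation that multiplies every weight, zeros included."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


# Toy example: 7 of the 12 weights are zero and are skipped entirely.
weights = [
    [1, 0, -1, 0],
    [0, 0, 0, 1],
    [-1, -1, 0, 0],
]
x = [0.5, 2.0, -1.0, 3.0]

packed = [pack_ternary_row(row) for row in weights]
result = sparse_ternary_matvec(packed, x)
reference = dense_matvec(weights, x)
```

The packing step runs once at load time, so the per-token cost scales with the number of non-zero weights rather than the full matrix size. Whether that translates into a real speedup on a mobile CPU depends on memory layout and vectorization, which this sketch does not attempt to model.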

2. albert. as a model in your browser

albert. will export to GGUF. A ternary MoE at 4–8GB would be the first model of its kind in a mobile app. Quality per gigabyte is the whole point of ternary quantization, and it fits well with your 4GB device constraint.

3. MCP tooling — we have a head start

We noticed MCP server support is on your Pro roadmap. We have a live MCP endpoint (published on Smithery, running on Fly.io) and have been building that infrastructure for a while. If you're building the client side and we have the server side, this is a natural handoff.


We use Claude Code, move fast, and are genuinely excited about where this could go — a fully offline ternary LLM app with a complete tool ecosystem is not a small thing. Not proposing anything formal — just opening the conversation.
