v1.0.0 — Vui Nano + streaming voice assistant

@mogwai released this 14 May 14:29 · 27 commits to main since this release

First production release. Vui shifts from a standalone TTS model to a full streaming conversational voice assistant.

Added

  • Vui Nano (300M) — new flagship model. Llama-style decoder + RQ-Transformer head over the Qwen3-TTS-12Hz codec. bf16 inference, CUDA graphs, ~9× realtime streaming on a 4090.
  • Streaming server (python -m vui.serving.stream) — WebRTC + WebSocket pipeline (ASR → LLM → TTS) with browser UI, VAD-driven turn-taking, speculative LLM prefill, sentence-level TTS chunking with backpressure, and barge-in.
  • OpenAI Realtime API compatibility — drop-in ws://…/v1/realtime with the standard event surface (session.update, input_audio_buffer.append, response.create, response.audio.delta, …) and PCM16 @ 24 kHz.
  • POST /v1/voice-note — synchronous REST endpoint that runs the full ASR → LLM → TTS pipeline in a single HTTP call.
  • Voice cloning + fine-tuned presets: maeve, abraham, rhian, harry shipped in prompts/; arbitrary speakers cloneable from a .wav sample.
  • SQ / WPS conditioning — six speech-quality channels and words-per-second.
  • Pluggable ASR — faster-whisper (GPU, default) and Moonshine (CPU, ONNX), switchable live from the UI.
  • Pluggable LLM backends — Ollama, vLLM, any OpenAI-compatible endpoint.
  • Memories — assistant remembers facts across sessions, persisted to ~/.vui/memories.json.
  • Thoughts stream — parallel LLM that routes voice intent to ~10 tools without a wake-word grammar; pluggable for user-defined local tools.
  • Claude task server (optional sidecar) — handles slow/agentic work via the host's Claude Code MCPs. Speaks Anthropic's /v1/messages; backable by Ollama / z.ai / DeepSeek / vLLM / LM Studio / LiteLLM.
  • Apple Silicon (MLX) backend — auto-detected; auto-creates qwen3.5-4b-mlx on first run. Marked WIP.
  • Mobile support — documented cloudflared and Tailscale paths for phone access.
  • Docker compose — one-file stack (streaming server + optional bundled Ollama + optional Claude task server).
  • One-liner installer: curl -fsSL https://install.fluxions.ai | bash.
  • Standalone TTS demo (demo.py) — Gradio playground with SQ/WPS sliders + CLI render mode.
  • Telemetry — anonymous {voice, seconds} events per render; disable with VUI_TELEMETRY=0.
  • Documentation: docs/configuration.md, docs/realtime-api.md, docs/claude-task-server.md, docs/thoughts-tools.md, docs/soul.md, docs/memory-budget.md, docs/mobile.md, docs/releasing.md.
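
For the Realtime API compatibility item above, the following is a minimal sketch of how a client might build and parse the JSON events involved. It only constructs messages and does not open a connection. The event names and the PCM16 @ 24 kHz format come from these notes; the `audio` and `delta` field names follow the public OpenAI Realtime API, and whether Vui honors every `session` field shown is an assumption. The helper names (`tone_pcm16`, `append_event`, `decode_audio_delta`) are hypothetical.

```python
import base64
import json
import math
import struct

SAMPLE_RATE = 24_000  # PCM16 @ 24 kHz, as stated in the release notes


def tone_pcm16(freq_hz: float, millis: int) -> bytes:
    """Generate a little-endian 16-bit mono sine tone as stand-in microphone audio."""
    n = SAMPLE_RATE * millis // 1000
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def session_update_event() -> str:
    # Assumption: the server accepts these OpenAI-style session fields.
    return json.dumps({
        "type": "session.update",
        "session": {"input_audio_format": "pcm16", "output_audio_format": "pcm16"},
    })


def append_event(pcm: bytes) -> str:
    """Wrap raw PCM16 bytes in an input_audio_buffer.append event (base64 payload)."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


def response_create_event() -> str:
    return json.dumps({"type": "response.create"})


def decode_audio_delta(message: str) -> bytes:
    """Return the PCM16 bytes of a response.audio.delta event, or b"" for other events."""
    event = json.loads(message)
    if event.get("type") != "response.audio.delta":
        return b""
    return base64.b64decode(event["delta"])
```

A client would send these strings as text frames over the WebSocket at the /v1/realtime path, in the order `session_update_event()`, one or more `append_event(...)`, then `response_create_event()`, and pass each incoming frame through `decode_audio_delta` to collect the synthesized audio.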

Changed

  • Audio codec: Fluac (modified DAC with FSQ, ~21.5 Hz) replaced by Qwen3-TTS-Tokenizer-12Hz (16 codebooks of 2048 entries at 12.5 Hz, 24 kHz decoded).
  • Speaker encoder: ECAPA-TDNN from Qwen3-TTS-12Hz-0.6B-Base (8.9M params, 1024-dim).
  • Text tokenization: byT5 byte-level → tiktoken-based (src/vui/tokenizer.py).
  • Python: >=3.12,<3.13 (was ==3.12.3).
  • Dependencies: streaming/server stack pulled in (aiohttp, aiortc, av, faster-whisper, onnxruntime, huggingface_hub, safetensors, claude-agent-sdk, flash-attn).
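
To put the new codec's numbers in context, here is a short back-of-the-envelope calculation using only the figures listed above (16 codebooks, 2048 entries each, 12.5 Hz frame rate, 24 kHz decoded audio). It assumes every codebook emits one code per frame and that each code costs log2(2048) bits with no entropy coding, so the bitrate is a nominal figure rather than a measured one.

```python
import math

CODEBOOKS = 16           # codebooks per frame (from the release notes)
ENTRIES = 2048           # entries per codebook
FRAME_RATE_HZ = 12.5     # codec frames per second
SAMPLE_RATE_HZ = 24_000  # decoded audio sample rate

bits_per_code = math.log2(ENTRIES)                # 11 bits to index 2048 entries
codes_per_second = CODEBOOKS * FRAME_RATE_HZ      # codes the model must produce per second of audio
bitrate_bps = codes_per_second * bits_per_code    # nominal bitrate, assuming fixed-length codes
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ  # audio samples covered by one frame
frame_ms = 1000 / FRAME_RATE_HZ                   # duration of one frame in milliseconds

print(f"{codes_per_second:.0f} codes/s, {bitrate_bps:.0f} bit/s nominal")
print(f"{samples_per_frame:.0f} samples per frame ({frame_ms:.0f} ms)")
```

Under these assumptions, one second of speech corresponds to 200 codes, and each frame covers 80 ms of audio.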

Removed

  • src/vui/fluac.py, patterns.py, tok.py, notebook.py, utils.py, vad.py, inference.py (replaced), inference.ipynb — superseded by engine.py / tokenizer.py / streaming.py / serving stack.
  • Vui.BASE, Vui.ABRAHAM, Vui.COHOST checkpoints — superseded by Vui Nano.

Full changelog: https://github.com/fluxions-ai/vui/blob/v1.0.0/CHANGELOG.md