First production release. Vui shifts from a standalone TTS model to a full streaming conversational voice assistant.
Added
- Vui Nano (300M) — new flagship model. Llama-style decoder + RQ-Transformer head over the Qwen3-TTS-12Hz codec. bf16 inference, CUDA graphs, ~9× realtime streaming on a 4090.
- Streaming server (
python -m vui.serving.stream) — WebRTC + WebSocket pipeline (ASR → LLM → TTS) with browser UI, VAD-driven turn-taking, speculative LLM prefill, sentence-level TTS chunking with backpressure, and barge-in. - OpenAI Realtime API compatibility — drop-in
ws://…/v1/realtimewith the standard event surface (session.update,input_audio_buffer.append,response.create,response.audio.delta, …) and PCM16 @ 24 kHz. POST /v1/voice-note— synchronous REST endpoint that runs the full ASR → LLM → TTS pipeline in a single HTTP call.- Voice cloning + fine-tuned presets —
maeve,abraham,rhian,harryshipped inprompts/; arbitrary speakers cloneable from a.wavsample. - SQ / WPS conditioning — six speech-quality channels and words-per-second.
- Pluggable ASR — faster-whisper (GPU, default) and Moonshine (CPU, ONNX), switchable live from the UI.
- Pluggable LLM backends — Ollama, vLLM, any OpenAI-compatible endpoint.
- Memories — assistant remembers facts across sessions, persisted to
~/.vui/memories.json. - Thoughts stream — parallel LLM that routes voice intent to ~10 tools without a wake-word grammar; pluggable for user-defined local tools.
- Claude task server (optional sidecar) — handles slow/agentic work via the host's Claude Code MCPs. Speaks Anthropic's
/v1/messages; backable by Ollama / z.ai / DeepSeek / vLLM / LM Studio / LiteLLM. - Apple Silicon (MLX) backend — auto-detected; auto-creates
qwen3.5-4b-mlxon first run. Marked WIP. - Mobile support — documented cloudflared and Tailscale paths for phone access.
- Docker compose — one-file stack (streaming server + optional bundled Ollama + optional Claude task server).
- One-liner installer —
curl -fsSL https://install.fluxions.ai | bash. - Standalone TTS demo (
demo.py) — Gradio playground with SQ/WPS sliders + CLI render mode. - Telemetry — anonymous
{voice, seconds}events per render; disable withVUI_TELEMETRY=0. - Documentation —
docs/configuration.md,docs/realtime-api.md,docs/claude-task-server.md,docs/thoughts-tools.md,docs/soul.md,docs/memory-budget.md,docs/mobile.md,docs/releasing.md.
Changed
- Audio codec: Fluac (modified DAC with FSQ, ~21.5 Hz) replaced by Qwen3-TTS-Tokenizer-12Hz (16 codebooks of 2048 entries at 12.5 Hz, 24 kHz decoded).
- Speaker encoder: ECAPA-TDNN from
Qwen3-TTS-12Hz-0.6B-Base(8.9M params, 1024-dim). - Text tokenization: byT5 byte-level → tiktoken-based (
src/vui/tokenizer.py). - Python:
>=3.12,<3.13(was==3.12.3). - Dependencies: streaming/server stack pulled in (
aiohttp,aiortc,av,faster-whisper,onnxruntime,huggingface_hub,safetensors,claude-agent-sdk,flash-attn).
Removed
src/vui/fluac.py,patterns.py,tok.py,notebook.py,utils.py,vad.py,inference.py(replaced),inference.ipynb— superseded byengine.py/tokenizer.py/streaming.py/ serving stack.Vui.BASE,Vui.ABRAHAM,Vui.COHOSTcheckpoints — superseded by Vui Nano.
Full changelog: https://github.com/fluxions-ai/vui/blob/v1.0.0/CHANGELOG.md