Release v1.0.0 — Vui Nano + streaming voice assistant · fluxions-ai/vui

First production release. Vui shifts from a standalone TTS model to a full streaming conversational voice assistant.

Added

Vui Nano (300M) — new flagship model. Llama-style decoder + RQ-Transformer head over the Qwen3-TTS-12Hz codec. bf16 inference, CUDA graphs, ~9× realtime streaming on a 4090.
Streaming server (python -m vui.serving.stream) — WebRTC + WebSocket pipeline (ASR → LLM → TTS) with browser UI, VAD-driven turn-taking, speculative LLM prefill, sentence-level TTS chunking with backpressure, and barge-in.
OpenAI Realtime API compatibility — drop-in ws://…/v1/realtime with the standard event surface (session.update, input_audio_buffer.append, response.create, response.audio.delta, …) and PCM16 @ 24 kHz.
POST /v1/voice-note — synchronous REST endpoint that runs the full ASR → LLM → TTS pipeline in a single HTTP call.
Voice cloning + fine-tuned presets — maeve, abraham, rhian, harry shipped in prompts/; arbitrary speakers cloneable from a .wav sample.
SQ / WPS conditioning — six speech-quality channels and words-per-second.
Pluggable ASR — faster-whisper (GPU, default) and Moonshine (CPU, ONNX), switchable live from the UI.
Pluggable LLM backends — Ollama, vLLM, any OpenAI-compatible endpoint.
Memories — assistant remembers facts across sessions, persisted to ~/.vui/memories.json.
Thoughts stream — parallel LLM that routes voice intent to ~10 tools without a wake-word grammar; pluggable for user-defined local tools.
Claude task server (optional sidecar) — handles slow/agentic work via the host's Claude Code MCPs. Speaks Anthropic's /v1/messages API and can be backed by Ollama, z.ai, DeepSeek, vLLM, LM Studio, or LiteLLM.
Apple Silicon (MLX) backend — auto-detected; auto-creates qwen3.5-4b-mlx on first run. Marked WIP.
Mobile support — documented cloudflared and Tailscale paths for phone access.
Docker compose — one-file stack (streaming server + optional bundled Ollama + optional Claude task server).
One-liner installer — curl -fsSL https://install.fluxions.ai | bash.
Standalone TTS demo (demo.py) — Gradio playground with SQ/WPS sliders + CLI render mode.
Telemetry — anonymous {voice, seconds} events per render; disable with VUI_TELEMETRY=0.
Documentation — docs/configuration.md, docs/realtime-api.md, docs/claude-task-server.md, docs/thoughts-tools.md, docs/soul.md, docs/memory-budget.md, docs/mobile.md, docs/releasing.md.
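
To make the Realtime API item above concrete, here is a minimal sketch of how a client might build the JSON events it names. The event types and the PCM16 @ 24 kHz format come from these notes. The payload field names (`session`, `audio`, `delta`, `instructions`, `input_audio_format`, `output_audio_format`) follow OpenAI's Realtime API and are an assumption here: Vui may honor only a subset, so check docs/realtime-api.md. No connection is opened; the strings in `messages` are what a WebSocket client would send, in order.

```python
import base64
import json
import struct

SAMPLE_RATE_HZ = 24_000  # PCM16 @ 24 kHz, per the release notes


def session_update(instructions: str) -> str:
    """Build a session.update event (field names assumed from OpenAI's Realtime API)."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
        },
    })


def audio_append(samples: list[int]) -> str:
    """Pack signed 16-bit little-endian samples and wrap them in an append event."""
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


def decode_audio_delta(event_json: str) -> bytes:
    """Return raw PCM16 bytes from a response.audio.delta event, else empty bytes."""
    event = json.loads(event_json)
    if event.get("type") != "response.audio.delta":
        return b""
    return base64.b64decode(event["delta"])


# 20 ms of silence at 24 kHz is 480 samples (960 bytes of PCM16).
chunk = [0] * (SAMPLE_RATE_HZ * 20 // 1000)
messages = [
    session_update("You are a concise voice assistant."),  # hypothetical prompt
    audio_append(chunk),
    json.dumps({"type": "response.create"}),
]
```

Because the event surface is meant to be drop-in, an existing OpenAI Realtime client should in principle only need its base URL pointed at the Vui server's /v1/realtime endpoint.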

Changed

Audio codec: Fluac (modified DAC with FSQ, ~21.5 Hz) replaced by Qwen3-TTS-Tokenizer-12Hz (16 codebooks of 2048 entries at 12.5 Hz, 24 kHz decoded).
Speaker encoder: ECAPA-TDNN from Qwen3-TTS-12Hz-0.6B-Base (8.9M params, 1024-dim).
Text tokenization: ByT5 byte-level tokenizer replaced by a tiktoken-based tokenizer (src/vui/tokenizer.py).
Python: >=3.12,<3.13 (was ==3.12.3).
Dependencies: added the streaming/server stack (aiohttp, aiortc, av, faster-whisper, onnxruntime, huggingface_hub, safetensors, claude-agent-sdk, flash-attn).
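
The codec figures in the audio codec item above imply a token budget that can be worked out directly. The sketch below uses only numbers quoted in these notes (12.5 Hz, 16 codebooks of 2048 entries, 24 kHz output, ~9× realtime). It assumes every frame carries one code from each of the 16 codebooks; the variable names are illustrative and not taken from the Vui codebase.

```python
import math

FRAME_RATE_HZ = 12.5      # codec frames per second of audio
NUM_CODEBOOKS = 16        # codes per frame (assumed: all codebooks used)
CODEBOOK_SIZE = 2048      # entries per codebook
SAMPLE_RATE_HZ = 24_000   # decoded audio sample rate
REALTIME_FACTOR = 9       # approximate streaming speed quoted for Vui Nano

bits_per_code = math.log2(CODEBOOK_SIZE)              # 11 bits
codes_per_second = FRAME_RATE_HZ * NUM_CODEBOOKS      # 200 codes per audio second
bitrate_bps = codes_per_second * bits_per_code        # 2200 bits per audio second
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ    # 1920 samples
frame_ms = 1000 / FRAME_RATE_HZ                       # 80 ms of audio per frame

# Frames the model has to emit per wall-clock second to sustain ~9x realtime.
frames_per_wall_second = FRAME_RATE_HZ * REALTIME_FACTOR  # 112.5

print(f"{bits_per_code:.0f} bits/code, {codes_per_second:.0f} codes/s, "
      f"{bitrate_bps:.0f} bit/s, {samples_per_frame:.0f} samples/frame, "
      f"{frame_ms:.0f} ms/frame, {frames_per_wall_second} frames/s at 9x")
```

Under these assumptions each frame covers 80 ms of audio, so one second of speech costs 200 codes, or about 2.2 kbit/s before any container overhead.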

Removed

src/vui/fluac.py, patterns.py, tok.py, notebook.py, utils.py, vad.py, inference.py (replaced), inference.ipynb — superseded by engine.py / tokenizer.py / streaming.py / serving stack.
Vui.BASE, Vui.ABRAHAM, Vui.COHOST checkpoints — superseded by Vui Nano.

Full changelog: https://github.com/fluxions-ai/vui/blob/v1.0.0/CHANGELOG.md
