Releases: aivrar/portable-tts-server
Fix venv creation with embedded Python
Bug Fixes
- Fix venv creation failing with embedded Python: The GUI's environment installer used `python -m venv`, which doesn't work with Python's embeddable distribution (it lacks the `venv` module). Switched to `virtualenv` (pip-installed), which works correctly. This was the root cause of all environment installation failures on fresh installs.
- Fix operator precedence bug in git clone installer: `run_git_clone()` could incorrectly trigger editable installs even when `editable=False` if a `pyproject.toml` was present, due to `and`/`or` precedence.
- Added `virtualenv` to requirements.txt: Automatically installed during bootstrap so it's available when the GUI creates venvs.
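The precedence bug described above can be reproduced in a few lines. This is a minimal sketch only: the release notes do not show the actual expression inside `run_git_clone()`, so the function names and the `has_setup_py` / `has_pyproject` flags below are hypothetical stand-ins.

```python
def editable_buggy(editable: bool, has_setup_py: bool, has_pyproject: bool) -> bool:
    # Python binds `and` tighter than `or`, so this parses as
    # (editable and has_setup_py) or has_pyproject.
    # A pyproject.toml alone is therefore enough to return True.
    return editable and has_setup_py or has_pyproject


def editable_fixed(editable: bool, has_setup_py: bool, has_pyproject: bool) -> bool:
    # Explicit parentheses make `editable` a hard precondition.
    return editable and (has_setup_py or has_pyproject)


if __name__ == "__main__":
    # editable=False, but a pyproject.toml is present (hypothetical scenario)
    print(editable_buggy(False, False, True))
    print(editable_fixed(False, False, True))
```

With `editable=False` and a `pyproject.toml` present, the first form still evaluates to true, which matches the symptom in the notes; the parenthesized form does not.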
Who's Affected
Anyone running install.bat for the first time without system Python on PATH. Existing installs with working venvs are unaffected.
Files Changed
- `tts_manager.py`: `python -m venv` → `python -m virtualenv`
- `install_configs/base.py`: Fixed boolean logic in `run_git_clone()`
- `requirements.txt`: Added `virtualenv>=20.25.0`
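The switch from `venv` to `virtualenv` might look roughly like the sketch below. The helper names (`build_env_command`, `create_env`) and the target path are assumptions for illustration; the real code in `tts_manager.py` is not shown in these notes.

```python
import importlib.util
import subprocess
import sys
from pathlib import Path


def build_env_command(python_exe: str, target: Path) -> list[str]:
    # `virtualenv` is a pip-installed package, so it remains usable with the
    # embeddable distribution, which lacks the standard-library `venv` module.
    return [python_exe, "-m", "virtualenv", str(target)]


def create_env(target: Path) -> None:
    # Hypothetical wrapper: checks the interpreter running this script,
    # which is assumed to be the same one that will create the environment.
    if importlib.util.find_spec("virtualenv") is None:
        raise RuntimeError("virtualenv is not installed; run: pip install virtualenv")
    subprocess.run(build_env_command(sys.executable, target), check=True)
```

Keeping the command construction separate from the subprocess call makes it easy to check the command without creating an environment.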
v3.0.1 — Fix install.bat on fresh installs
Bug Fix
Fixed install.bat failing on fresh installs due to PowerShell multi-line command parsing.
Root cause: The ^ (caret) line continuation characters were placed inside double-quoted strings. CMD treats ^ inside quotes as a literal character rather than a line continuation, so the caret was passed through to PowerShell, which doesn't recognize ^ as a valid token.
This affected all 8 PowerShell download/configure blocks (Python, ._pth config, get-pip.py, tkinter, Git, FFmpeg, Rubberband, eSpeak NG). Existing installs were unaffected because the script skips steps when files already exist.
Fix: Close the double quotes before each ^ so CMD correctly treats them as line continuations.
v3.0.0 — Portable Multi-GPU TTS Server
Portable TTS Server v3.0.0
Portable multi-GPU text-to-speech server for Windows. 10 AI models, gateway + worker architecture, 7-stage audio pipeline, Whisper verification, one-click install. No system Python, no Docker, no admin rights.
Setup
1. Download or clone this repository
2. Double-click install.bat
3. Install model environments from the GUI (Environments tab)
4. Download model weights (Models tab)
5. Start the API server (API Server tab, or launcher.bat api)
6. Send requests to http://127.0.0.1:8100/api/tts/{model}
install.bat automatically downloads and configures: embedded Python 3.10, portable Git, FFmpeg, Rubberband, eSpeak NG, and all gateway dependencies. Nothing touches your system.
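Step 6 of the setup could be scripted along the following lines. Only the URL pattern comes from the text above; the HTTP method, the JSON field name `text`, the model slug passed in, and the assumption that the response body is raw audio are all guesses, so check the Swagger UI at /docs on a running server for the actual schema.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:8100"  # gateway address from the setup steps


def build_tts_request(model: str, text: str) -> urllib.request.Request:
    # Assumed payload shape: a JSON object with a single "text" field.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{GATEWAY}/api/tts/{model}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed; not stated in the release notes
    )


def synthesize(model: str, text: str, out_path: str, timeout: float = 300.0) -> None:
    # Assumes the server replies with audio bytes that can be saved directly.
    with urllib.request.urlopen(build_tts_request(model, text), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
```

A call such as `synthesize("kokoro", "Hello there", "out.wav")` would then write the response to disk, provided the server is running and the assumed slug and payload match the real API.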
10 TTS Models
| Model | Params | Key Capability |
|---|---|---|
| XTTS v2 | ~500M | Multilingual voice cloning, 58 built-in speakers |
| Bark | ~1B | Expressive audio — laughter, music, nonverbals, 260+ presets |
| Fish Speech 1.5 | ~500M | Fast neural TTS with V1.5 FireflyGAN codec |
| Kokoro | 82M | Lightweight and fast, 54 built-in voices |
| Dia 1.6B-0626 | 1.6B | Multi-speaker dialogue with [S1]/[S2] tags |
| Chatterbox | ~500M | Emotion and exaggeration control |
| F5-TTS | ~300M | Diffusion-based, reference audio + transcript cloning |
| Qwen2.5-Omni | 7B | Multimodal LLM with speech output |
| VibeVoice | 1.5B | Speaker-conditioned, multi-speaker via Speaker N: format |
| Higgs Audio | 3B | Automatic prosody, ChatML format, runs on CPU |
Architecture
- Gateway (port 8100) — orchestrates pipeline, delegates inference to workers via HTTP
- Workers (ports 8101-8200) — each runs one model on one GPU as an isolated subprocess
- Each worker injects only its venv's site-packages — zero cross-environment conflicts
- Same model can run multiple instances across GPUs for concurrent inference
- Workers auto-spawn on first request, fail over to siblings, health-checked every 10s
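The routing behaviour listed above (one port per worker in 8101-8200, failover to a sibling when a worker is unhealthy) can be pictured with a small in-memory registry. This is a sketch under assumptions: the `Worker` class and both helper functions are hypothetical and do not reflect the server's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional

WORKER_PORTS = range(8101, 8201)  # 8101-8200 inclusive, as listed above


@dataclass
class Worker:
    model: str
    gpu: int
    port: int
    healthy: bool = True  # would be refreshed by the periodic health check


def next_free_port(workers: list[Worker]) -> int:
    # Pick the lowest port in the worker range that no worker occupies.
    used = {w.port for w in workers}
    for port in WORKER_PORTS:
        if port not in used:
            return port
    raise RuntimeError("no free worker port in 8101-8200")


def pick_worker(workers: list[Worker], model: str) -> Optional[Worker]:
    # Return the first healthy worker for the model; unhealthy siblings are
    # skipped, which is the failover step. None means the caller should spawn one.
    for w in workers:
        if w.model == model and w.healthy:
            return w
    return None
```

A `None` result from `pick_worker` corresponds to the auto-spawn case: the gateway would start a new worker on `next_free_port(...)` before forwarding the request.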
Audio Pipeline
7-stage post-processing applied per chunk: de-reverb (noisereduce), highpass filter (scipy), de-esser (scipy), tempo adjustment (pyrubberband), silence trimming (pydub), LUFS normalization (pyloudnorm), peak limiting. Each stage degrades gracefully if its library is unavailable.
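The "degrades gracefully" behaviour can be sketched as a pipeline in which a stage is simply skipped when its library fails to import. The helper names are hypothetical, samples are plain Python floats to keep the example dependency-free, and the hard clip stands in for whatever peak limiter the server really uses.

```python
import importlib
from typing import Callable, Optional

Samples = list[float]
Stage = Callable[[Samples], Samples]


def optional_stage(module_name: str, make_stage: Callable[[object], Stage]) -> Optional[Stage]:
    # Build a stage only if its backing library can be imported;
    # otherwise return None so the pipeline skips it.
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return make_stage(module)


def peak_limit(samples: Samples, ceiling: float = 0.95) -> Samples:
    # Simplified limiter: hard-clip every sample to [-ceiling, ceiling].
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def run_pipeline(samples: Samples, stages: list[Optional[Stage]]) -> Samples:
    # Apply the available stages in order, ignoring missing ones.
    for stage in stages:
        if stage is not None:
            samples = stage(samples)
    return samples
```

For example, `run_pipeline(chunk, [optional_stage("noisereduce", make_dereverb), peak_limit])` would still apply the limiter when `noisereduce` is not installed; `make_dereverb` is a placeholder for a factory that wraps the library.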
Key Features
- Multi-GPU — Pin workers to any NVIDIA GPU, run the same model on multiple GPUs simultaneously
- 40+ REST API endpoints — TTS inference, worker management, job tracking, Whisper verification, GPU discovery
- Whisper verification — Transcribe generated audio and score against original text
- Job management — Progress tracking, cancellation, recovery from interruptions
- Format export — WAV, MP3, OGG, FLAC, M4A via FFmpeg
- GUI environment manager — Install venvs, download models, manage workers, test TTS from a Tkinter interface
- Interactive API docs — Swagger UI at /docs when the server is running
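The Whisper verification feature scores a transcript against the original text. The notes do not say which metric is used, so the sketch below substitutes a simple word-level similarity from the standard library's `difflib`; the function names are hypothetical and the transcript is assumed to have been produced elsewhere.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    # Lowercase and keep only word-like tokens so punctuation and
    # capitalization differences do not lower the score.
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(original: str, transcript: str) -> float:
    # Word-level similarity in [0, 1]; 1.0 means identical token sequences.
    a, b = normalize(original), normalize(transcript)
    if not a and not b:
        return 1.0
    return difflib.SequenceMatcher(None, a, b).ratio()
```

A caller could compare the score against a threshold of its choosing and regenerate the chunk when the transcript drifts too far from the input text.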
Requirements
- Windows 10/11 (64-bit)
- NVIDIA GPU with CUDA (any VRAM — Kokoro runs in ~1 GB, Higgs runs on CPU)
- Internet connection (first run only, for downloading dependencies and models)
- No admin rights needed