Releases: aivrar/portable-tts-server

Fix venv creation with embedded Python

18 Feb 04:23

Bug Fixes

  • Fix venv creation failing with embedded Python — The GUI's environment installer used python -m venv, which does not work with Python's embeddable distribution because that distribution lacks the venv module. The installer now uses virtualenv (pip-installed) instead. This was the root cause of all environment installation failures on fresh installs.
  • Fix operator precedence bug in git clone installer — run_git_clone() could incorrectly trigger editable installs even when editable=False if a pyproject.toml was present, due to and/or precedence.
  • Added virtualenv to requirements.txt — Automatically installed during bootstrap so it's available when the GUI creates venvs.
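
The precedence issue in the second fix is easy to reproduce in isolation. The sketch below is not the project's code: the function names and the has_setup_py / has_pyproject flags are hypothetical, chosen only to show that Python binds `and` more tightly than `or`.

```python
def should_install_editable_buggy(editable: bool, has_setup_py: bool, has_pyproject: bool) -> bool:
    # Parsed as (editable and has_setup_py) or has_pyproject,
    # so a pyproject.toml alone is enough to return True.
    return editable and has_setup_py or has_pyproject


def should_install_editable_fixed(editable: bool, has_setup_py: bool, has_pyproject: bool) -> bool:
    # Explicit parentheses: editable must be True in every case.
    return editable and (has_setup_py or has_pyproject)


if __name__ == "__main__":
    # editable=False, no setup.py, pyproject.toml present
    print(should_install_editable_buggy(False, False, True))
    print(should_install_editable_fixed(False, False, True))
```

With editable=False and only a pyproject.toml present, the first variant still evaluates to True, which matches the symptom described in the release note.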

Who's Affected

Anyone running install.bat for the first time without system Python on PATH. Existing installs with working venvs are unaffected.

Files Changed

  • tts_manager.py — python -m venv → python -m virtualenv
  • install_configs/base.py — Fixed boolean logic in run_git_clone()
  • requirements.txt — Added virtualenv>=20.25.0
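
As a rough illustration of the tts_manager.py change, the command switch might look like the sketch below. The helper names and the envs/example path are assumptions for the example and are not taken from the repository.

```python
import importlib.util
import sys
from pathlib import Path


def has_stdlib_venv() -> bool:
    # The embeddable distribution ships without the venv module,
    # so this returns False there.
    return importlib.util.find_spec("venv") is not None


def build_env_command(python_exe: str, env_dir: Path) -> list[str]:
    # virtualenv is a pip-installed package, so it can be run with -m
    # even when the standard-library venv module is missing.
    return [python_exe, "-m", "virtualenv", str(env_dir)]


if __name__ == "__main__":
    print("stdlib venv available:", has_stdlib_venv())
    print(build_env_command(sys.executable, Path("envs") / "example"))
```

The resulting list can be passed to subprocess.run once virtualenv has been installed from requirements.txt.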

v3.0.1 — Fix install.bat on fresh installs

11 Feb 18:59

Bug Fix

Fixed install.bat failing on fresh installs due to PowerShell multi-line command parsing.

Root cause: The ^ (caret) line continuation characters were placed inside double-quoted strings. CMD treats ^ inside quotes as a literal character rather than a line continuation, so the caret was passed directly to PowerShell, which does not recognize ^ as a valid token.

This affected all 8 PowerShell download/configure blocks (Python, ._pth config, get-pip.py, tkinter, Git, FFmpeg, Rubberband, eSpeak NG). Existing installs were unaffected because the script skips steps when files already exist.

Fix: Close the double quotes before each ^ so CMD correctly treats them as line continuations.
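
The difference between the broken and fixed forms can be checked mechanically. The following is a minimal sketch of such a check, assuming that quote state can be judged one line at a time. The two batch snippets are illustrative (they use a placeholder URL) and are not copied from install.bat.

```python
def carets_inside_quotes(batch_text: str) -> list[int]:
    """Return 1-based line numbers whose trailing ^ sits inside an open quote."""
    flagged = []
    for number, line in enumerate(batch_text.splitlines(), start=1):
        stripped = line.rstrip()
        if not stripped.endswith("^"):
            continue
        # An odd number of double quotes before the caret means the quoted
        # string is still open, so CMD keeps the caret as a literal character.
        if stripped[:-1].count('"') % 2 == 1:
            flagged.append(number)
    return flagged


# Caret placed inside the quoted PowerShell command (the failing pattern).
BROKEN = r'''powershell -Command "Invoke-WebRequest -Uri 'https://example.invalid/tool.zip' ^
    -OutFile 'tool.zip'"'''

# Quotes closed before the caret, reopened on the next line (the fixed pattern).
FIXED = r'''powershell -Command "Invoke-WebRequest -Uri 'https://example.invalid/tool.zip'" ^
    "-OutFile 'tool.zip'"'''

if __name__ == "__main__":
    print(carets_inside_quotes(BROKEN))
    print(carets_inside_quotes(FIXED))
```

The check flags line 1 of the broken snippet and nothing in the fixed one.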

v3.0.0 — Portable Multi-GPU TTS Server

08 Feb 07:15

Portable TTS Server v3.0.0

Portable multi-GPU text-to-speech server for Windows. 10 AI models, gateway + worker architecture, 7-stage audio pipeline, Whisper verification, one-click install. No system Python, no Docker, no admin rights.

Setup

1. Download or clone this repository
2. Double-click install.bat
3. Install model environments from the GUI (Environments tab)
4. Download model weights (Models tab)
5. Start the API server (API Server tab, or launcher.bat api)
6. Send requests to http://127.0.0.1:8100/api/tts/{model}

install.bat automatically downloads and configures: embedded Python 3.10, portable Git, FFmpeg, Rubberband, eSpeak NG, and all gateway dependencies. Nothing touches your system.
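
Step 6 of the setup could be exercised from Python roughly as follows. Only the base URL and the /api/tts/{model} path come from the notes above. The POST method, the JSON body with a "text" field, and the "kokoro" model identifier are assumptions, so check the Swagger UI at /docs for the actual schema.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:8100"


def build_tts_request(model: str, text: str) -> urllib.request.Request:
    # Assumption: the endpoint accepts a JSON POST body with a "text" field.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GATEWAY}/api/tts/{model}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_tts_request("kokoro", "Hello from the gateway.")
    print(request.get_method(), request.full_url)
    # To actually send it (requires a running server):
    # with urllib.request.urlopen(request, timeout=120) as response:
    #     audio_bytes = response.read()
```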

10 TTS Models

Model            Params   Key Capability
XTTS v2          ~500M    Multilingual voice cloning, 58 built-in speakers
Bark             ~1B      Expressive audio: laughter, music, nonverbals, 260+ presets
Fish Speech 1.5  ~500M    Fast neural TTS with V1.5 FireflyGAN codec
Kokoro           82M      Lightweight and fast, 54 built-in voices
Dia 1.6B-0626    1.6B     Multi-speaker dialogue with [S1]/[S2] tags
Chatterbox       ~500M    Emotion and exaggeration control
F5-TTS           ~300M    Diffusion-based, reference audio + transcript cloning
Qwen2.5-Omni     7B       Multimodal LLM with speech output
VibeVoice        1.5B     Speaker-conditioned, multi-speaker via Speaker N: format
Higgs Audio      3B       Automatic prosody, ChatML format, runs on CPU

Architecture

  • Gateway (port 8100) — orchestrates pipeline, delegates inference to workers via HTTP
  • Workers (ports 8101-8200) — each runs one model on one GPU as an isolated subprocess
  • Each worker injects only its venv's site-packages — zero cross-environment conflicts
  • Same model can run multiple instances across GPUs for concurrent inference
  • Workers auto-spawn on first request, fail over to siblings, health-checked every 10s
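
The failover behaviour in the last bullet can be pictured with a small sketch. This is one possible routing loop written for illustration, not the gateway's actual implementation. The Worker fields and the send callable are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    model: str
    port: int
    gpu: int
    healthy: bool = True


def pick_worker(workers: list[Worker], model: str,
                exclude: frozenset[int] = frozenset()) -> Optional[Worker]:
    # Return the first healthy instance of the model that has not already failed.
    for worker in workers:
        if worker.model == model and worker.healthy and worker.port not in exclude:
            return worker
    return None


def route_with_failover(workers, model, send):
    # Try each sibling in turn; `send` is a hypothetical callable that raises
    # ConnectionError when a worker cannot be reached.
    failed: set[int] = set()
    while (worker := pick_worker(workers, model, frozenset(failed))) is not None:
        try:
            return send(worker)
        except ConnectionError:
            failed.add(worker.port)
    raise RuntimeError(f"no healthy worker available for {model}")
```

A periodic health check would flip the healthy flag, so unhealthy workers are skipped before a request is even attempted.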

Audio Pipeline

7-stage post-processing applied per chunk: de-reverb (noisereduce), highpass filter (scipy), de-esser (scipy), tempo adjustment (pyrubberband), silence trimming (pydub), LUFS normalization (pyloudnorm), peak limiting. Each stage degrades gracefully if its library is unavailable.
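
Graceful degradation of this kind is commonly done by swapping a stage for a pass-through when its library cannot be found. The sketch below shows that pattern using only the standard library. The stage functions and the module name "hypothetical_missing_lib" are placeholders, not the project's pipeline code.

```python
import importlib.util
from typing import Callable

Samples = list[float]
Stage = Callable[[Samples], Samples]


def optional_stage(required_module: str, stage: Stage) -> Stage:
    # If the backing library (a top-level module name) is not importable,
    # fall back to a pass-through so the rest of the pipeline still runs.
    if importlib.util.find_spec(required_module) is None:
        return lambda samples: samples
    return stage


def peak_limit(samples: Samples, ceiling: float = 0.9) -> Samples:
    # Hard clip to +/- ceiling; a real limiter would be smoother.
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def run_pipeline(samples: Samples, stages: list[Stage]) -> Samples:
    for stage in stages:
        samples = stage(samples)
    return samples


if __name__ == "__main__":
    stages = [
        # Skipped: the module does not exist, so the audio passes through.
        optional_stage("hypothetical_missing_lib", lambda s: [0.0 for _ in s]),
        # Applied: json is always available in the standard library.
        optional_stage("json", peak_limit),
    ]
    print(run_pipeline([0.5, 1.5, -2.0], stages))  # [0.5, 0.9, -0.9]
```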

Key Features

  • Multi-GPU — Pin workers to any NVIDIA GPU, run the same model on multiple GPUs simultaneously
  • 40+ REST API endpoints — TTS inference, worker management, job tracking, Whisper verification, GPU discovery
  • Whisper verification — Transcribe generated audio and score against original text
  • Job management — Progress tracking, cancellation, recovery from interruptions
  • Format export — WAV, MP3, OGG, FLAC, M4A via FFmpeg
  • GUI environment manager — Install venvs, download models, manage workers, test TTS from a Tkinter interface
  • Interactive API docs — Swagger UI at /docs when the server is running
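
For the Whisper verification feature, the scoring step might resemble the sketch below. The release notes do not say which metric the server uses, so the word-level difflib ratio here is an assumption picked for simplicity. The transcript would come from Whisper in practice and is a plain string in this example.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    # Lowercase and keep only word-like tokens so punctuation and casing
    # differences do not count as errors.
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(original: str, transcript: str) -> float:
    # Ratio in [0, 1] over word sequences; 1.0 means identical after normalization.
    return SequenceMatcher(None, normalize(original), normalize(transcript)).ratio()


if __name__ == "__main__":
    print(similarity("Hello, world!", "hello world"))
    print(similarity("the quick brown fox", "the quick fox"))
```

A threshold on this score could then decide whether a chunk is accepted or regenerated.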

Requirements

  • Windows 10/11 (64-bit)
  • NVIDIA GPU with CUDA (any VRAM — Kokoro runs in ~1 GB, Higgs runs on CPU)
  • Internet connection (first run only, for downloading dependencies and models)
  • No admin rights needed