Releases · aivrar/portable-tts-server

18 Feb 04:23

aivrar

v3.0.2

708539c

Fix venv creation with embedded Python

Latest

Bug Fixes

Fix venv creation failing with embedded Python: the GUI's environment installer used python -m venv, which doesn't work with Python's embeddable distribution because that distribution lacks the venv module. Switched to virtualenv (pip-installed), which works correctly. This was the root cause of all environment installation failures on fresh installs.
Fix operator precedence bug in git clone installer: run_git_clone() could trigger an editable install even when editable=False if a pyproject.toml was present, because the condition mixed and/or without parentheses and Python evaluates and before or.
Added virtualenv to requirements.txt: it is installed automatically during bootstrap, so it's available when the GUI creates venvs.
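The venv fix amounts to invoking a different module with the same interpreter. The sketch below shows the idea; venv_command and create_env are hypothetical helper names, not the project's actual tts_manager.py code.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def venv_command(python_exe: str, target: Path) -> List[str]:
    # Hypothetical helper. "python -m venv" fails on the embeddable
    # distribution because the venv module is not shipped there;
    # virtualenv is pip-installed into site-packages, so it is importable.
    return [python_exe, "-m", "virtualenv", str(target)]


def create_env(target: Path, python_exe: str = sys.executable) -> None:
    # Hypothetical helper; check=True raises CalledProcessError if creation fails.
    subprocess.run(venv_command(python_exe, target), check=True)
```

For example, create_env(Path("envs") / "kokoro") would create a venv in envs/kokoro (the directory layout is assumed for illustration).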

Who's Affected

Anyone running install.bat for the first time without system Python on PATH. Existing installs with working venvs are unaffected.

Files Changed

tts_manager.py — python -m venv → python -m virtualenv
install_configs/base.py — Fixed boolean logic in run_git_clone()
requirements.txt — Added virtualenv>=20.25.0
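The run_git_clone() bug is a classic Python precedence pitfall: and binds more tightly than or, so an unparenthesized mix of the two groups differently than it reads. The actual condition isn't shown in these notes, so the function and flag names below are hypothetical and only reproduce the pattern described (a pyproject.toml forcing an editable install when editable=False).

```python
def wants_editable_buggy(editable: bool, has_setup_py: bool, has_pyproject: bool) -> bool:
    # Parsed as (editable and has_setup_py) or has_pyproject,
    # so a pyproject.toml alone turns on an editable install.
    return editable and has_setup_py or has_pyproject


def wants_editable(editable: bool, has_setup_py: bool, has_pyproject: bool) -> bool:
    # Parentheses make the editable flag gate both file checks.
    return editable and (has_setup_py or has_pyproject)
```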

11 Feb 18:59

aivrar

v3.0.1

b0410de

v3.0.1 — Fix install.bat on fresh installs

Bug Fix

Fixed install.bat failing on fresh installs due to PowerShell multi-line command parsing.

Root cause: the ^ (caret) line-continuation characters were placed inside double-quoted strings. CMD treats ^ inside quotes as a literal character rather than a line continuation, so the caret was passed straight through to PowerShell, which doesn't recognize ^ as a valid token.

This affected all 8 PowerShell download/configure blocks (Python, ._pth config, get-pip.py, tkinter, Git, FFmpeg, Rubberband, eSpeak NG). Existing installs were unaffected because the script skips steps when files already exist.

Fix: close the double quotes before each ^ so that CMD treats the caret as a line continuation.
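A quick way to catch this class of bug is to scan a .bat file for lines whose trailing ^ sits inside an open double quote, i.e. after an odd number of " characters on that line. The sketch below is a heuristic checker written for illustration (find_quoted_carets is a hypothetical name, and the sample lines use a placeholder URL); it does not model every CMD escaping rule.

```python
from typing import List, Sequence

# Placeholder lines illustrating the broken and fixed layouts.
BROKEN = [
    'powershell -Command "Invoke-WebRequest -Uri https://example.com/a.zip ^',
    '  -OutFile a.zip"',
]
FIXED = [
    'powershell -Command "Invoke-WebRequest -Uri https://example.com/a.zip" ^',
    '  "-OutFile a.zip"',
]


def find_quoted_carets(lines: Sequence[str]) -> List[int]:
    """Return 1-based numbers of lines whose trailing ^ is inside an open quote."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip()
        if not stripped.endswith("^"):
            continue
        # An odd quote count before the caret means a quoted string is still
        # open, so CMD passes the caret through literally instead of joining lines.
        if stripped[:-1].count('"') % 2 == 1:
            flagged.append(number)
    return flagged
```

Running find_quoted_carets(BROKEN) flags line 1, while FIXED passes cleanly.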

08 Feb 07:15

aivrar

v3.0.0

eb9a994

v3.0.0 — Portable Multi-GPU TTS Server

Portable TTS Server v3.0.0

Portable multi-GPU text-to-speech server for Windows. 10 AI models, gateway + worker architecture, 7-stage audio pipeline, Whisper verification, one-click install. No system Python, no Docker, no admin rights.

Setup

1. Download or clone this repository
2. Double-click install.bat
3. Install model environments from the GUI (Environments tab)
4. Download model weights (Models tab)
5. Start the API server (API Server tab, or launcher.bat api)
6. Send requests to http://127.0.0.1:8100/api/tts/{model}

install.bat automatically downloads and configures embedded Python 3.10, portable Git, FFmpeg, Rubberband, eSpeak NG, and all gateway dependencies. Nothing touches your system.
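Once the API server is running, step 6 is a plain HTTP POST. The client below uses only the standard library. The endpoint comes from the setup steps, but the JSON body (a "text" field plus optional extras) and the assumption that the response body is raw audio are guesses; check the Swagger UI at /docs for the real request and response schema.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:8100"


def build_tts_request(model: str, text: str, **options) -> urllib.request.Request:
    # Assumed payload shape: {"text": ..., **options}. Verify against /docs.
    payload = {"text": text, **options}
    return urllib.request.Request(
        f"{GATEWAY}/api/tts/{model}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(model: str, text: str, out_path: str, **options) -> None:
    # Assumes the response body is the audio file itself.
    request = build_tts_request(model, text, **options)
    with urllib.request.urlopen(request, timeout=300) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

For example, synthesize("kokoro", "Hello there.", "hello.wav") would save the gateway's response to hello.wav.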

10 TTS Models

Model	Params	Key Capability
XTTS v2	~500M	Multilingual voice cloning, 58 built-in speakers
Bark	~1B	Expressive audio — laughter, music, nonverbals, 260+ presets
Fish Speech 1.5	~500M	Fast neural TTS with V1.5 FireflyGAN codec
Kokoro	82M	Lightweight and fast, 54 built-in voices
Dia 1.6B-0626	1.6B	Multi-speaker dialogue with [S1]/[S2] tags
Chatterbox	~500M	Emotion and exaggeration control
F5-TTS	~300M	Diffusion-based, reference audio + transcript cloning
Qwen2.5-Omni	7B	Multimodal LLM with speech output
VibeVoice	1.5B	Speaker-conditioned, multi-speaker via Speaker N: format
Higgs Audio	3B	Automatic prosody, ChatML format, runs on CPU
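Dia and VibeVoice both take multi-speaker scripts, but in different text formats: [S1]/[S2] tags for Dia and a "Speaker N:" prefix for VibeVoice. The helper below is a hypothetical sketch that renders (speaker, text) turns into either format. Joining Dia turns on one line, numbering VibeVoice speakers from 1, and putting one VibeVoice turn per line are all assumptions; confirm them against each model's documentation.

```python
from typing import List, Tuple


def format_dialogue(turns: List[Tuple[int, str]], style: str) -> str:
    """Render (speaker_number, text) turns as a Dia or VibeVoice script."""
    if style == "dia":
        parts = []
        for speaker, text in turns:
            # The release notes list [S1]/[S2] tags, so only speakers 1 and 2 are accepted.
            if speaker not in (1, 2):
                raise ValueError("Dia scripts use [S1] and [S2] only")
            parts.append(f"[S{speaker}] {text.strip()}")
        return " ".join(parts)
    if style == "vibevoice":
        # One "Speaker N:" line per turn (assumed layout).
        return "\n".join(f"Speaker {speaker}: {text.strip()}" for speaker, text in turns)
    raise ValueError(f"unknown style: {style!r}")
```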

Architecture

Gateway (port 8100) — orchestrates pipeline, delegates inference to workers via HTTP
Workers (ports 8101-8200) — each runs one model on one GPU as an isolated subprocess
Each worker injects only its own venv's site-packages, so environments never conflict with each other
The same model can run as multiple instances across GPUs for concurrent inference
Workers auto-spawn on the first request, fail over to sibling instances, and are health-checked every 10 s
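The routing behaviour described above (ports 8101-8200, auto-spawn on first request, failover to a healthy sibling) can be modelled in a few lines. This is an in-memory sketch, not the gateway's code: Worker, WorkerPool, and route are hypothetical names, no subprocess is actually launched, and pinning a worker to a GPU through CUDA_VISIBLE_DEVICES is an assumed mechanism.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List

WORKER_PORTS = range(8101, 8201)  # 8101-8200 inclusive


@dataclass
class Worker:
    model: str
    gpu: int
    port: int
    healthy: bool = True  # a periodic (e.g. 10 s) health check would update this
    busy: bool = False


def worker_env(gpu: int) -> Dict[str, str]:
    # Assumption: restrict a worker subprocess to one GPU via CUDA_VISIBLE_DEVICES.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return env


@dataclass
class WorkerPool:
    workers: List[Worker] = field(default_factory=list)

    def _free_port(self) -> int:
        used = {w.port for w in self.workers}
        for port in WORKER_PORTS:
            if port not in used:
                return port
        raise RuntimeError("no free worker port in 8101-8200")

    def spawn(self, model: str, gpu: int) -> Worker:
        # A real gateway would start a subprocess here with worker_env(gpu).
        worker = Worker(model=model, gpu=gpu, port=self._free_port())
        self.workers.append(worker)
        return worker

    def route(self, model: str, default_gpu: int = 0) -> Worker:
        healthy = [w for w in self.workers if w.model == model and w.healthy]
        idle = [w for w in healthy if not w.busy]
        if idle:
            return idle[0]
        if healthy:
            return healthy[0]  # every instance busy: queue on a healthy sibling
        return self.spawn(model, default_gpu)  # no instance yet: auto-spawn
```

Unhealthy workers keep their port in this sketch, so a replacement gets the next free port rather than reusing a port that may still be bound.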

Audio Pipeline

7-stage post-processing applied per chunk: de-reverb (noisereduce), highpass filter (scipy), de-esser (scipy), tempo adjustment (pyrubberband), silence trimming (pydub), LUFS normalization (pyloudnorm), peak limiting. Each stage degrades gracefully if its library is unavailable.
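Graceful degradation can be implemented by checking whether each stage's library is importable before running it, and skipping the stage with a warning if it is not. In the sketch below, the stage order and library names mirror the list above. The DSP stages themselves are identity placeholders, and the limiter is a simple hard clip at a hypothetical 0.99 ceiling, so treat this as a control-flow illustration only.

```python
import importlib.util
import logging
from typing import Callable, List, Optional, Tuple

log = logging.getLogger("pipeline")

# (stage name, required top-level module or None, function over float samples)
Stage = Tuple[str, Optional[str], Callable[[List[float]], List[float]]]


def module_available(name: str) -> bool:
    return importlib.util.find_spec(name) is not None


def passthrough(samples: List[float]) -> List[float]:
    return samples  # placeholder for a real DSP stage


def peak_limit(samples: List[float], ceiling: float = 0.99) -> List[float]:
    # Hard-clip stand-in for a limiter; needs no third-party library.
    return [max(-ceiling, min(ceiling, s)) for s in samples]


STAGES: List[Stage] = [
    ("de-reverb", "noisereduce", passthrough),
    ("highpass", "scipy", passthrough),
    ("de-esser", "scipy", passthrough),
    ("tempo", "pyrubberband", passthrough),
    ("silence trim", "pydub", passthrough),
    ("LUFS normalize", "pyloudnorm", passthrough),
    ("peak limit", None, peak_limit),
]


def run_pipeline(samples: List[float], stages: List[Stage]) -> Tuple[List[float], List[str]]:
    """Apply each available stage to one chunk; return the chunk and the stages applied."""
    applied = []
    for name, module, fn in stages:
        if module is not None and not module_available(module):
            log.warning("skipping %s: %s is not installed", name, module)
            continue
        samples = fn(samples)
        applied.append(name)
    return samples, applied
```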

Key Features

Multi-GPU — Pin workers to any NVIDIA GPU, run the same model on multiple GPUs simultaneously
40+ REST API endpoints — TTS inference, worker management, job tracking, Whisper verification, GPU discovery
Whisper verification — Transcribe generated audio and score against original text
Job management — Progress tracking, cancellation, recovery from interruptions
Format export — WAV, MP3, OGG, FLAC, M4A via FFmpeg
GUI environment manager — Install venvs, download models, manage workers, test TTS from a Tkinter interface
Interactive API docs — Swagger UI at /docs when the server is running
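The notes don't say how Whisper transcripts are scored against the original text. One common choice is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below assumes WER and a simple lowercase/punctuation-stripping normalization; both are assumptions, and verification_score is a hypothetical name.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    # Lowercase and keep only word characters so punctuation doesn't count as errors.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    # Levenshtein distance over words, keeping one DP row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = cur
    return prev[-1] / len(ref)


def verification_score(original_text: str, transcript: str) -> float:
    # 1.0 means the transcript matches the input text word for word.
    return max(0.0, 1.0 - word_error_rate(original_text, transcript))
```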

Requirements

Windows 10/11 (64-bit)
NVIDIA GPU with CUDA (no minimum VRAM: Kokoro runs in ~1 GB, and Higgs runs on CPU)
Internet connection (first run only, for downloading dependencies and models)
No admin rights needed
