Instant voice cloning by MIT and MyShell. Audio foundation model.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Foundational model for human-like, expressive TTS
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, with robust zero-shot text-to-speech
Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling
Automated voice dubbing for YouTube videos using Docker, OpenVoice, and FastAPI. Translates and dubs videos with original voice timbre.
Benchmark for voice cloning robustness, speaker privacy, and audio protection across 26 TTS and VC models.
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis
Dockerized Voicecraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
🎙️ OmniVoice Thai API — Zero-shot Thai TTS with Web UI + REST API (Voice Cloning, Voice Design, Auto Voice)
Self-hosted zero-shot voice cloning: turn a 3-30s sample into a reusable voice profile and synthesize speech from any text.
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
[Interspeech 2026] DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Discrete Flow Matching
Coach structured answers in real time during mock interviews with question detection, feedback, filler tracking, and live transcription.