- awesome-voice-conversion - JeffC0628
-
Fast Text-to-Audio Generation with Adversarial Post-Training,
arXiv, 2505.08175, arxiv, pdf, citation: -1
Zachary Novack, Zach Evans, Zack Zukowski, ..., Taylor Berg-Kirkpatrick, Jordi Pons · (huggingface) · (arc-text2audio.github)
-
AudioX: Diffusion Transformer for Anything-to-Audio Generation,
arXiv, 2503.10522, arxiv, pdf, citation: -1
Zeyue Tian, Yizhu Jin, Zhaoyang Liu, ..., Wei Xue, Yike Guo · (zeyuet.github)
-
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization,
arXiv, 2412.21037, arxiv, pdf, citation: -1
Chia-Yu Hung, Navonil Majumder, Zhifeng Kong, ..., Bryan Catanzaro, Soujanya Poria · (tangoflux.github)
-
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition/Synthesis and Speech/Text Continuation Tasks,
ICASSP 2024 - 2024 IEEE International Conference on Acoustics …, 2024, arxiv, pdf, citation: -1
Soumi Maiti, Yifan Peng, Shukjae Choi, ..., Xuankai Chang, Shinji Watanabe
-
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis,
arXiv, 2412.15322, arxiv, pdf, citation: -1
Ho Kei Cheng, Masato Ishii, Akio Hayakawa, ..., Alexander Schwing, Yuki Mitsufuji · (huggingface) · (hkchengrex) · (MMAudio - hkchengrex)
-
· (fugatto.github)
-
Tell What You Hear From What You See: Video to Audio Generation Through Text,
arXiv, 2411.05679, arxiv, pdf, citation: -1
Xiulong Liu, Kun Su, Eli Shlizerman
-
Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation,
arXiv, 2411.05141, arxiv, pdf, citation: -1
Mu Yang, Bowen Shi, Matthew Le, ..., Wei-Ning Hsu, Andros Tjandra
-
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation,
arXiv, 2410.12266, arxiv, pdf, citation: -1
Huadai Liu, Jialei Wang, Rongjie Huang, ..., Wei Xue, Zhou Zhao
-
Movie Gen: A Cast of Media Foundation Models,
arXiv, 2410.13720, arxiv, pdf, citation: -1
Adam Polyak, Amit Zohar, Andrew Brown, ..., Vladan Petrovic, Yuming Du · (ai.meta)
-
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators,
arXiv, 2505.09558, arxiv, pdf, citation: -1
Shengpeng Ji, Tianle Liang, Yangzhuo Li, ..., Junyang Lin, Zhou Zhao
-
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training
-
Recent Advances in Discrete Speech Tokens: A Review,
arXiv, 2502.06490, arxiv, pdf, citation: -1
Yiwei Guo, Zhihan Li, Hankun Wang, ..., Shujie Liu, Kai Yu
-
Introducing hertz-dev, the First Open-Source Base Model for Conversational Audio Generation