I am a Foundation Model Algorithm Engineer at ModelBest (面壁智能, VoxCPM). I received my Master's degree from the School of Artificial Intelligence, Beijing University of Posts and Telecommunications (BUPT). My research interests lie at the intersection of Large Speech Models (LSM), Automatic Speech Recognition (ASR), Singing Voice Conversion (SVC), and Expressive Text-to-Speech (TTS).
Prior to joining ModelBest, I earned my Bachelor's degree in Computer Science and Technology from Ningbo University (Yangming Innovation Class). I have gained extensive industry experience through research and engineering internships at Zhipu AI (ๆบ่ฐฑAI่ฏญ้ณ่พๅ ฅๆณ), Tencent Music Entertainment (TME, Lyra Lab / ๅคฉ็ดๅฎ้ชๅฎค), and Momo (้้).
I have been awarded the Zhejiang Provincial Government Scholarship (3 times) and the BUPT First-Class Academic Scholarship (2 times). My papers have been accepted at conferences including AAAI, Interspeech, ICASSP, and ISCSLP.
- 2026.03: Joined ModelBest as a Large Speech Foundation Model Researcher.
- 2026.01: One paper (SynParaSpeech) accepted by ICASSP 2026 as the first author!
- 2025.12: One paper (HQ-SVC) accepted by AAAI 2026 as the first author!
- 2025.10: Joined Zhipu AI as a Speech Large Model Research Intern.
- 2025.07: Joined Tencent Music (QQ Music) focusing on multi-speaker conversational podcast TTS.
- 2025.03: Joined Momo focusing on paralinguistic TTS and understanding.
- 2024.06: One paper (SPA-SVC) accepted by Interspeech 2024 as the first author.
For a full list of publications, please visit my Google Scholar.
HQ-SVC: High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios, Bingsong Bai, et al., AAAI 2026. [CCF-A]
SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding, Bingsong Bai, et al., ICASSP 2026. [CCF-B]
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion, Bingsong Bai, et al., Interspeech 2024. [CCF-B]
ExpressiveSinger: Synthesizing Expressive Singing Voice as an Instrument, Fengping Wang, Bingsong Bai, et al., ISCSLP 2024.
- GLM-ASR Nano: Participated in training and SFT of the SOTA open-source ASR model, reaching #1 on Hugging Face speech model download charts (440k+ downloads in 2 weeks).
- Multi-Speaker Conversational TTS: Improved rhythm/pauses by 68.49% in AI podcasts (internal project at Tencent Music). Participated in QinYu-TTS.
- 2023, 2024: BUPT First-Class Academic Scholarship
- 2020, 2021, 2022: Zhejiang Provincial Government Scholarship (3 consecutive years)
- 2021: Mathematical Contest in Modeling (MCM) - International Second Prize
- 2020: Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM) - Provincial Second Prize



