-
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization,
arXiv, 2504.02407
, arxiv, pdf · Xiaohui Sun, Ruitong Xiao, Jianye Mo, ..., Qun Yu, Baoxun Wang · (frontierlabs.github)
-
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens,
arXiv, 2503.01710
, arxiv, pdf · Xinsheng Wang, Mingqi Jiang, Ziyang Ma, ..., Yike Guo, Wei Xue · (Spark-TTS - SparkAudio)
-
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System,
arXiv, 2502.05512
, arxiv, pdf · Wei Deng, Siyi Zhou, Jingchen Shu, ..., Jinchao Wang, Lu Wang · (index-tts - index-tts)
-
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis,
arXiv, 2502.04128
, arxiv, pdf · Zhen Ye, Xinfa Zhu, Chi-Min Chan, ..., Yike Guo, Wei Xue · (LLaSA_training - zhenye234)
-
🌟 CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models,
arXiv, 2412.10117
, arxiv, pdf · Zhihao Du, Yuxuan Wang, Qian Chen, ..., Zhijie Yan, Jingren Zhou · (funaudiollm.github)
-
Autoregressive Speech Synthesis with Next-Distribution Prediction,
arXiv, 2412.16846
, arxiv, pdf · Xinfa Zhu, Wenjie Tian, Lei Xie · (zxf-icpc.github)
-
🌟 Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis,
arXiv, 2411.01156
, arxiv, pdf · Shijia Liao, Yuxuan Wang, Tianyu Li, ..., Rongzhi Zhou, Yijin Xing · (fish-speech - fishaudio)
· (𝕏)
-
🌟 Continuous Speech Synthesis using per-token Latent Diffusion,
arXiv, 2410.16048
, arxiv, pdf · Arnon Turetzky, Nimrod Shabtay, Slava Shechtman, ..., Ron Hoory, Avihu Dekel · (s3.us-south.objectstorage.softlayer)
-
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
-
Parakeet: A natural-sounding, conversational text-to-speech model
-
csm-voice-cloning - isaiahbjork
-
Spark-TTS - SparkAudio
An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens · (sparkaudio.github)
-
F5-TTS - lpscr
A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
- A new circular eye-tracking device that uses ElevenLabs voices so people with ALS can generate speech with eye movement alone. 𝕏
- Generate a unique voice from a text prompt alone. 𝕏
-
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation,
arXiv, 2501.15907
, arxiv, pdf · Haorui He, Zengqiang Shang, Chaoren Wang, ..., Pengyuan Zhang, Zhizheng Wu · (huggingface)