Zero Shot TTS

- Survey
- Zero Shot TTS
- Projects
- Products
- Datasets
- Toolkits
- Misc

Survey

Zero Shot TTS

F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization, arXiv, 2504.02407, arxiv, pdf, citation: -1

Xiaohui Sun, Ruitong Xiao, Jianye Mo, ..., Qun Yu, Baoxun Wang · (frontierlabs.github)
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens, arXiv, 2503.01710, arxiv, pdf, citation: -1

Xinsheng Wang, Mingqi Jiang, Ziyang Ma, ..., Yike Guo, Wei Xue · (Spark-TTS - SparkAudio)
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System, arXiv, 2502.05512, arxiv, pdf, citation: -1

Wei Deng, Siyi Zhou, Jingchen Shu, ..., Jinchao Wang, Lu Wang · (index-tts - index-tts)
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis, arXiv, 2502.04128, arxiv, pdf, citation: -1

Zhen Ye, Xinfa Zhu, Chi-Min Chan, ..., Yike Guo, Wei Xue · (LLaSA_training - zhenye234)
🌟 Zonos-v0.1: a leading open-weight text-to-speech model, with expressiveness and quality reported to be on par with, or even surpassing, top TTS providers. 🤗
🌟 CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models, arXiv, 2412.10117, arxiv, pdf, citation: -1

Zhihao Du, Yuxuan Wang, Qian Chen, ..., Zhijie Yan, Jingren Zhou · (funaudiollm.github)
Autoregressive Speech Synthesis with Next-Distribution Prediction, arXiv, 2412.16846, arxiv, pdf, citation: -1

Xinfa Zhu, Wenjie Tian, Lei Xie · (zxf-icpc.github)
🌟 Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis, arXiv, 2411.01156, arxiv, pdf, citation: -1

Shijia Liao, Yuxuan Wang, Tianyu Li, ..., Rongzhi Zhou, Yijin Xing · (fish-speech - fishaudio) · (𝕏)
🌟 Continuous Speech Synthesis using per-token Latent Diffusion, arXiv, 2410.16048, arxiv, pdf, citation: -1

Arnon Turetzky, Nimrod Shabtay, Slava Shechtman, ..., Ron Hoory, Avihu Dekel · (s3.us-south.objectstorage.softlayer)
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Parakeet: A natural-sounding, conversational text-to-speech model

Projects

csm-voice-cloning - isaiahbjork
Spark-TTS - SparkAudio

An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens · (sparkaudio.github)
F5-TTS - lpscr

A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Products

A new circular eye-tracking device that uses ElevenLabs voices so people with ALS can generate speech with eye movement alone. 𝕏
Generate a unique voice from a text prompt alone. 𝕏

Datasets

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation, arXiv, 2501.15907, arxiv, pdf, citation: -1

Haorui He, Zengqiang Shang, Chaoren Wang, ..., Pengyuan Zhang, Zhizheng Wu · (huggingface)
