- awesome-voice-conversion - JeffC0628
-
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization,
arXiv, 2412.21037
, arxiv, pdf, citation: -1 · Chia-Yu Hung, Navonil Majumder, Zhifeng Kong, ..., Bryan Catanzaro, Soujanya Poria · (tangoflux.github)
-
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition/Synthesis and Speech/Text Continuation Tasks,
ICASSP 2024 - 2024 IEEE International Conference on Acoustics …, 2024
, arxiv, pdf, citation: -1 · Soumi Maiti, Yifan Peng, Shukjae Choi, ..., Xuankai Chang, Shinji Watanabe
-
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis,
arXiv, 2412.15322
, arxiv, pdf, citation: -1 · Ho Kei Cheng, Masato Ishii, Akio Hayakawa, ..., Alexander Schwing, Yuki Mitsufuji · (huggingface) · (hkchengrex) · (MMAudio - hkchengrex)
-
· (fugatto.github)
-
Tell What You Hear From What You See: Video to Audio Generation Through Text,
arXiv, 2411.05679
, arxiv, pdf, citation: -1 · Xiulong Liu, Kun Su, Eli Shlizerman
-
Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation,
arXiv, 2411.05141
, arxiv, pdf, citation: -1 · Mu Yang, Bowen Shi, Matthew Le, ..., Wei-Ning Hsu, Andros Tjandra
-
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation,
arXiv, 2410.12266
, arxiv, pdf, citation: -1 · Huadai Liu, Jialei Wang, Rongjie Huang, ..., Wei Xue, Zhou Zhao
-
Movie Gen: A Cast of Media Foundation Models,
arXiv, 2410.13720
, arxiv, pdf, citation: -1 · Adam Polyak, Amit Zohar, Andrew Brown, ..., Vladan Petrovic, Yuming Du · (ai.meta)