SongPrep

This is the official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Transcription. SongPrep analyzes the structure and lyrics of entire songs and provides precise timestamps without requiring a separate source separation step. The repository provides the SongPrep model, inference scripts, and checkpoints trained on the Million Song Dataset, with support for both Chinese and English.

Evaluation

Results are reported in Diarization Error Rate (DER) for structure parsing and Word Error Rate (WER) for lyrics transcription.

Model	#Params	WER	DER
SongPrep	7B	23.5%	18.2%
Gemini-2.5	-	29.2%	94.6%
Seed-ASR	12B+	104.1%	-
Qwen3-ASR	-	33.3%	-
Qwen-Audio	8.4B	232.7%	-
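
Some WER values in the table are above 100%. That is possible because WER counts insertions as well as substitutions and deletions, all divided by the number of reference words. The sketch below shows the standard word-level edit-distance definition. It is an illustration only and is not taken from this repository's evaluation code; the function name, whitespace tokenization, and lack of text normalization are assumptions, and the scoring unit used for Chinese is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and may differ from the evaluation behind the table above.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With a one-word reference and a hypothesis that repeats that word three times, there are two insertions against one reference word, so the rate is 2.0 (200%).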

Installation

Start from scratch

Install the necessary dependencies from the requirements.txt file. Python>=3.8.12 and CUDA>=11.8 are required:

pip install -r requirements.txt

If your Python version is 3.9 or earlier, you can install fairseq with pip:

pip install fairseq==0.12.2 --no-deps

Otherwise, installing fairseq from a prebuilt wheel is recommended. For example, with Python 3.11 you can use the wheel from liyaodev/fairseq:

pip3 install fairseq-0.12.3.1-cp311-cp311-linux_x86_64.whl

Usage

To ensure the model runs correctly, download the weights from the original source on Hugging Face and save them in the root directory of the project.

Once everything is set up, you can run the inference script using the following command:

python3 run.py -i your_wav_path
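
To process several songs, the command above can be wrapped in a small script. The following is a minimal sketch, assuming that it is launched from the project root, that run.py prints its result to standard output, and that one invocation handles one file. None of these are confirmed here, and the helper names build_command and run_folder are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def build_command(wav_path: str) -> List[str]:
    """Build the documented inference command for a single audio file."""
    return ["python3", "run.py", "-i", str(wav_path)]


def run_folder(folder: str) -> Dict[str, str]:
    """Run inference on every .wav file in `folder`, one process per file.

    Assumption: run.py writes its result to stdout; adjust if it does not.
    """
    results = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        completed = subprocess.run(
            build_command(str(wav)),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if run.py exits with an error
        )
        results[wav.name] = completed.stdout.strip()
    return results
```

Calling run_folder("songs") would return a dictionary from file name to captured output, where "songs" is a placeholder directory name.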

The complete output may look like:

[structure][start:end]lyric ; [structure][start:end]lyric ; [structure][start:end]lyric

The song is divided into segments, which are separated by ';'.
structure is the label assigned to the segment by structure analysis.
start and end are the segment’s start and end times.
lyric is the recognized lyrics of the segment, with sentences separated by '.'.
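
The format described above can be turned into structured records with a few lines of Python. This is a sketch under assumptions rather than part of the repository: it assumes start and end are plain decimal numbers, that ';' never appears inside lyrics, and that labels contain no ']'. The names Segment and parse_songprep_output are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches one segment of the form "[structure][start:end]lyric".
SEGMENT_RE = re.compile(
    r"^\[(?P<label>[^\]]+)\]\[(?P<start>[^:\]]+):(?P<end>[^\]]+)\](?P<lyric>.*)$",
    re.DOTALL,
)


@dataclass
class Segment:
    label: str            # structure label of the segment
    start: float          # start time (assumed to be a decimal number)
    end: float            # end time (assumed to be a decimal number)
    sentences: List[str]  # lyric sentences, split on '.'


def parse_songprep_output(text: str) -> List[Segment]:
    """Split the output on ';' and parse each segment."""
    segments = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = SEGMENT_RE.match(chunk)
        if match is None:
            raise ValueError(f"unrecognized segment: {chunk!r}")
        # Only the lyric part is split on '.', so decimal points in the
        # timestamps are left untouched.
        sentences = [s.strip() for s in match["lyric"].split(".") if s.strip()]
        segments.append(
            Segment(
                label=match["label"],
                start=float(match["start"]),
                end=float(match["end"]),
                sentences=sentences,
            )
        )
    return segments
```

For a made-up string such as "[verse][0.00:12.50]hello world.second line ; [chorus][12.50:30.00]la la la", this yields two segments, the first with two sentences. The labels and times in that string are invented for illustration and are not real model output.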

Citation

@misc{tan2025songpreppreprocessingframeworkendtoend,
      title={SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Transcription}, 
      author={Wei Tan and Shun Lei and Huaicheng Zhang and Guangzheng Li and Yixuan Zhang and Hangting Chen and Jianwei Yu and Rongzhi Gu and Dong Yu},
      year={2025},
      eprint={2509.17404},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2509.17404}, 
}

License

The code and weights in this repository are released under the terms of the LICENSE file.
