Demo | Paper | Weight | Dataset
This is the official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Transcription. SongPrep analyzes the structure and lyrics of entire songs and provides precise timestamps without requiring a separate source separation step. This repository provides the SongPrep model, inference scripts, and checkpoints trained on the Million Song Dataset that support both Chinese and English.

Results are reported as Diarization Error Rate (DER) for structure parsing and Word Error Rate (WER) for lyrics transcription; lower is better for both.
Model | #Params | WER | DER |
---|---|---|---|
SongPrep | 7B | 23.5% | 18.2% |
Gemini-2.5 | - | 29.2% | 94.6% |
Seed-ASR | 12B+ | 104.1% | - |
Qwen3-ASR | - | 33.3% | - |
Qwen-Audio | 8.4B | 232.7% | - |
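
Some WER values in the table exceed 100%. This is possible because WER divides the number of substitutions, deletions, and insertions by the number of reference words, so a hypothesis with many inserted words can produce more errors than the reference has words. The sketch below illustrates the metric with a plain edit-distance computation. It is not the evaluation script used for the table, and it assumes whitespace tokenization, which would not suit Chinese text (typically scored per character). The function name `word_error_rate` is hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, a one-word reference transcribed as three words has two insertions against a single reference word, giving a WER of 200%.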
Install the required dependencies from the requirements.txt file, using Python>=3.8.12 and CUDA>=11.8:
pip install -r requirements.txt
If your Python version is 3.9 or lower, you can install fairseq with pip:
pip install fairseq==0.12.2 --no-deps
Otherwise, it is recommended to install fairseq from a prebuilt wheel. For example, with Python 3.11 you can use the wheel from liyaodev/fairseq:
pip3 install fairseq-0.12.3.1-cp311-cp311-linux_x86_64.whl
To ensure the model runs correctly, download the weights from the original source on Hugging Face and save them in the root directory of the project.
Once everything is set up, you can run the inference script using the following command:
python3 run.py -i your_wav_path
The complete output may look like:
[structure][start:end]lyric ; [structure][start:end]lyric ; [structure][start:end]lyric
- The song is divided into segments, which are separated by ';'.
- The structure is the label that structure analysis assigns to the segment.
- The start and end are the segment's start and end times.
- The lyric is the recognized lyrics of the segment, with sentences separated by '.'.
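
The format described above can be turned into structured records with a short parser. The following is a minimal sketch, not part of the repository: the names `Segment` and `parse_songprep_output` are hypothetical, and it assumes that the timestamps are plain decimal numbers and that lyrics never contain ';' themselves.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    structure: str        # label from structure analysis
    start: float          # segment start time (assumed to be a decimal number)
    end: float            # segment end time (assumed to be a decimal number)
    sentences: List[str]  # lyric sentences, split on '.'


# Matches "[structure][start:end]lyric" for a single segment.
_SEGMENT_RE = re.compile(r"^\[([^\]]+)\]\[([^:\]]+):([^\]]+)\](.*)$", re.DOTALL)


def parse_songprep_output(text: str) -> List[Segment]:
    """Split the output on ';' and parse each segment into a Segment."""
    segments = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = _SEGMENT_RE.match(chunk)
        if match is None:
            raise ValueError(f"unrecognized segment: {chunk!r}")
        structure, start, end, lyric = match.groups()
        # Timestamps sit inside the brackets, so splitting the lyric on '.'
        # does not touch them.
        sentences = [s.strip() for s in lyric.split(".") if s.strip()]
        segments.append(Segment(structure, float(start), float(end), sentences))
    return segments
```

A segment with no recognized lyrics yields an empty `sentences` list, and a segment that does not follow the expected pattern raises `ValueError` rather than being silently skipped.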
@misc{tan2025songpreppreprocessingframeworkendtoend,
title={SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Transcription},
author={Wei Tan and Shun Lei and Huaicheng Zhang and Guangzheng Li and Yixuan Zhang and Hangting Chen and Jianwei Yu and Rongzhi Gu and Dong Yu},
year={2025},
eprint={2509.17404},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2509.17404},
}
The code and weights in this repository are released under the terms of the LICENSE file.