The official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Transcription.

tencent-ailab/SongPrep

SongPrep

Demo  |  Paper  |  Weight  |  Dataset

This is the official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Transcription. SongPrep analyzes the structure and lyrics of entire songs and provides precise timestamps without requiring additional source separation. This repository provides the SongPrep model, inference scripts, and checkpoints trained on the Million Song Dataset that support both Chinese and English.

Evaluation

Results are reported as Diarization Error Rate (DER) for structure parsing and Word Error Rate (WER) for lyrics transcription. Lower is better for both metrics.

| Model | #Params | WER | DER |
|---|---|---|---|
| SongPrep | 7B | 23.5% | 18.2% |
| Gemini-2.5 | - | 29.2% | 94.6% |
| Seed-ASR | 12B+ | 104.1% | - |
| Qwen3-ASR | - | 33.3% | - |
| Qwen-Audio | 8.4B | 232.7% | - |
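
Some WER values in the table exceed 100%. This is possible because WER divides the number of word-level edits (substitutions, deletions, and insertions) by the length of the reference, so a hypothesis with many inserted words can need more edits than the reference has words. The sketch below is the textbook definition only. The function name `word_error_rate` is hypothetical, and the text normalization and tokenization used for the reported numbers are not specified here, so tokenization is left to the caller.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Textbook WER: word-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(reference)


# One substitution against a three-word reference.
print(word_error_rate("a b c".split(), "a x c".split()))

# Three inserted words against a one-word reference gives a WER above 100%.
print(word_error_rate(["hello"], ["hello", "hello", "hello", "oh"]))
```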

Installation

Start from scratch

You can install the necessary dependencies using the requirements.txt file with Python>=3.8.12 and CUDA>=11.8:

pip install -r requirements.txt

If your Python version is 3.9 or lower, you can install fairseq with pip:

pip install fairseq==0.12.2 --no-deps

Otherwise, installing fairseq from a prebuilt wheel is recommended. For example, with Python 3.11 you can use the wheel from liyaodev/fairseq:

pip3 install fairseq-0.12.3.1-cp311-cp311-linux_x86_64.whl

Usage

To ensure the model runs correctly, download the weights from the original source on Hugging Face and save them in the root directory of the project.

Once everything is set up, you can run the inference script using the following command:

python3 run.py -i your_wav_path
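
To process several files, the same command can be invoked once per file. The following is a minimal sketch, assuming that `run.py` is run from the project root and writes its result to standard output (this is an assumption and is not stated above). The helper names `build_command` and `transcribe_folder` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(wav_path: Path) -> List[str]:
    """Build the documented inference command for one audio file."""
    return ["python3", "run.py", "-i", str(wav_path)]


def transcribe_folder(folder: Path) -> None:
    """Run the inference script on every .wav file in a folder, one at a time."""
    for wav_path in sorted(folder.glob("*.wav")):
        result = subprocess.run(
            build_command(wav_path),
            capture_output=True,  # assumption: the result is printed to stdout
            text=True,
            check=True,           # raise if the script exits with an error
        )
        print(wav_path.name, result.stdout.strip())
```

Calling `transcribe_folder(Path("my_songs"))` would process each file in a hypothetical `my_songs` folder sequentially. Each call reloads the model, so this is convenient rather than fast.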

The complete output may look like:

[structure][start:end]lyric ; [structure][start:end]lyric ; [structure][start:end]lyric
  • The song is divided into segments by ';'.
  • The structure is the label from structure analysis for the segment.
  • The start and end are the segment’s start and end times.
  • The lyric is the recognized lyrics, with sentences separated by '.'.
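
The format described above can be split into structured records with a small parser. This is a sketch under stated assumptions: that start and end are plain numbers in seconds, and that lyrics never contain ';'. Neither is confirmed here. The names `Segment` and `parse_output`, and the labels `verse` and `chorus` in the example, are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches one segment of the form "[structure][start:end]lyric".
SEGMENT_RE = re.compile(
    r"^\[(?P<structure>[^\]]+)\]"
    r"\[(?P<start>[^:\]]+):(?P<end>[^\]]+)\]"
    r"(?P<lyric>.*)$",
    re.DOTALL,
)


@dataclass
class Segment:
    structure: str
    start: float  # assumption: seconds
    end: float    # assumption: seconds
    sentences: List[str]


def parse_output(text: str) -> List[Segment]:
    """Split the output on ';' into segments and each lyric on '.' into sentences."""
    segments = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = SEGMENT_RE.match(chunk)
        if match is None:
            raise ValueError(f"Unrecognized segment: {chunk!r}")
        sentences = [s.strip() for s in match["lyric"].split(".") if s.strip()]
        segments.append(Segment(
            structure=match["structure"],
            start=float(match["start"]),
            end=float(match["end"]),
            sentences=sentences,
        ))
    return segments


example = "[verse][0.0:12.5]hello world.second line ; [chorus][12.5:30.0]sing along"
for segment in parse_output(example):
    print(segment.structure, segment.start, segment.end, segment.sentences)
```

If the timestamps turn out to use another notation, such as minutes and seconds, only the `float(...)` conversions and the timestamp pattern would need to change.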

Citation

@misc{tan2025songpreppreprocessingframeworkendtoend,
      title={SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Transcription}, 
      author={Wei Tan and Shun Lei and Huaicheng Zhang and Guangzheng Li and Yixuan Zhang and Hangting Chen and Jianwei Yu and Rongzhi Gu and Dong Yu},
      year={2025},
      eprint={2509.17404},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2509.17404}, 
}

License

The code and weights in this repository are released under the terms stated in the LICENSE file.
