🕺🕺🕺 Lodge 💃💃💃
A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives (CVPR 2024)
Ronghui Li, Yuxiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu and Xiu Li
TL;DR: We propose a two-stage diffusion model that generates extremely long dance sequences from given music in a parallel manner.
CLICK for full abstract
We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse-to-fine diffusion architecture, and propose characteristic dance primitives with significant expressiveness as intermediate representations between the two diffusion models. The first stage is a global diffusion that focuses on comprehending the coarse-level music-dance correlation and producing the characteristic dance primitives. The second stage is a local diffusion that generates detailed motion sequences in parallel under the guidance of the dance primitives and choreographic rules. In addition, we propose a Foot Refine Block to optimize the contact between the feet and the ground, enhancing the physical realism of the motion. Our approach can generate extremely long dance sequences in parallel, striking a balance between global choreographic patterns and local motion quality and expressiveness. Extensive experiments validate the efficacy of our method.
- Release the code and config for teaser
- Release the checkpoints
- Release detailed guidance for training and testing
- Release more applications
Our method is trained with CUDA 11 and pytorch-lightning 1.9.5 on an NVIDIA A100.
conda env create -f lodge.yml
Our environment is similar to the official EDGE setup. You may check it for more details.
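If you want to sanity-check the environment before training, a minimal snippet like the one below (illustrative only, not part of the released code) verifies that PyTorch can see the GPU and that the expected pytorch-lightning version is installed:

```python
# quick environment sanity check (illustrative, not part of the Lodge codebase)
import torch
import pytorch_lightning as pl

print("CUDA available:", torch.cuda.is_available())  # should be True on the training machine
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
print("pytorch-lightning:", pl.__version__)           # expected: 1.9.5
```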
The FineDance dataset has an average dance length of 152.3 seconds and covers 22 dance genres, making it well suited for training dance generation models, especially long dance generation. Therefore, we mainly use FineDance for our experiments. Please visit Google Drive or 百度云 to download the original FineDance dataset and put it in the ./data folder. Note that the original FineDance motion has 52 joints (22 body joints and 30 hand joints); we only use the body part to train and test Lodge. Therefore, you need to run the following scripts to preprocess the dataset.
python data/code/preprocess.py
python dld/data/pre/FineDance_normalizer.py
Alternatively, if you don't wish to preprocess the data yourself, directly download our preprocessed music and dance features from Google Drive or 百度云 and put them into the ./data/finedance folder.
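To illustrate the body-only preprocessing described above, the following sketch (our assumption about the data layout, not the actual preprocess.py) keeps only the first 22 body joints of a 52-joint axis-angle motion array; the file name is hypothetical:

```python
# illustrative sketch: keep only the 22 body joints of a 52-joint FineDance motion
# assumes per-frame axis-angle rotations reshaped to (T, 52, 3); the real layout may differ
import numpy as np

motion = np.load("data/finedance/motion/example.npy")   # hypothetical file name
motion = motion.reshape(motion.shape[0], -1, 3)          # (T, 52, 3)
body_motion = motion[:, :22, :]                          # drop the 30 hand joints
print(body_motion.shape)                                  # (T, 22, 3)
```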
The final file structure is as follows:
LODGE
├── data
│   ├── code
│   │   ├── preprocess.py
│   │   └── extract_musicfea35.py
│   ├── finedance
│   │   ├── label_json
│   │   ├── motion
│   │   ├── music_npy
│   │   ├── music_wav
│   │   ├── music_npynew
│   │   └── mofea319
│   ├── Normalizer.pth
│   └── smplx_neu_J_1.npy
Training the Local Diffusion and Global Diffusion
python train.py --cfg configs/lodge/finedance_fea139.yaml --cfg_assets configs/data/assets.yaml
python train.py --cfg configs/lodge/coarse_finedance_fea139.yaml --cfg_assets configs/data/assets.yaml
Set the pretrained Local Diffusion checkpoint path at "TRAIN.PRETRAINED" in "configs/lodge/finedance_fea139_finetune_v2.yaml", then fine-tune the Local Diffusion for smooth generation.
python train.py --cfg configs/lodge/finedance_fea139_finetune_v2.yaml --cfg_assets configs/data/assets.yaml
You can also download the pretrained models from Google Drive or 百度云.
Once training is done, run inference. The --soft argument is a float ranging from 0 to 1 that controls the number of steps used for soft cue guidance.
python infer_lodge.py --cfg exp/Local_Module/FineDance_FineTuneV2_Local/local_train.yaml --cfg_assets configs/data/assets.yaml --soft 1.0
python render.py --modir 'your motion dir'
Once the inference is done, run the evaluation:
python metric/metrics_finedance.py
python metric/beat_align_score.py
python metric/foot_skating.py
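For reference, the beat alignment metric follows the formulation popularized by Bailando: each music beat is matched to its nearest kinematic (motion) beat and scored with a Gaussian kernel. The sketch below is our own re-implementation for illustration, assuming beat times are given in seconds; the actual metric/beat_align_score.py may differ in details such as the matching direction and the value of sigma:

```python
# illustrative re-implementation of a Bailando-style beat alignment score
# assumes music_beats and kinematic_beats are 1-D numpy arrays of beat times (seconds)
import numpy as np

def beat_align_score(music_beats, kinematic_beats, sigma=3.0):
    score = 0.0
    for mb in music_beats:
        # distance from this music beat to the nearest kinematic beat
        nearest = np.min(np.abs(kinematic_beats - mb))
        score += np.exp(-(nearest ** 2) / (2 * sigma ** 2))
    return score / len(music_beats)

# hypothetical usage with toy beat times
print(beat_align_score(np.array([0.5, 1.0, 1.5]), np.array([0.52, 1.10, 1.45])))
```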
If you find this project helpful, please leave a star ⭐️⭐️⭐️ and cite our papers:
@inproceedings{li2024lodge,
title={Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives},
author={Li, Ronghui and Zhang, Yuxiang and Zhang, Yachao and Zhang, Hongwen and Guo, Jie and Zhang, Yan and Liu, Yebin and Li, Xiu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024},
}
@inproceedings{li2023finedance,
title={FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation},
author={Li, Ronghui and Zhao, Junfan and Zhang, Yachao and Su, Mingyang and Ren, Zeping and Zhang, Han and Tang, Yansong and Li, Xiu},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages={10234--10243},
year={2023}
}
The basic dance diffusion code borrows from EDGE, the evaluation code borrows from Bailando, and the README.md style borrows from follow-your-pose. Thanks to the authors for sharing their code and models.