Commit

update AudioLDM2Pipeline demo (PaddlePaddle#878)
Co-authored-by: luyao-cv <[email protected]>
swagger-coder and luyao-cv authored Dec 18, 2024
1 parent 5c20d96 commit be7c973
Showing 21 changed files with 70 additions and 60 deletions.
35 changes: 30 additions & 5 deletions paddlemix/examples/diffsinger/README.md
@@ -4,23 +4,48 @@

[DiffSinger](https://arxiv.org/abs/2105.02446) is a state-of-the-art Singing Voice Synthesis (SVS) model. The version maintained by OpenVPI optimizes it further and adds more features.

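This repository currently supports inference only
This repository currently supports inference only; the documentation will be expanded in later updates.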


## 2. Environment Setup

1) [Install PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP?tab=readme-ov-file#%E5%AE%89%E8%A3%85)

2) [Install the PaddleMIX environment dependencies](https://github.com/PaddlePaddle/PaddleMIX/tree/b4f97ff859e1964c839fc5fab94f7ba63b1e5959?tab=readme-ov-file#%E5%AE%89%E8%A3%85)
1) [Install the PaddleMIX environment dependencies](https://github.com/PaddlePaddle/PaddleMIX/tree/b4f97ff859e1964c839fc5fab94f7ba63b1e5959?tab=readme-ov-file#%E5%AE%89%E8%A3%85)

3) Install the dependencies with pip:
```bash
pip install -r requirements.txt
```

## 4. Quick Start
Once the environment is set up, run the following script:
1) Once the environment is set up, download the weights to `PaddleMIX/paddlemix/examples/diffsinger/openvpi`:
```bash
cd paddlemix/examples/diffsinger/

wget https://paddlenlp.bj.bcebos.com/models/community/paddlemix/openvpi.tar

tar -xvf openvpi.tar

```

Then run the following script:

```bash
bash run_predict.sh
```
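
A quick check before running the script can catch a missing or misplaced weights directory. The sketch below is illustrative only: it assumes the archive unpacks to `openvpi/opencpop` (the `--exp` value passed in `run_predict.sh`), and `missing_paths` is a hypothetical helper, not part of PaddleMIX.

```python
from pathlib import Path


def missing_paths(example_dir, required=("openvpi", "openvpi/opencpop", "run_predict.sh")):
    """Return the entries of `required` that do not exist under `example_dir`."""
    root = Path(example_dir)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the current directory is paddlemix/examples/diffsinger/.
    missing = missing_paths(".")
    if missing:
        print("Missing before running run_predict.sh:", ", ".join(missing))
    else:
        print("Weights and script found; run `bash run_predict.sh`.")
```

If the archive turns out to use a different layout, adjust the `required` tuple to match.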


## 5. Demo
Click to download the audio and listen.
<div align = "center">
<thead>
</thead>
<tbody>
<tr>
<td align = "center">
<a href="https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio/00_我多想说再见啊.wav" rel="nofollow">
<img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
</tbody>
</div>
</details>
2 changes: 1 addition & 1 deletion paddlemix/examples/diffsinger/run_predict.sh
@@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.

CUDA_VISIBLE_DEVICES=4 python run_predict.py --proj ./samples/00_我多想说再见啊.ds --exp openvpi/diffsinger_xxx
python run_predict.py --proj ./samples/00_我多想说再见啊.ds --exp openvpi/opencpop
1 change: 0 additions & 1 deletion paddlemix/models/diffsinger/inference/ds_acoustic.py
@@ -20,7 +20,6 @@

import numpy as np

# import paddle_aux
import paddle
import tqdm

13 changes: 0 additions & 13 deletions paddlemix/models/diffsinger/modules/__init__.py

This file was deleted.

@@ -15,7 +15,7 @@
import sys

import paddle
import paddle_aux
from paddlemix.models.diffsinger.utils import paddle_aux

from paddlemix.models.diffsinger.utils import filter_kwargs

@@ -16,7 +16,7 @@
from typing import Optional

import paddle
import paddle_aux
from paddlemix.models.diffsinger.utils import paddle_aux


class ConvNeXtBlock(paddle.nn.Layer):
6 changes: 3 additions & 3 deletions paddlemix/models/diffsinger/modules/backbones/__init__.py
@@ -13,9 +13,9 @@
# limitations under the License.

import paddle
from modules.backbones.lynxnet import LYNXNet
from modules.backbones.wavenet import WaveNet
from utils import filter_kwargs
from paddlemix.models.diffsinger.modules.backbones.lynxnet import LYNXNet
from paddlemix.models.diffsinger.modules.backbones.wavenet import WaveNet
from paddlemix.models.diffsinger.utils import filter_kwargs

BACKBONES = {"wavenet": WaveNet, "lynxnet": LYNXNet}

2 changes: 1 addition & 1 deletion paddlemix/models/diffsinger/modules/backbones/lynxnet.py
@@ -15,7 +15,7 @@
import sys

import paddle
import paddle_aux
from paddlemix.models.diffsinger.utils import paddle_aux

from paddlemix.models.diffsinger.modules.commons.common_layers import SinusoidalPosEmb
from paddlemix.models.diffsinger.utils.hparams import hparams
2 changes: 1 addition & 1 deletion paddlemix/models/diffsinger/modules/backbones/wavenet.py
@@ -17,7 +17,7 @@
from math import sqrt

import paddle
import paddle_aux
from paddlemix.models.diffsinger.utils import paddle_aux

from paddlemix.models.diffsinger.modules.commons.common_layers import SinusoidalPosEmb
from paddlemix.models.diffsinger.utils.hparams import hparams
5 changes: 2 additions & 3 deletions paddlemix/models/diffsinger/modules/commons/common_layers.py
@@ -18,14 +18,13 @@
import sys

import paddle
import paddle_aux
from paddlemix.models.diffsinger.utils import paddle_aux
from paddle.nn import GELU, LayerNorm
from paddle.nn import MultiHeadAttention as MultiheadAttention
from paddle.nn import ReLU
from paddle.nn import Silu as SiLU

sys.path.append("/mnt/data2/pengfeiyue/code/Paddle_test/DiffSinger_paddle")
import utils
import paddlemix.models.diffsinger.utils as utils


class NormalInitEmbedding(paddle.nn.Embedding):
@@ -16,7 +16,8 @@
import sys

import paddle
import paddle_aux

from paddlemix.models.diffsinger.utils import paddle_aux


class PositionalEncoding(paddle.nn.Layer):
8 changes: 3 additions & 5 deletions paddlemix/models/diffsinger/modules/core/ddpm.py
@@ -14,20 +14,18 @@

from __future__ import annotations

import sys
import sys, os
from collections import deque
from functools import partial
from typing import List, Tuple

import numpy as np

# import paddle_aux
import paddle
from tqdm import tqdm

sys.path.append("/mnt/data2/pengfeiyue/code/Paddle_test/DiffSinger_paddle")
from modules.backbones import build_backbone
from utils.hparams import hparams
from paddlemix.models.diffsinger.modules.backbones import build_backbone
from paddlemix.models.diffsinger.utils.hparams import hparams


def extract(a, t, x_shape):
6 changes: 2 additions & 4 deletions paddlemix/models/diffsinger/modules/core/reflow.py
@@ -16,14 +16,12 @@

import sys
from typing import List, Tuple

import paddle
import paddle_aux
from tqdm import tqdm

from tqdm import tqdm
from paddlemix.models.diffsinger.modules.backbones import build_backbone
from paddlemix.models.diffsinger.utils.hparams import hparams

from paddlemix.models.diffsinger.utils import paddle_aux

class RectifiedFlow(paddle.nn.Layer):
    def __init__(
@@ -16,8 +16,8 @@
import sys

import paddle
import paddle_aux

from paddlemix.models.diffsinger.utils import paddle_aux
from paddlemix.models.diffsinger.modules.commons.common_layers import (
    EncSALayer,
    SinusoidalPositionalEmbedding,
@@ -15,8 +15,8 @@
import sys

import paddle
import paddle_aux

from paddlemix.models.diffsinger.utils import paddle_aux
from paddlemix.models.diffsinger.modules.commons.common_layers import (
    NormalInitEmbedding as Embedding,
)
4 changes: 2 additions & 2 deletions paddlemix/models/diffsinger/modules/nsf_hifigan/models.py
@@ -15,11 +15,11 @@
import json
import pathlib
import sys

import numpy as np
import paddle
import paddle.nn.functional as F
import paddle_aux

from paddlemix.models.diffsinger.utils import paddle_aux
from paddle.nn.utils import remove_weight_norm, weight_norm

from .env import AttrDict
5 changes: 2 additions & 3 deletions paddlemix/models/diffsinger/modules/vocoders/ddsp.py
@@ -14,17 +14,16 @@

import pathlib
import sys

import numpy as np
import paddle
import paddle_aux
import yaml

from librosa.filters import mel as librosa_mel_fn

from paddlemix.models.diffsinger.basics.base_vocoder import BaseVocoder
from paddlemix.models.diffsinger.modules.vocoders.registry import register_vocoder
from paddlemix.models.diffsinger.utils.hparams import hparams

from paddlemix.models.diffsinger.utils import paddle_aux

class DotDict(dict):
    def __getattr__(*args):
3 changes: 1 addition & 2 deletions paddlemix/models/diffsinger/modules/vocoders/nsf_hifigan.py
@@ -14,10 +14,9 @@

import pathlib
import sys

import paddle
import paddle_aux

from paddlemix.models.diffsinger.utils import paddle_aux
from paddlemix.models.diffsinger.basics.base_vocoder import BaseVocoder
from paddlemix.models.diffsinger.modules.nsf_hifigan.models import load_model
from paddlemix.models.diffsinger.modules.vocoders.registry import register_vocoder
3 changes: 0 additions & 3 deletions paddlemix/models/diffsinger/utils/__init__.py
@@ -27,9 +27,6 @@
from paddlemix.models.diffsinger.utils import paddle_aux
from paddlemix.models.diffsinger.utils.hparams import hparams

# import paddle_aux


def tensors_to_scalars(metrics):
    new_metrics = {}
    for k, v in metrics.items():
5 changes: 4 additions & 1 deletion paddlemix/models/diffsinger/utils/hparams.py
@@ -61,7 +61,7 @@ def set_hparams(config="", exp_name="", hparams_str="", print_hparams=True, glob
args_work_dir = ""
if args.exp_name != "":
    args.work_dir = args.exp_name
    args_work_dir = os.path.join("checkpoint", args.work_dir)
    args_work_dir = args.exp_name

config_chains = []
loaded_config = set()
@@ -136,6 +136,8 @@ def dump_hparams():
hparams["exp_name"] = args.exp_name
if hparams_.get("exp_name") is None:
    hparams_["exp_name"] = args.exp_name

hparams["vocoder_ckpt"] = os.path.join(os.path.dirname(args.exp_name), hparams_["vocoder_ckpt"])

# @rank_zero_only
def print_out_hparams():
@@ -150,4 +152,5 @@ def print_out_hparams():

print_out_hparams()


return hparams_
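
The `vocoder_ckpt` line added in this file resolves the configured vocoder path relative to the parent directory of `--exp`. A minimal sketch of that path arithmetic follows; `resolve_vocoder_ckpt` is a hypothetical helper and `nsf_hifigan/model.ckpt` is an illustrative config value, neither taken from the repository.

```python
import os


def resolve_vocoder_ckpt(exp_name, vocoder_ckpt):
    """Join the configured vocoder path onto the parent directory of exp_name."""
    return os.path.join(os.path.dirname(exp_name), vocoder_ckpt)


# "openvpi/opencpop" is the --exp value used in run_predict.sh;
# "nsf_hifigan/model.ckpt" is a made-up config value for illustration.
print(resolve_vocoder_ckpt("openvpi/opencpop", "nsf_hifigan/model.ckpt"))
```

Two edge cases are worth keeping in mind: `os.path.dirname` returns the full directory when `exp_name` ends with a separator (`openvpi/opencpop/` gives `openvpi/opencpop`), and `os.path.join` discards the first component when `vocoder_ckpt` is an absolute path.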
19 changes: 12 additions & 7 deletions ppdiffusers/README.md
@@ -87,7 +87,7 @@ python setup.py install
```
### Configure the proxy
```shell
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_ENDPOINT=https://hf-mirror.com
```

@@ -1015,14 +1015,15 @@ imageio.mimsave("text_to_video_generation-zero-result-panda.mp4", result, fps=4)
import paddle
import scipy

from ppdiffusers import AudioLDMPipeline
from ppdiffusers import AudioLDM2Pipeline

pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm", paddle_dtype=paddle.float16)
pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", paddle_dtype=paddle.float16)

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
prompt = "Musical constellations twinkling in the night sky, forming a cosmic melody."
negative_prompt = "Low quality."
audio = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=200, audio_length_in_s=10).audios[0]

output_path = "text_to_audio_generation-audio_ldm-techno.wav"
output_path = f"{prompt}.wav"
# save the audio sample as a .wav file
scipy.io.wavfile.write(output_path, rate=16000, data=audio)
```
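
The updated example uses the raw prompt as the output file name, which keeps spaces, commas, and a trailing period in the path. If a more conservative name is wanted, something along these lines could be used; `prompt_to_filename` is a hypothetical helper, not a ppdiffusers API.

```python
import re


def prompt_to_filename(prompt, suffix=".wav", max_len=80):
    """Turn a free-text prompt into a conservative file name."""
    # Replace every run of characters outside [A-Za-z0-9_-] with one underscore.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", prompt).strip("_")
    # Fall back to a fixed name if nothing usable is left.
    return (stem[:max_len] or "audio") + suffix


prompt = "Musical constellations twinkling in the night sky, forming a cosmic melody."
output_path = prompt_to_filename(prompt)
print(output_path)
```

The 80-character cap and the `audio` fallback are arbitrary choices for the sketch.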
@@ -1032,14 +1033,18 @@ scipy.io.wavfile.write(output_path, rate=16000, data=audio)
<tbody>
<tr>
<td align = "center">
<a href="https://paddlenlp.bj.bcebos.com/models/community/westfish/develop_ppdiffusers_data/techno.wav" rel="nofollow">
<a href="https://paddlenlp.bj.bcebos.com/models/community/paddlemix/ppdiffusers/AudioLDM2-Music.wav" rel="nofollow">
<img align="center" src="https://user-images.githubusercontent.com/20476674/209344877-edbf1c24-f08d-4e3b-88a4-a27e1fd0a858.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
</tbody>
</div>
</details>

The following code converts a [Hugging Face](https://huggingface.co/docs/diffusers/api/pipelines/audioldm2) model so that it can be used directly in Paddle:
```python
pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2-music", from_hf_hub=True, from_diffusers=True)
pipe.save_pretrained("cvssp/audioldm2-music")
```
### 图像

<details><summary>&emsp;无条件图像生成(Unconditional Image Generation)</summary>
