Skip to content

Commit

Permalink
Merge branch 'database-search' of github.com:qingen/PaddleSpeech into…
Browse files Browse the repository at this point in the history
… database-search
  • Loading branch information
qingen committed Mar 22, 2022
2 parents 096e415 + b531565 commit 43d1dff
Show file tree
Hide file tree
Showing 206 changed files with 17,954 additions and 2,194 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -50,13 +50,13 @@ repos:
entry: bash .pre-commit-hooks/clang-format.hook -i
language: system
files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|cuh|proto)$
exclude: (?=speechx/speechx/kaldi).*(\.cpp|\.cc|\.h|\.py)$
exclude: (?=speechx/speechx/kaldi|speechx/patch).*(\.cpp|\.cc|\.h|\.py)$
- id: copyright_checker
name: copyright_checker
entry: python .pre-commit-hooks/copyright-check.hook
language: system
files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto|py)$
exclude: (?=third_party|pypinyin|speechx/speechx/kaldi).*(\.cpp|\.cc|\.h|\.py)$
exclude: (?=third_party|pypinyin|speechx/speechx/kaldi|speechx/patch).*(\.cpp|\.cc|\.h|\.py)$
- repo: https://github.com/asottile/reorder_python_imports
rev: v2.4.0
hooks:
Expand Down
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,13 @@
# Changelog
Date: 2022-3-08, Author: yt605155624.
Add features to: T2S:
- Add aishell3 hifigan egs.
- PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1545

Date: 2022-3-08, Author: yt605155624.
Add features to: T2S:
- Add vctk hifigan egs.
- PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1544

Date: 2022-1-29, Author: yt605155624.
Add features to: T2S:
Expand Down
10 changes: 5 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
<!---
2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/akhaliq/paddlespeech) Demos on Hugging Face Spaces are available!
- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.

### Community
Expand Down Expand Up @@ -207,6 +207,7 @@ paddlespeech cls --input input.wav
```shell
paddlespeech asr --lang zh --input input_16k.wav
```
- web demo for Automatic Speech Recognition is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [ASR Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR)

**Speech Translation** (English to Chinese)
(not support for Mac and Windows now)
Expand All @@ -218,7 +219,7 @@ paddlespeech st --input input_16k.wav
```shell
paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --output output.wav
```
- web demo for Text to Speech is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [TTS Demo](https://huggingface.co/spaces/akhaliq/paddlespeech)
- web demo for Text to Speech is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)

**Text Postprocessing**
- Punctuation Restoration
Expand Down Expand Up @@ -397,9 +398,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</tr>
<tr>
<td >HiFiGAN</td>
<td >CSMSC</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td>
<a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a>
</td>
</tr>
<tr>
Expand Down Expand Up @@ -573,7 +574,6 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P


- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
- Many thanks to [AK391](https://github.com/AK391) for TTS web demo on Huggingface Spaces using Gradio.
- Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
- Many thanks to [745165806](https://github.com/745165806)/[PaddleSpeechTask](https://github.com/745165806/PaddleSpeechTask) for contributing Punctuation Restoration model.
Expand Down
4 changes: 2 additions & 2 deletions README_cn.md
Original file line number Diff line number Diff line change
Expand Up @@ -392,9 +392,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tr>
<tr>
<td >HiFiGAN</td>
<td >CSMSC</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td>
<a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a>
</td>
</tr>
<tr>
Expand Down
7 changes: 5 additions & 2 deletions demos/speech_recognition/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,5 +84,8 @@ Here is a list of pretrained models released by PaddleSpeech that can be used by

| Model | Language | Sample Rate
| :--- | :---: | :---: |
| conformer_wenetspeech| zh| 16000
| transformer_librispeech| en| 16000
| conformer_wenetspeech| zh| 16k
| transformer_librispeech| en| 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
|deepspeech2offline_librispeech|en| 16k
7 changes: 5 additions & 2 deletions demos/speech_recognition/README_cn.md
Original file line number Diff line number Diff line change
Expand Up @@ -81,5 +81,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee

| 模型 | 语言 | 采样率
| :--- | :---: | :---: |
| conformer_wenetspeech| zh| 16000
| transformer_librispeech| en| 16000
| conformer_wenetspeech | zh | 16k
| transformer_librispeech | en | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
| deepspeech2offline_librispeech | en | 16k
1 change: 1 addition & 0 deletions demos/speech_server/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
*.wav
65 changes: 60 additions & 5 deletions demos/speech_server/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -110,21 +110,22 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
import json

asrclient_executor = ASRClientExecutor()
asrclient_executor(
res = asrclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
port=8090,
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
print(res.json())
```

Output:
```bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
time cost 0.604353 s.
```

### 5. TTS Client Usage
Expand All @@ -146,7 +147,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
- `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
- `sample_rate`: Sampling rate, choice: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: `output.wav`.
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.

Output:
```bash
Expand All @@ -160,9 +161,10 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
import json
ttsclient_executor = TTSClientExecutor()
ttsclient_executor(
res = ttsclient_executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8090,
Expand All @@ -171,14 +173,64 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
volume=1.0,
sample_rate=0,
output="./output.wav")
response_dict = res.json()
print(response_dict["message"])
print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
print("Audio duration: %f s." %(response_dict['result']['duration']))
```

Output:
```bash
{'description': 'success.'}
Save synthesized audio successfully on ./output.wav.
Audio duration: 3.612500 s.
Response time: 0.388317 s.
```

### 6. CLS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
```
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
Usage:
```bash
paddlespeech_client cls --help
```
Arguments:
- `server_ip`: server ip. Default: 127.0.0.1
- `port`: server port. Default: 8090
- `input`(required): Audio file to be classified.
- `topk`: topk scores of classification result.

Output:
```bash
[2022-03-09 20:44:39,974] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
[2022-03-09 20:44:39,975] [ INFO] - Response time 0.104360 s.


```

- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import CLSClientExecutor
import json

clsclient_executor = CLSClientExecutor()
res = clsclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
port=8090,
topk=1)
print(res.json())
```

Output:
```bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}

```

Expand All @@ -189,3 +241,6 @@ Get all models supported by the ASR service via `paddlespeech_server stats --tas

### TTS model
Get all models supported by the TTS service via `paddlespeech_server stats --task tts`, where static models can be used for paddle inference inference.

### CLS model
Get all models supported by the CLS service via `paddlespeech_server stats --task cls`, where static models can be used for paddle inference inference.
71 changes: 64 additions & 7 deletions demos/speech_server/README_cn.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee

```

### 4. ASR客户端使用方法
### 4. ASR 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用)
```
Expand Down Expand Up @@ -111,25 +111,26 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
import json
asrclient_executor = ASRClientExecutor()
asrclient_executor(
res = asrclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
port=8090,
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
print(res.json())
```

输出:
```bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
time cost 0.604353 s.
```

### 5. TTS客户端使用方法
### 5. TTS 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用)

Expand All @@ -150,7 +151,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0
- `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
- `sample_rate`: 采样率,可选 [0, 8000, 16000],默认与模型相同。 默认值:0
- `output`: 输出音频的路径, 默认值:output.wav
- `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地

输出:
```bash
Expand All @@ -163,9 +164,10 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
import json
ttsclient_executor = TTSClientExecutor()
ttsclient_executor(
res = ttsclient_executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8090,
Expand All @@ -174,20 +176,75 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
volume=1.0,
sample_rate=0,
output="./output.wav")
response_dict = res.json()
print(response_dict["message"])
print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
print("Audio duration: %f s." %(response_dict['result']['duration']))
```

输出:
```bash
{'description': 'success.'}
Save synthesized audio successfully on ./output.wav.
Audio duration: 3.612500 s.
Response time: 0.388317 s.
```

### 5. CLS 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用)
```
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
使用帮助:
```bash
paddlespeech_client cls --help
```
参数:
- `server_ip`: 服务端ip地址,默认: 127.0.0.1。
- `port`: 服务端口,默认: 8090。
- `input`(必须输入): 用于分类的音频文件。
- `topk`: 分类结果的topk。

输出:
```bash
[2022-03-09 20:44:39,974] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
[2022-03-09 20:44:39,975] [ INFO] - Response time 0.104360 s.


```

- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import CLSClientExecutor
import json

clsclient_executor = CLSClientExecutor()
res = clsclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
port=8090,
topk=1)
print(res.json())

```

输出:
```bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}

```


## 服务支持的模型
### ASR支持的模型
通过 `paddlespeech_server stats --task asr` 获取ASR服务支持的所有模型,其中静态模型可用于 paddle inference 推理。

### TTS支持的模型
通过 `paddlespeech_server stats --task tts` 获取TTS服务支持的所有模型,其中静态模型可用于 paddle inference 推理。

### CLS支持的模型
通过 `paddlespeech_server stats --task cls` 获取CLS服务支持的所有模型,其中静态模型可用于 paddle inference 推理。
4 changes: 4 additions & 0 deletions demos/speech_server/cls_client.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
#!/bin/bash

wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1
Loading

0 comments on commit 43d1dff

Please sign in to comment.