Merge branch 'database-search' of github.com:qingen/PaddleSpeech into database-search
qingen committed Mar 28, 2022
2 parents 2f4aa3c + eb56675 commit 612ba54
Showing 19 changed files with 697 additions and 18 deletions.
62 changes: 62 additions & 0 deletions README.md
@@ -7,6 +7,7 @@

<h3>
<a href="#quick-start"> Quick Start </a>
| <a href="#quick-start-server"> Quick Start Server </a>
| <a href="#documents"> Documents </a>
| <a href="#model-list"> Models List </a>
</div>
@@ -178,6 +179,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
<!---
2021.12.14: We would like to hold online courses that introduce the basics of speech and speech research, along with hands-on coding practice using `paddlespeech`. Please keep an eye on our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 👏🏻 2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech.
- 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification.
- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.

@@ -203,6 +206,11 @@ Developers can have a try of our models with [PaddleSpeech Command Line](./paddl
paddlespeech cls --input input.wav
```

**Speaker Verification**
```shell
paddlespeech vector --task spk --input input_16k.wav
```
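
Verification usually means comparing two recordings rather than processing one. As a minimal sketch, and assuming a speaker embedding has already been obtained for each recording as a plain list of floats, cosine similarity is one common way to score a pair. The names `cosine_similarity` and `same_speaker` and the `0.5` threshold are illustrative assumptions, not part of the PaddleSpeech API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the pair when the score reaches the threshold.

    The default threshold is a hypothetical placeholder; in practice it
    would be tuned on held-out verification trials.
    """
    return cosine_similarity(a, b) >= threshold
```

A higher score means the two embeddings point in more similar directions; where to place the accept/reject threshold depends on the model and the data.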

**Automatic Speech Recognition**
```shell
paddlespeech asr --lang zh --input input_16k.wav
@@ -242,6 +250,36 @@ For more command lines, please see: [demos](https://github.com/PaddlePaddle/Padd

If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).


<a name="quickstartserver"></a>
## Quick Start Server

Developers can try out our speech server with the [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).

**Start server**
```shell
paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml
```

**Access Speech Recognition Services**
```shell
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```

**Access Text-to-Speech Services**
```shell
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```

**Access Audio Classification Services**
```shell
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
```


For more information about server command lines, see the [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server).
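
The three client calls above differ only in the task name and, for TTS, the extra `--output` flag. As a rough sketch, a script could assemble them with a small helper like the one below. It uses only the flags shown in this section; the helper name `client_command` and its defaults are hypothetical, and nothing here is taken from the PaddleSpeech Python API.

```python
import shlex
from typing import List, Optional

# Defaults copied from the commands in this section.
SERVER_IP = "127.0.0.1"
PORT = 8090


def client_command(task: str, input_value: str, output: Optional[str] = None,
                   server_ip: str = SERVER_IP, port: int = PORT) -> List[str]:
    """Build the argv list for one `paddlespeech_client` call (hypothetical helper)."""
    if task not in ("asr", "tts", "cls"):
        raise ValueError(f"unsupported task: {task}")
    # This section only shows --output together with tts, so reject it elsewhere.
    if output is not None and task != "tts":
        raise ValueError("--output is only shown for the tts task")
    argv = ["paddlespeech_client", task,
            "--server_ip", server_ip,
            "--port", str(port),
            "--input", input_value]
    if output is not None:
        argv += ["--output", output]
    return argv


if __name__ == "__main__":
    # Print shell-quoted command lines; pass the lists to subprocess.run to execute them.
    for argv in (client_command("asr", "input_16k.wav"),
                 client_command("cls", "input.wav")):
        print(shlex.join(argv))
```

Returning an argument list rather than a single string avoids quoting problems when the TTS input text contains spaces or punctuation.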


## Model List

PaddleSpeech supports a range of popular models. They are summarized in [released models](./docs/source/released_model.md), along with the available pretrained models.
@@ -458,6 +496,29 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</tbody>
</table>

**Speaker Verification**

<table style="width:100%">
<thead>
<tr>
<th> Task </th>
<th> Dataset </th>
<th> Model Type </th>
<th> Link </th>
</tr>
</thead>
<tbody>
<tr>
<td>Speaker Verification</td>
<td>VoxCeleb12</td>
<td>ECAPA-TDNN</td>
<td>
<a href = "./examples/voxceleb/sv0">ecapa-tdnn-voxceleb12</a>
</td>
</tr>
</tbody>
</table>

**Punctuation Restoration**

<table style="width:100%">
@@ -499,6 +560,7 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
- [Chinese Rule Based Text Frontend](./docs/source/tts/zh_text_frontend.md)
- [Test Audio Samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html)
- [Audio Classification](./demos/audio_tagging/README.md)
- [Speaker Verification](./demos/speaker_verification/README.md)
- [Speech Translation](./demos/speech_translation/README.md)
- [Released Models](./docs/source/released_model.md)
- [Community](#Community)
71 changes: 69 additions & 2 deletions README_cn.md
@@ -6,6 +6,7 @@

<h3>
<a href="#quick-start"> Quick Start </a>
| <a href="#quick-start-server"> Quick Start Server </a>
| <a href="#documents"> Documents </a>
| <a href="#model-list"> Model List </a>
</div>
@@ -179,7 +180,9 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
<!---
2021.12.14: We would like to hold online courses that introduce the basics of speech and speech research, along with hands-on coding practice using `paddlespeech`. Please keep an eye on our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 🤗 2021.12.14: Our [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/akhaliq/paddlespeech) demos on Hugging Face Spaces are live!
- 👏🏻 2022.03.28: PaddleSpeech Server is now available, covering audio classification, automatic speech recognition, and text-to-speech.
- 👏🏻 2022.03.28: PaddleSpeech CLI now supports speaker verification.
- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: PaddleSpeech CLI is now available, covering audio classification, automatic speech recognition, speech translation (English to Chinese), and text-to-speech.

### Technical Discussion Group
@@ -202,6 +205,10 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
```shell
paddlespeech cls --input input.wav
```
**Speaker Verification**
```shell
paddlespeech vector --task spk --input input_16k.wav
```
**Automatic Speech Recognition**
```shell
paddlespeech asr --lang zh --input input_16k.wav
@@ -236,6 +243,33 @@ paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
For more command lines, see the [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos).
> Note: If you need to train or fine-tune models, see [Speech Recognition](./docs/source/asr/quick_start.md) and [Text-to-Speech](./docs/source/tts/quick_start.md).

## Quick Start Server
Once installation is complete, developers can quickly try the services from the command line.

**Start server**
```shell
paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml
```

**Access Speech Recognition Services**
```shell
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```

**Access Text-to-Speech Services**
```shell
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```

**Access Audio Classification Services**
```shell
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
```

For more information about server command lines, see the [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server).


## Model List
PaddleSpeech supports many mainstream models and provides pretrained models for them. For details, see the [model list](./docs/source/released_model.md).

@@ -453,6 +487,30 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tbody>
</table>


**Speaker Verification**

<table style="width:100%">
<thead>
<tr>
<th> Task </th>
<th> Dataset </th>
<th> Model Type </th>
<th> Link </th>
</tr>
</thead>
<tbody>
<tr>
<td>Speaker Verification</td>
<td>VoxCeleb12</td>
<td>ECAPA-TDNN</td>
<td>
<a href = "./examples/voxceleb/sv0">ecapa-tdnn-voxceleb12</a>
</td>
</tr>
</tbody>
</table>

**Punctuation Restoration**

<table style="width:100%">
@@ -499,6 +557,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- [Chinese Rule Based Text Frontend](./docs/source/tts/zh_text_frontend.md)
- [Test Audio Samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html)
- [Audio Classification](./demos/audio_tagging/README_cn.md)
- [Speaker Verification](./demos/speaker_verification/README_cn.md)
- [Speech Translation](./demos/speech_translation/README_cn.md)
- [Model List](#模型列表)
- [Automatic Speech Recognition](#语音识别模型)
@@ -521,6 +580,15 @@ author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleSpeech}},
year={2021}
}
@inproceedings{zheng2021fused,
title={Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation},
author={Zheng, Renjie and Chen, Junkun and Ma, Mingbo and Huang, Liang},
booktitle={International Conference on Machine Learning},
pages={12736--12746},
year={2021},
organization={PMLR}
}
```

<a name="欢迎贡献"></a>
@@ -568,7 +636,6 @@ year={2021}
## Acknowledgement

- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention and suggestions, and for help with many issues.
- Many thanks to [AK391](https://github.com/AK391) for the web demo of our text-to-speech feature, built with Gradio on Huggingface Spaces.
- Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR for [short audio](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long audio](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) using PaddleSpeech.
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for building a Virtual Uploader (VUP)/Virtual YouTuber (VTuber) with the PaddleSpeech text-to-speech feature.
- Many thanks to [745165806](https://github.com/745165806)/[PaddleSpeechTask](https://github.com/745165806/PaddleSpeechTask) for contributing punctuation restoration models.