Merge pull request #1554 from lym0302/develop

[server] add server cls
PaddlePaddle · Mar 10, 2022 · 90deeca
2 parents baec77e + 89457b2
commit 90deeca

Showing 27 changed files with 982 additions and 205 deletions.
diff --git a/demos/speech_server/README.md b/demos/speech_server/README.md
@@ -110,21 +110,22 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Python API
   ```python
   from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
+  import json
 
   asrclient_executor = ASRClientExecutor()
-  asrclient_executor(
+  res = asrclient_executor(
       input="./zh.wav",
       server_ip="127.0.0.1",
       port=8090,
       sample_rate=16000,
       lang="zh_cn",
       audio_format="wav")
+  print(res.json())
   ```
 
   Output:
   ```bash
   {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
-  time cost 0.604353 s.
   ```
 
 ### 5. TTS Client Usage
@@ -146,7 +147,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
     - `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
     - `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
     - `sample_rate`: Sampling rate, choice: [0, 8000, 16000], the default is the same as the model. Default: 0
-    - `output`: Output wave filepath. Default: `output.wav`.
+    - `output`: Output wave filepath. Default: None, which means the audio is not saved locally.
 
     Output:
     ```bash
@@ -160,9 +161,10 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Python API
   ```python
   from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
+  import json
 
   ttsclient_executor = TTSClientExecutor()
-  ttsclient_executor(
+  res = ttsclient_executor(
       input="您好，欢迎使用百度飞桨语音合成服务。",
       server_ip="127.0.0.1",
       port=8090,
@@ -171,14 +173,64 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
       volume=1.0,
       sample_rate=0,
       output="./output.wav")
+
+  response_dict = res.json()
+  print(response_dict["message"])
+  print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
+  print("Audio duration: %f s." %(response_dict['result']['duration']))
   ```
 
   Output:
   ```bash
   {'description': 'success.'}
   Save synthesized audio successfully on ./output.wav.
   Audio duration: 3.612500 s.
-  Response time: 0.388317 s.
+
+  ```
+
+### 6. CLS Client Usage
+**Note:** The response time will be slightly longer when using the client for the first time.
+- Command Line (Recommended)
+   ```
+   paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+   ```
+
+  Usage:
+  
+  ```bash
+  paddlespeech_client cls --help
+  ```
+  Arguments:
+  - `server_ip`: server ip. Default: 127.0.0.1
+  - `port`: server port. Default: 8090
+  - `input`(required): Audio file to be classified.
+  - `topk`: Return the top k scores of the classification result.
+
+  Output:
+  ```bash
+  [2022-03-09 20:44:39,974] [    INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
+  [2022-03-09 20:44:39,975] [    INFO] - Response time 0.104360 s.
+
+
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import CLSClientExecutor
+  import json
+
+  clsclient_executor = CLSClientExecutor()
+  res = clsclient_executor(
+      input="./zh.wav",
+      server_ip="127.0.0.1",
+      port=8090,
+      topk=1)
+  print(res.json())
+  ```
+
+  Output:
+  ```bash
+  {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
 
   ```
 
@@ -189,3 +241,6 @@ Get all models supported by the ASR service via `paddlespeech_server stats --tas
 
 ### TTS model
 Get all models supported by the TTS service via `paddlespeech_server stats --task tts`, where static models can be used for inference with paddle inference.
+
+### CLS model
+Get all models supported by the CLS service via `paddlespeech_server stats --task cls`, where static models can be used for inference with paddle inference.
diff --git a/demos/speech_server/README_cn.md b/demos/speech_server/README_cn.md
@@ -80,7 +80,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 
   ```
 
-### 4. ASR客户端使用方法
+### 4. ASR 客户端使用方法
 **Note:** The response time will be slightly longer when using the client for the first time.
 - Command Line (Recommended)
    ```
@@ -111,25 +111,26 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Python API
   ```python
   from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
+  import json
 
   asrclient_executor = ASRClientExecutor()
-  asrclient_executor(
+  res = asrclient_executor(
       input="./zh.wav",
       server_ip="127.0.0.1",
       port=8090,
       sample_rate=16000,
       lang="zh_cn",
       audio_format="wav")
+  print(res.json())
   ```
 
   Output:
   ```bash
   {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
-  time cost 0.604353 s.
 
   ```
 
-### 5. TTS客户端使用方法
+### 5. TTS 客户端使用方法
 **Note:** The response time will be slightly longer when using the client for the first time.
 - Command Line (Recommended)
 
@@ -150,7 +151,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
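     - `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
     - `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
     - `sample_rate`: Sampling rate, choice: [0, 8000, 16000], the default is the same as the model. Default: 0
-    - `output`: Output wave filepath. Default: output.wav.
+    - `output`: Output wave filepath. Default: None, which means the audio is not saved locally.
 
     Output: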
     ```bash
@@ -163,9 +164,10 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 - Python API
   ```python
   from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
+  import json
 
   ttsclient_executor = TTSClientExecutor()
-  ttsclient_executor(
+  res = ttsclient_executor(
       input="您好，欢迎使用百度飞桨语音合成服务。",
       server_ip="127.0.0.1",
       port=8090,
@@ -174,20 +176,75 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
       volume=1.0,
       sample_rate=0,
       output="./output.wav")
+
+  response_dict = res.json()
+  print(response_dict["message"])
+  print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
+  print("Audio duration: %f s." %(response_dict['result']['duration']))
   ```
 
   Output:
   ```bash
   {'description': 'success.'}
   Save synthesized audio successfully on ./output.wav.
   Audio duration: 3.612500 s.
-  Response time: 0.388317 s.
 
   ```
 
+  ### 5. CLS Client Usage
+  **Note:** The response time will be slightly longer when using the client for the first time.
+  - Command Line (Recommended)
+   ```
+   paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
+   ```
+
+  Usage:
+  
+  ```bash
+  paddlespeech_client cls --help
+  ```
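+  Arguments:
+  - `server_ip`: Server IP address. Default: 127.0.0.1
+  - `port`: Server port. Default: 8090
+  - `input` (required): Audio file to be classified.
+  - `topk`: Return the top k scores of the classification result.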
+
+  Output:
+  ```bash
+  [2022-03-09 20:44:39,974] [    INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
+  [2022-03-09 20:44:39,975] [    INFO] - Response time 0.104360 s.
+
+
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import CLSClientExecutor
+  import json
+
+  clsclient_executor = CLSClientExecutor()
+  res = clsclient_executor(
+      input="./zh.wav",
+      server_ip="127.0.0.1",
+      port=8090,
+      topk=1)
+  print(res.json())
+
+  ```
+
+  Output:
+  ```bash
+  {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
+
+  ```
+
+
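 ## Models supported by the service
 ### ASR model
 Get all models supported by the ASR service via `paddlespeech_server stats --task asr`, where static models can be used for inference with paddle inference.
 
 ### TTS model
 Get all models supported by the TTS service via `paddlespeech_server stats --task tts`, where static models can be used for inference with paddle inference.
+
+### CLS model
+Get all models supported by the CLS service via `paddlespeech_server stats --task cls`, where static models can be used for inference with paddle inference.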
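
The CLS examples in both READMEs print the raw response dictionary. As a minimal sketch of how a caller might consume that payload, the helper below pulls the `(class_name, prob)` pairs out of a dictionary shaped like the sample output. The name `top_classes` is hypothetical and not part of PaddleSpeech; the field names are copied from the sample output, and the failure handling is an assumption.

```python
from typing import Any, Dict, List, Tuple


def top_classes(response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return (class_name, prob) pairs from a CLS response dict, best first."""
    # Assumption: a failed request carries success=False and a message field.
    if not response.get("success"):
        raise RuntimeError(f"CLS request failed: {response.get('message')}")
    results = response["result"]["results"]
    pairs = [(item["class_name"], float(item["prob"])) for item in results]
    # Sort defensively; the sample output holds a single entry (topk=1).
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Sample payload copied from the README output in this diff.
sample = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {
        "topk": 1,
        "results": [{"class_name": "Speech", "prob": 0.9027184844017029}],
    },
}

for class_name, prob in top_classes(sample):
    print(f"{class_name}: {prob:.4f}")
```

With a live server, the dictionary would come from `res.json()` as in the Python API example rather than from a literal.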
diff --git a/demos/speech_server/cls_client.sh b/demos/speech_server/cls_client.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1
diff --git a/demos/speech_server/conf/application.yaml b/demos/speech_server/conf/application.yaml
@@ -9,12 +9,14 @@ port: 8090
 # The task format in the engine_list is: <speech task>_<engine type>
 # task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference']
 
-engine_list: ['asr_python', 'tts_python']
+engine_list: ['asr_python', 'tts_python', 'cls_python']
 
 
 #################################################################################
 #                                ENGINE CONFIG                                  #
 #################################################################################
+
+################################### ASR #########################################
 ################### speech task: asr; engine_type: python #######################
 asr_python:
     model: 'conformer_wenetspeech'
@@ -46,6 +48,7 @@ asr_inference:
         summary: True  # False -> do not show predictor config
 
 
+################################### TTS #########################################
 ################### speech task: tts; engine_type: python #######################
 tts_python: 
     # am (acoustic model) choices=['speedyspeech_csmsc', 'fastspeech2_csmsc', 
@@ -105,3 +108,30 @@ tts_inference:
     # others
     lang: 'zh'
 
+
+################################### CLS #########################################
+################### speech task: cls; engine_type: python #######################
+cls_python:
+    # model choices=['panns_cnn14', 'panns_cnn10', 'panns_cnn6']
+    model: 'panns_cnn14'
+    cfg_path: # [optional] Config of cls task.
+    ckpt_path: # [optional] Checkpoint file of model.
+    label_file: # [optional] Label file of cls task.
+    device:  # set 'gpu:id' or 'cpu'
+
+
+################### speech task: cls; engine_type: inference #######################
+cls_inference:
+    # model_type choices=['panns_cnn14', 'panns_cnn10', 'panns_cnn6']
+    model_type: 'panns_cnn14' 
+    cfg_path: 
+    model_path:  # the pdmodel file of cls static model [optional]
+    params_path:  # the pdiparams file of cls static model [optional]
+    label_file:  # [optional] Label file of cls task.
+
+    predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
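
The comment in this config gives the naming rule for `engine_list` entries: `<speech task>_<engine type>`. A rough sketch of that rule follows, written against a plain dictionary such as a YAML loader would return. `split_engine_name` and `missing_sections` are hypothetical helpers for illustration only; how the server itself resolves engines is not shown in this diff.

```python
from typing import Any, Dict, List, Tuple


def split_engine_name(name: str) -> Tuple[str, str]:
    """Split '<speech task>_<engine type>' into its two parts."""
    task, sep, engine_type = name.partition("_")
    if not (task and sep and engine_type):
        raise ValueError(f"expected '<speech task>_<engine type>', got {name!r}")
    return task, engine_type


def missing_sections(config: Dict[str, Any]) -> List[str]:
    """List engines named in engine_list that have no config section."""
    return [name for name in config.get("engine_list", []) if name not in config]


# Trimmed stand-in for conf/application.yaml after this change;
# the empty tts_python section is a placeholder, not the real settings.
config = {
    "port": 8090,
    "engine_list": ["asr_python", "tts_python", "cls_python"],
    "asr_python": {"model": "conformer_wenetspeech"},
    "tts_python": {},
    "cls_python": {"model": "panns_cnn14"},
}

for name in config["engine_list"]:
    print(name, "->", split_engine_name(name))
print("missing sections:", missing_sections(config))
```

A check like `missing_sections` would catch the case where `cls_python` is added to `engine_list` but its section is left out of the file.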
diff --git a/docs/source/released_model.md b/docs/source/released_model.md
@@ -59,17 +59,17 @@ WaveRNN | CSMSC |[WaveRNN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tr
 
 ### Voice Cloning
 Model Type | Dataset| Example Link | Pretrained Models
-:-------------:| :------------:| :-----: | :-----:
+:-------------:| :------------:| :-----: | :-----: | 
 GE2E| AISHELL-3, etc. |[ge2e](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/ge2e)|[ge2e_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ge2e/ge2e_ckpt_0.3.zip)
 GE2E + Tacotron2| AISHELL-3 |[ge2e-tacotron2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc0)|[tacotron2_aishell3_ckpt_vc0_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_aishell3_ckpt_vc0_0.2.0.zip)
 GE2E + FastSpeech2 | AISHELL-3  |[ge2e-fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/vc1)|[fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip)
 
 
 ## Audio Classification Models
 
-Model Type | Dataset| Example Link | Pretrained Models
-:-------------:| :------------:| :-----: | :-----:
-PANN | Audioset| [audioset_tagging_cnn](https://github.com/qiuqiangkong/audioset_tagging_cnn) | [panns_cnn6.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn6.pdparams), [panns_cnn10.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn10.pdparams), [panns_cnn14.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn14.pdparams)
+Model Type | Dataset| Example Link | Pretrained Models | Static Models 
+:-------------:| :------------:| :-----: | :-----: | :-----:
+PANN | Audioset| [audioset_tagging_cnn](https://github.com/qiuqiangkong/audioset_tagging_cnn) | [panns_cnn6.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn6.pdparams), [panns_cnn10.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn10.pdparams), [panns_cnn14.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn14.pdparams) | [panns_cnn6_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn6_static.tar.gz)(18M), [panns_cnn10_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn10_static.tar.gz)(19M), [panns_cnn14_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn14_static.tar.gz)(289M) 
 PANN | ESC-50 |[pann-esc50](../../examples/esc50/cls0)|[esc50_cnn6.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn6.tar.gz), [esc50_cnn10.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn10.tar.gz), [esc50_cnn14.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn14.tar.gz)
 
 ## Punctuation Restoration Models

diff --git a/paddlespeech/cli/cls/infer.py b/paddlespeech/cli/cls/infer.py
@@ -193,7 +193,8 @@ def preprocess(self, audio_file: Union[str, os.PathLike]):
             sr=feat_conf['sample_rate'],
             mono=True,
             dtype='float32')
-        logger.info("Preprocessing audio_file:" + audio_file)
+        if isinstance(audio_file, (str, os.PathLike)):
+            logger.info("Preprocessing audio_file:" + str(audio_file))
 
         # Feature extraction
         feature_extractor = LogMelSpectrogram(

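
The guard in this hunk logs the file name only when `audio_file` is a path. Since string concatenation needs a `str` operand, the sketch below converts path-like objects with `os.fspath` before building the message, and falls back to the type name for anything else. `describe_audio_source` is a hypothetical helper, and the assumption that non-path inputs are file-like objects is mine, not something this diff states.

```python
import io
import os
import pathlib
from typing import Any


def describe_audio_source(audio_file: Any) -> str:
    """Build the preprocessing log message for a path or a file-like object."""
    if isinstance(audio_file, (str, os.PathLike)):
        # os.fspath() turns pathlib.Path and other PathLike objects into a plain path.
        return f"Preprocessing audio_file:{os.fspath(audio_file)}"
    # Assumption: anything else is an in-memory or file-like source with no path.
    return f"Preprocessing audio from {type(audio_file).__name__} object"


print(describe_audio_source("./zh.wav"))
print(describe_audio_source(pathlib.PurePosixPath("zh.wav")))
print(describe_audio_source(io.BytesIO(b"")))
```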
diff --git a/paddlespeech/server/__init__.py b/paddlespeech/server/__init__.py
@@ -18,6 +18,7 @@
 from .base_commands import ServerBaseCommand
 from .base_commands import ServerHelpCommand
 from .bin.paddlespeech_client import ASRClientExecutor
+from .bin.paddlespeech_client import CLSClientExecutor
 from .bin.paddlespeech_client import TTSClientExecutor
 from .bin.paddlespeech_server import ServerExecutor