Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASR] fix asr 0-d tensor. #3214

Merged
merged 2 commits into from
May 4, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

### Recent Update
- ⚡ 2023.04.28: Fix [0-d tensor](https://github.com/PaddlePaddle/PaddleSpeech/pull/3214), with the upgrade of paddlepaddle==2.5, the problem of modifying 0-d tensor has been solved.
- 👑 2023.04.25: Add [AMP for U2 conformer](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 🔥 2023.04.06: Add [subtitle file (.srt format) generation example](./demos/streaming_asr_server).
- 🔥 2023.03.14: Add SVS(Singing Voice Synthesis) examples with Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1)、[PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5), the effect is continuously optimized.
Expand Down
1 change: 1 addition & 0 deletions README_cn.md
Original file line number Diff line number Diff line change
Expand Up @@ -183,6 +183,7 @@
- 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。

### 近期更新
- ⚡ 2023.04.28: 修正 [0-d tensor](https://github.com/PaddlePaddle/PaddleSpeech/pull/3214), 配合PaddlePaddle2.5升级修改了0-d tensor的问题。
- 👑 2023.04.25: 新增 [U2 conformer 的 AMP 训练](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 👑 2023.04.06: 新增 [srt格式字幕生成功能](./demos/streaming_asr_server)。
- 🔥 2023.03.14: 新增基于 Opencpop 数据集的 SVS (歌唱合成) 示例,包含 [DiffSinger](./examples/opencpop/svs1)、[PWGAN](./examples/opencpop/voc1) 和 [HiFiGAN](./examples/opencpop/voc5),效果持续优化中。
Expand Down
8 changes: 4 additions & 4 deletions paddlespeech/cli/asr/infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,6 @@
import numpy as np
import paddle
import soundfile
from paddlespeech.audio.transform.transformation import Transformation
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.utils.utility import UpdateConfig
from yacs.config import CfgNode

from ...utils.env import MODEL_HOME
Expand All @@ -37,6 +34,9 @@
from ..utils import CLI_TIMER
from ..utils import stats_wrapper
from ..utils import timer_register
from paddlespeech.audio.transform.transformation import Transformation
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.utils.utility import UpdateConfig

__all__ = ['ASRExecutor']

Expand Down Expand Up @@ -274,7 +274,7 @@ def preprocess(self, model_type: str, input: Union[str, os.PathLike]):
# fbank
audio = preprocessing(audio, **preprocess_args)

audio_len = paddle.to_tensor(audio.shape[0])
audio_len = paddle.to_tensor(audio.shape[0]).unsqueeze(axis=0)
audio = paddle.to_tensor(audio, dtype='float32').unsqueeze(axis=0)

self._inputs["audio"] = audio
Expand Down
2 changes: 1 addition & 1 deletion paddlespeech/cli/ssl/infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -245,7 +245,7 @@ def preprocess(self, model_type: str, input: Union[str, os.PathLike]):
# fbank
audio = preprocessing(audio, **preprocess_args)

audio_len = paddle.to_tensor(audio.shape[0])
audio_len = paddle.to_tensor(audio.shape[0]).unsqueeze(axis=0)
audio = paddle.to_tensor(audio, dtype='float32').unsqueeze(axis=0)

self._inputs["audio"] = audio
Expand Down
2 changes: 1 addition & 1 deletion paddlespeech/cli/whisper/infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -253,7 +253,7 @@ def preprocess(self, model_type: str, input: Union[str, os.PathLike]):
# fbank
audio = log_mel_spectrogram(audio, resource_path=self.resource_path)

audio_len = paddle.to_tensor(audio.shape[0])
audio_len = paddle.to_tensor(audio.shape[0]).unsqueeze(axis=0)

self._inputs["audio"] = audio
self._inputs["audio_len"] = audio_len
Expand Down