[ASR]add squeezeformer model #2755

yeyupiaoling · 2022-12-20T02:08:02Z

PR types

New features

PR changes

Models

Describe

增加Squeezeformer模型语音识别模型，分别增加了squeezeformer.yaml和chunk_squeezeformer.yaml，均存放在examples/aishell/asr1/conf。

通过验证examples/aishell/asr1/run.sh，stage=0到stop_stage=5能成功执行。下面是测试输出，训练轮数比较少，字错误相对较高。

2022-12-20 09:22:05.998 | INFO     | paddlespeech.s2t.exps.u2.model:compute_metrics:371 - Utt: BAC009S0905W0348
2022-12-20 09:22:05.998 | INFO     | paddlespeech.s2t.exps.u2.model:compute_metrics:372 - Ref: 说到今天获胜的原因
2022-12-20 09:22:05.998 | INFO     | paddlespeech.s2t.exps.u2.model:compute_metrics:373 - Hyp: 说到今天货剩的原因
2022-12-20 09:22:05.999 | INFO     | paddlespeech.s2t.exps.u2.model:compute_metrics:374 - One example error rate [cer] = 0.222222
2022-12-20 09:22:05.999 | INFO     | paddlespeech.s2t.exps.u2.model:test:409 - RTF: 0.000045, Error rate [cer] (7012/?) = 0.128256
2022-12-20 09:22:06.134 | INFO     | paddlespeech.s2t.exps.u2.model:compute_metrics:371 - Utt: BAC009S0901W0461
2022-12-20 09:22:06.135 | INFO     | paddlespeech.s2t.exps.u2.model:compute_metrics:372 - Ref: 彭某下班后准备开车离开
2022-12-20 09:22:06.135 | INFO     | paddlespeech.s2t.exps.u2.model:compute_metrics:373 - Hyp: 彭某下班后准备开车离开
2022-12-20 09:22:06.135 | INFO     | paddlespeech.s2t.exps.u2.model:compute_metrics:374 - One example error rate [cer] = 0.000000

在PPASR验证过，结合语言模型，squeezeformer.yaml的字错率为0.04889，chunk_squeezeformer.yaml的字错率为0.04927。

yt605155624 · 2022-12-20T02:20:20Z

please check CodeStyle CI, follow #2325 to format and recheck your code

zh794390558 · 2022-12-20T02:43:19Z

结合语言模型

用的什么语言模型？实验还在训练吗？

yeyupiaoling · 2022-12-20T02:49:45Z

用的什么语言模型？实验还在训练吗？

@zh794390558 这个表格的第二个

https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md#language-model-based-on-ngram

paddlespeech/s2t/modules/subsampling.py

zh794390558 · 2022-12-20T02:58:12Z

paddlespeech/s2t/modules/subsampling.py

+            mask_pad: paddle.Tensor=paddle.ones((0, 0, 0),
+                                                dtype=paddle.bool), ):
+        xs = xs.transpose([0, 2, 1])  # [B, C, T]
+        xs = masked_fill(xs, mask_pad.equal(0), 0.0)


mask_pad中pad是0吗？

可以加上input的shape信息吗？

paddlespeech/s2t/modules/subsampling.py

zh794390558 · 2022-12-20T03:05:51Z

用的什么语言模型？实验还在训练吗？

@zh794390558 这个表格的第二个

https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md#language-model-based-on-ngram

不带lm的结果是多少？

zh794390558 · 2022-12-20T03:06:49Z

paddlespeech/s2t/modules/subsampling.py

+    def __init__(self,
+                 channel: int,
+                 out_dim: int,
+                 kernel_size: int=1,


默认值是否错误？

zh794390558 · 2022-12-20T03:07:35Z

paddlespeech/s2t/modules/subsampling.py

+        self.pw_conv._bias_attr = paddle.nn.initializer.Uniform(
+            low=-pw_max, high=pw_max)
+
+    def forward(


注明各参数的shape信息吧。

zh794390558 · 2022-12-20T03:08:41Z

paddlespeech/s2t/modules/subsampling.py

+        B, T, D = xs.shape
+        mask = mask[:, ::self.stride, ::self.stride]
+        mask_pad = mask_pad[:, :, ::self.stride]
+        L = mask_pad.shape[-1]


这里的L-T应该等于0？会有其他的情况吗？

yeyupiaoling · 2022-12-20T03:09:44Z

不带lm的结果是多少？

这个不知道，我评估都是使用beam search + lm 的。等有空闲的计算力，我可以直接在用PaddleSpeech跑一次看看。

zh794390558 · 2022-12-20T03:17:15Z

paddlespeech/s2t/modules/positionwise_feed_forward.py

+                 dropout_rate: float,
+                 activation: paddle.nn.Layer=paddle.nn.ReLU(),
+                 adaptive_scale: bool=False,
+                 init_weights: bool=False):


相比PositionwiseFeedForward应该是多两个参数，可以试着合并成一个类。
合并需要看看是否影响原有的配置文件，不影响的话合并会好些。

我怕影响到你们的代码，不敢轻易修改，怕出现多米诺骨牌效应。

没事儿，可以修改的。合入前可以多测试下。控制好默认参数，和参数初始化部分，应该就不用修改原有的配置文件了。

zh794390558 · 2022-12-20T03:17:55Z

paddlespeech/s2t/modules/positionwise_feed_forward.py

+        self.dropout = paddle.nn.Dropout(dropout_rate)
+        self.w_2 = Linear(hidden_units, idim)
+        self.adaptive_scale = adaptive_scale
+        ada_scale = self.create_parameter(


可以用self.adaptive_scale控制参数的创建。

zh794390558 · 2022-12-20T03:25:50Z

paddlespeech/s2t/modules/subsampling.py

+            groups=1, )
+
+        self.init_weights()
+


都加上 self.subsampling_rate 和 self.right_context 的参数？

zh794390558 · 2022-12-20T03:37:37Z

paddlespeech/s2t/modules/encoder.py

+    ) -> Tuple[paddle.Tensor, paddle.Tensor]:
+        """Embed positions in tensor.
+        Args:
+            xs: padded input tensor (B, L, D)


zh794390558 · 2022-12-20T03:54:07Z

paddlespeech/s2t/modules/convolution.py

+from paddle.nn import initializer as I
+from typeguard import check_argument_types
+
+__all__ = ['ConvolutionModule2']


挪到 conformer_convolution.py 文件中吧。

zh794390558 · 2022-12-20T03:55:26Z

paddlespeech/s2t/modules/convolution.py

+                 norm: str="batch_norm",
+                 causal: bool=False,
+                 bias: bool=True,
+                 adaptive_scale: bool=False,


应该相比 ConvolutionModule多了adaptive_scale和 init_weights，看下是否可以合并成一个类。是否需要修改原有的配置文件，不影响的话建议合成一个类。

paddlespeech/s2t/modules/convolution.py

paddlespeech/s2t/modules/conv2d.py

zh794390558 · 2022-12-20T04:02:08Z

paddlespeech/s2t/modules/attention.py

+                 n_head,
+                 n_feat,
+                 dropout_rate,
+                 do_rel_shift=False,


应该是多了后三个参数，默认都是false。可以和其他的一样看是否能合并成一个类。

paddlespeech/s2t/modules/attention.py

paddlespeech/s2t/modules/encoder.py

paddlespeech/s2t/modules/positionwise_feed_forward.py

zh794390558

目前有训练的结果吗？

zh794390558 · 2023-01-11T12:16:01Z

examples/aishell/asr1/conf/chunk_squeezeformer.yaml

@@ -21,7 +21,6 @@ encoder_conf:
    normalize_before: false
    activation_type: 'swish'
    pos_enc_layer_type: 'rel_pos'
-    do_rel_shift: false


建议 do_rel_shift 保留吧。

zh794390558 · 2023-01-31T07:32:13Z

我们验证后代码会合入。

zxcd · 2023-03-15T08:52:48Z

aishell集上验证结果如下：
attention: 0.079395
ctc_greedy_search: 0.091484
ctc_prefix_beam_search: 0.091522
attention_rescoring: 0.073908

zh794390558

LGTM

yeyupiaoling · 2023-03-15T10:57:09Z

比conformer低，应该是哪里写错了。纠结~

* add squeezeformer model * change CodeStyle, test=asr * change CodeStyle, test=asr * fix subsample rate error, test=asr * merge classes as required, test=asr * change CodeStyle, test=asr * fix missing code, test=asr * split code to new file, test=asr * remove rel_shift, test=asr

* [TTS]add Diffsinger with opencpop dataset (#3005) * Update requirements.txt * fix vits reduce_sum's input/output dtype, test=tts (#3028) * [TTS] add opencpop PWGAN example (#3031) * add opencpop voc, test=tts * soft link * Update textnorm_test_cases.txt * [TTS] add opencpop HIFIGAN example (#3038) * add opencpop voc, test=tts * soft link * add opencpop hifigan, test=tts * update * fix dtype diff of last expand_v2 op of VITS (#3041) * [ASR]add squeezeformer model (#2755) * add squeezeformer model * change CodeStyle, test=asr * change CodeStyle, test=asr * fix subsample rate error, test=asr * merge classes as required, test=asr * change CodeStyle, test=asr * fix missing code, test=asr * split code to new file, test=asr * remove rel_shift, test=asr * Update README.md * Update README_cn.md * Update README.md * Update README_cn.md * Update README.md * fix input dtype of elementwise_mul op from bool to int64 (#3054) * [TTS] add svs frontend (#3062) * [TTS]clean starganv2 vc model code and add docstring (#2987) * clean code * add docstring * [Doc] change define asr server config to chunk asr config, test=doc (#3067) * Update README.md * Update README_cn.md * get music score, test=doc (#3070) * [TTS]fix elementwise_floordiv's fill_constant (#3075) * fix elementwise_floordiv's fill_constant * add float converter for min_value in attention * fix paddle2onnx's install version, install the newest paddle2onnx in run.sh (#3084) * [TTS] update svs_music_score.md (#3085) * rm unused dep, test=tts (#3097) * Update bug-report-tts.md (#3120) * [TTS]Fix VITS lite infer (#3098) * [TTS]add starganv2 vc trainer (#3143) * add starganv2 vc trainer * fix StarGANv2VCUpdater and losses * fix StarGANv2VCEvaluator * add some typehint * [TTS]【Hackathon + No.190】 + 模型复现：iSTFTNet (#3006) * iSTFTNet implementation based on hifigan, not affect the function and execution of HIFIGAN * modify the comment in iSTFT.yaml * add the comments in hifigan * iSTFTNet implementation based on hifigan, not affect the function and execution of HIFIGAN * modify the comment in iSTFT.yaml * add the comments in hifigan * add iSTFTNet.md * modify the format of iSTFTNet.md * modify iSTFT.yaml and hifigan.py * Format code using pre-commit * modify hifigan.py,delete the unused self.istft_layer_id , move the self.output_conv behind else, change conv_post to output_conv * update iSTFTNet_csmsc_ckpt.zip download link * modify iSTFTNet.md * modify hifigan.py and iSTFT.yaml * modify iSTFTNet.md * add function for generating srt file (#3123) * add function for generating srt file 在原来websocket_client.py的基础上，增加了由wav或mp3格式的音频文件生成对应srt格式字幕文件的功能 * add function for generating srt file 在原来websocket_client.py的基础上，增加了由wav或mp3格式的音频文件生成对应srt格式字幕文件的功能 * keep origin websocket_client.py 恢复原本的websocket_client.py文件 * add generating subtitle function into README * add generate subtitle funciton into README * add subtitle generation function * add subtitle generation function * fix example/aishell local/train.sh if condition bug, test=asr (#3146) * fix some preprocess bugs (#3155) * add amp for U2 conformer. * fix scaler save * fix scaler save and load. * mv scaler.unscale_ blow grad_clip. * [TTS]add StarGANv2VC preprocess (#3163) * [TTS] [黑客松]Add JETS (#3109) * Update quick_start.md (#3175) * [BUG] Fix progress bar unit. (#3177) * Update quick_start_cn.md (#3176) * [TTS]StarGANv2 VC fix some trainer bugs, add add reset_parameters (#3182) * VITS learning rate revised, test=tts * VITS learning rate revised, test=tts * [s2t] mv dataset into paddlespeech.dataset (#3183) * mv dataset into paddlespeech.dataset * add aidatatang * fix import * Fix some typos. (#3178) * [s2t] move s2t data preprocess into paddlespeech.dataset (#3189) * move s2t data preprocess into paddlespeech.dataset * avg model, compute wer, format rsl into paddlespeech.dataset * fix format rsl * fix avg ckpts * Update pretrained model in README (#3193) * [TTS]Fix losses of StarGAN v2 VC (#3184) * VITS learning rate revised, test=tts * VITS learning rate revised, test=tts * add new aishell model for better CER. * add readme * [s2t] fix cli args to config (#3194) * fix cli args to config * fix train cli * Update README.md * [ASR] Support Hubert, fintuned on the librispeech dataset (#3088) * librispeech hubert, test=asr * librispeech hubert, test=asr * hubert decode * review * copyright, notes, example related * hubert cli * pre-commit format * fix conflicts * fix conflicts * doc related * doc and train config * librispeech.py * support hubert cli * [ASR] fix asr 0-d tensor. (#3214) * Update README.md * Update README.md * fix: 🐛 修复服务端 python ASREngine 无法使用conformer_talcs模型 (#3230) * fix: 🐛 fix python ASREngine not pass codeswitch * docs: 📝 Update Docs * 修改模型判断方式 * Adding WavLM implementation * fix model m5s * Code clean up according to comments in #3242 * fix error in tts/st * Changed the path for the uploaded weight * Update phonecode.py # 固话的正则错误修改参考https://github.com/speechio/chinese_text_normalization/blob/master/python/cn_tn.py 固化的正则为： pattern = re.compile(r"\D((0(10|2[1-3]|[3-9]\d{2})-?)?[1-9]\d{6,7})\D") * Adapted wavlmASR model to pretrained weights and CLI * Changed the MD5 of the pretrained tar file due to bug fixes * Deleted examples/librispeech/asr5/format_rsl.py * Update released_model.md * Code clean up for CIs * Fixed the transpose usages ignored before * Update setup.py * refactor mfa scripts * Final cleaning; Modified SSL/infer.py and README for wavlm inclusion in model options * updating readme and readme_cn * remove tsinghua pypi * Update setup.py (#3294) * Update setup.py * refactor rhy * fix ckpt * add dtype param for arange API. (#3302) * add scripts for tts code switch * add t2s assets * more comment on tts frontend * fix librosa==0.8.1 numpy==1.23.5 for paddleaudio align with this version * move ssl into t2s.frontend; fix spk_id for 0-D tensor; * add ssml unit test * add en_frontend file * add mix frontend test * fix long text oom using ssml; filter comma; update polyphonic * remove print * hotfix english G2P * en frontend unit text * fix profiler (#3323) * old grad clip has 0d tensor problem, fix it (#3334) * update to py3.8 * remove fluid. * add roformer * fix bugs * add roformer result * support position interpolation for langer attention context windown length. * RoPE with position interpolation * rope for streaming decoding * update result * fix rotary embeding * Update README.md * fix weight decay * fix develop view confict with model's * Add XPU support for SpeedySpeech (#3502) * Add XPU support for SpeedySpeech * fix typos * update description of nxpu * Add XPU support for FastSpeech2 (#3514) * Add XPU support for FastSpeech2 * optimize * Update ge2e_clone.py (#3517) 修复在windows上的多空格错误 * Fix Readme. (#3527) * Update README.md * Update README_cn.md * Update README_cn.md * Update README.md * FIX: Added missing imports * FIX: Fixed the implementation of a special method * 【benchmark】add max_mem_reserved for benchmark (#3604) * fix profiler * add max_mem_reserved for benchmark * fix develop bug function:view to reshape (#3633) * 【benchmark】fix gpu_mem unit (#3634) * fix profiler * add max_mem_reserved for benchmark * fix benchmark * 增加文件编码读取 (#3606) Fixed #3605 * bugfix: audio_len should be 1D, no 0D, which will raise list index out (#3490) of range error in the following decode process Co-authored-by: Luzhenhui <[email protected]> * Update README.md (#3532) Fixed a typo * fixed version for paddlepaddle. (#3701) * fixed version for paddlepaddle. * fix code style * 【Fix Speech Issue No.5】issue 3444 transformation import error (#3779) * fix paddlespeech.s2t.transform.transformation import error * fix paddlespeech.s2t.transform import error * 【Fix Speech Issue No.8】issue 3652 merge_yi function has a bug (#3786) * 【Fix Speech Issue No.8】issue 3652 merge_yi function has a bug * 【Fix Speech Issue No.8】issue 3652 merge_yi function has a bug * 【test】add cli test readme (#3784) * add cli test readme * fix code style * 【test】fix test cli bug (#3793) * add cli test readme * fix code style * fix bug * Update setup.py (#3795) * adapt view behavior change, fix KeyError. (#3794) * adapt view behavior change, fix KeyError. * fix readme demo run error. * fixed opencc version --------- Co-authored-by: liangym <[email protected]> Co-authored-by: TianYuan <[email protected]> Co-authored-by: 夜雨飘零 <[email protected]> Co-authored-by: zxcd <[email protected]> Co-authored-by: longRookie <[email protected]> Co-authored-by: twoDogy <[email protected]> Co-authored-by: lemondy <[email protected]> Co-authored-by: ljhzxc <[email protected]> Co-authored-by: PiaoYang <[email protected]> Co-authored-by: WongLaw <[email protected]> Co-authored-by: Hui Zhang <[email protected]> Co-authored-by: Shuangchi He <[email protected]> Co-authored-by: TianHao Zhang <[email protected]> Co-authored-by: guanyc <[email protected]> Co-authored-by: jiamingkong <[email protected]> Co-authored-by: zoooo0820 <[email protected]> Co-authored-by: shuishu <[email protected]> Co-authored-by: LixinGuo <[email protected]> Co-authored-by: gmm <[email protected]> Co-authored-by: Wang Huan <[email protected]> Co-authored-by: Kai Song <[email protected]> Co-authored-by: skyboooox <[email protected]> Co-authored-by: fazledyn-or <[email protected]> Co-authored-by: luyao-cv <[email protected]> Co-authored-by: Color_yr <[email protected]> Co-authored-by: JeffLu <[email protected]> Co-authored-by: Luzhenhui <[email protected]> Co-authored-by: satani99 <[email protected]> Co-authored-by: mjxs <[email protected]> Co-authored-by: Mattheliu <[email protected]>

add squeezeformer model

6d867f7

mergify bot added S2T asr/st Example labels Dec 20, 2022

yt605155624 assigned yeyupiaoling Dec 20, 2022

yt605155624 added the contributor label Dec 20, 2022

yt605155624 added this to the r1.4.0 milestone Dec 20, 2022

yt605155624 changed the title ~~add squeezeformer model~~ [ASR]add squeezeformer model Dec 20, 2022

change CodeStyle, test=asr

2aa8457

change CodeStyle, test=asr

34acf5f

zh794390558 reviewed Dec 20, 2022

View reviewed changes

paddlespeech/s2t/modules/subsampling.py Show resolved Hide resolved

zh794390558 reviewed Dec 20, 2022

View reviewed changes

fix subsample rate error, test=asr

1c156bf

zh794390558 reviewed Dec 20, 2022

View reviewed changes

paddlespeech/s2t/modules/subsampling.py Outdated Show resolved Hide resolved

zh794390558 reviewed Dec 20, 2022

View reviewed changes

yeyupiaoling added 2 commits January 4, 2023 16:37

merge classes as required, test=asr

c1df5b7

change CodeStyle, test=asr

ccc1571

zh794390558 reviewed Jan 4, 2023

View reviewed changes

paddlespeech/s2t/modules/attention.py Outdated Show resolved Hide resolved

paddlespeech/s2t/modules/encoder.py Outdated Show resolved Hide resolved

paddlespeech/s2t/modules/positionwise_feed_forward.py Outdated Show resolved Hide resolved

yeyupiaoling added 2 commits January 4, 2023 17:12

fix missing code, test=asr

7b1519b

split code to new file, test=asr

b297635

remove rel_shift, test=asr

fe8bbcc

zh794390558 reviewed Jan 11, 2023

View reviewed changes

zh794390558 approved these changes Mar 15, 2023

View reviewed changes

zh794390558 merged commit 31a4562 into PaddlePaddle:develop Mar 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ASR]add squeezeformer model #2755

[ASR]add squeezeformer model #2755

yeyupiaoling commented Dec 20, 2022

yt605155624 commented Dec 20, 2022

zh794390558 commented Dec 20, 2022

yeyupiaoling commented Dec 20, 2022

zh794390558 Dec 20, 2022 •

edited

Loading

zh794390558 commented Dec 20, 2022

zh794390558 Dec 20, 2022

zh794390558 Dec 20, 2022

zh794390558 Dec 20, 2022

yeyupiaoling commented Dec 20, 2022

zh794390558 Dec 20, 2022 •

edited

Loading

yeyupiaoling Dec 20, 2022

zh794390558 Dec 22, 2022

zh794390558 Dec 20, 2022

zh794390558 Dec 20, 2022 •

edited

Loading

zh794390558 Dec 20, 2022

zh794390558 Dec 20, 2022

zh794390558 Dec 20, 2022

zh794390558 Dec 20, 2022

zh794390558 left a comment

zh794390558 Jan 11, 2023

zh794390558 commented Jan 31, 2023 •

edited

Loading

zxcd commented Mar 15, 2023

zh794390558 left a comment

yeyupiaoling commented Mar 15, 2023

[ASR]add squeezeformer model #2755

[ASR]add squeezeformer model #2755

Conversation

yeyupiaoling commented Dec 20, 2022

PR types

PR changes

Describe

yt605155624 commented Dec 20, 2022

zh794390558 commented Dec 20, 2022

yeyupiaoling commented Dec 20, 2022

zh794390558 Dec 20, 2022 • edited Loading

Choose a reason for hiding this comment

zh794390558 commented Dec 20, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yeyupiaoling commented Dec 20, 2022

zh794390558 Dec 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zh794390558 Dec 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zh794390558 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zh794390558 commented Jan 31, 2023 • edited Loading

zxcd commented Mar 15, 2023

zh794390558 left a comment

Choose a reason for hiding this comment

yeyupiaoling commented Mar 15, 2023

zh794390558 Dec 20, 2022 •

edited

Loading

zh794390558 Dec 20, 2022 •

edited

Loading

zh794390558 Dec 20, 2022 •

edited

Loading

zh794390558 commented Jan 31, 2023 •

edited

Loading