
Noisy student training for wenet #1600

Merged: 46 commits merged into wenet-e2e:main on Dec 13, 2022
Conversation

NevermoreCY
Contributor

Here we provide a recipe from our paper for running Noisy Student Training (NST) with an LM-filter strategy, using AISHELL-1 as the supervised data and WenetSpeech as the unsupervised data.

The example code is stored under examples/aishell/NST, with a detailed guideline and results in readme.md. We mainly modified the "run.sh" script and added "executor_nst.py" and "train_nst.py", as well as some auxiliary code and examples under the local directory.
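For readers unfamiliar with the training scheme, here is a minimal sketch of the NST loop with an LM filter that the recipe implements. The helper names (train_model, decode, lm_filter) are hypothetical placeholders for the steps that run.sh and the scripts under local/ actually perform.

```python
# Minimal sketch of Noisy Student Training (NST) with an LM filter.
# train_model, decode and lm_filter are hypothetical placeholders for
# the steps driven by run.sh; they are not functions in this recipe.

def noisy_student_training(supervised_data, unsupervised_data, num_iterations=4):
    # Train the initial teacher on the supervised set (AISHELL-1).
    teacher = train_model(supervised_data)
    for _ in range(num_iterations):
        # The teacher produces pseudo labels for the unsupervised set (WenetSpeech).
        pseudo_labeled = decode(teacher, unsupervised_data)
        # Keep only utterances whose pseudo labels pass the LM-based filter.
        filtered = lm_filter(pseudo_labeled)
        # Train a student on supervised data plus filtered pseudo-labeled data;
        # the student becomes the teacher for the next iteration.
        teacher = train_model(supervised_data + filtered)
    return teacher
```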

@wd929
Contributor

wd929 commented Dec 5, 2022

@robin1001 hi Binbin, would you mind reviewing this PR? Thanks!!

print("unsupervised data list = ", args.train_data_unsupervised)

cv_conf = copy.deepcopy(train_supervised_conf)
# cv_conf['speed_perturb'] = False
Collaborator

why use speed_perturb and spec_aug in cv?

Contributor Author

Hi Robin, thanks for pointing out this issue. speed_perturb, spec_aug, and shuffle should all be set to False for CV. I traced back my history and found this was a mis-comment from a recent revision; in our experiment cluster they were set to False.
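For reference, a minimal sketch of the corrected CV configuration, assuming the dict-style dataset config used elsewhere in this script (the exact keys in train_nst.py may differ):

```python
import copy

# CV should see clean, deterministic data: no augmentation, no shuffling.
cv_conf = copy.deepcopy(train_supervised_conf)
cv_conf['speed_perturb'] = False
cv_conf['spec_aug'] = False
cv_conf['shuffle'] = False
```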

@NevermoreCY
Contributor Author

flake8-bugbear was updated from version 22.10.27 to 22.12.6, and its check for the zip function was changed. It seems code containing zip() in other sections no longer passes the flake8 test.

@xingchensong
Member

flake8-bugbear was updated from version 22.10.27 to 22.12.6, and its check for the zip function was changed. It seems code containing zip() in other sections no longer passes the flake8 test.

Fixed in #1604, please rebase your commits.

@NevermoreCY
Contributor Author

It works now, thanks!

pin_memory=args.pin_memory,
num_workers=args.num_workers,
prefetch_factor=args.prefetch)
executor = Executor_nst()
Collaborator

Better to name it ExecutorNst.

from tensorboardX import SummaryWriter
from torch.utils.data import DataLoader
from wenet.dataset.dataset import Dataset
from wenet.transformer.asr_model import init_asr_model
Member

What are your base commits? We refactored model initialization in PR #1280; call wenet.utils.init_model.init_model instead of wenet.transformer.asr_model.init_asr_model.
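A minimal sketch of the updated import after the #1280 refactor; the call site is an assumption based on the stock train.py, and the exact signature may differ between wenet versions:

```python
# After PR #1280, use the shared init_model helper instead of the
# transformer-specific init_asr_model.
from wenet.utils.init_model import init_model

# Hypothetical call site: in the stock train.py the model is built
# from the loaded YAML configs dict.
model = init_model(configs)
```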

@wd929
Contributor

wd929 commented Dec 13, 2022

Could you please review this PR? @robin1001 and @xingchensong Thanks!!^^

Collaborator

@robin1001 robin1001 left a comment

please remove the files which are not required for the recipe.

def __init__(self):
    self.step = 0

def train(self, model, optimizer, scheduler, data_loader_aishell,
Collaborator

Better to name them data_loader_supervised and data_loader_unsupervised.
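A minimal sketch of the renamed signature; the trailing parameters (device, writer, args, scaler) are assumptions modeled on wenet's stock Executor and may not match train_nst.py exactly:

```python
class ExecutorNst:

    def __init__(self):
        self.step = 0

    # Renamed loaders make the data roles explicit:
    #   data_loader_supervised   -> AISHELL-1 batches with ground-truth labels
    #   data_loader_unsupervised -> WenetSpeech batches with pseudo labels
    def train(self, model, optimizer, scheduler, data_loader_supervised,
              data_loader_unsupervised, device, writer, args, scaler):
        ...
```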

--val_best
fi

# export model
Collaborator

export_jit is not required here.

@@ -0,0 +1,11 @@
data/train/shards/shards_000000000.tar
Collaborator

These example files can be removed.

@@ -0,0 +1,143 @@
# network architecture
Collaborator

This file is not required, right?

@@ -0,0 +1,11 @@
data/train/shards/shards_000000000.tar
Collaborator

please remove data_example/train dir

@@ -0,0 +1 @@
你好你好
Collaborator

remove wav_dir

@@ -0,0 +1,2 @@
data/train/wenet_1khr_tar//dir0_000000.tar
data/train/wenet_1khr_tar//dir0_000001.tar
Collaborator

remove the file

Comment on lines 251 to 253
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ] && [ ${enable_nst} -eq 0 ]; then
echo "********step 3 start time : $now ********"
python split_data_list.py \
Member

python local/split_data_list.py

@xingchensong
Member

Great Job, LGTM.

@NevermoreCY
Contributor Author

please remove the files which are not required for the recipe.

The data example directory, train_nst.py, and executor.py have been removed; the problems you mentioned should be solved.

@robin1001 robin1001 merged commit 9060ab2 into wenet-e2e:main Dec 13, 2022