Noisy student training for wenet #1600
@robin1001 Hi Binbin, would you mind reviewing this PR? Thanks!
wenet/bin/train_nst.py
Outdated
print("unsupervised data list = ", args.train_data_unsupervised)

cv_conf = copy.deepcopy(train_supervised_conf)
# cv_conf['speed_perturb'] = False
Why use speed_perturb and spec_aug in cv?
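A minimal sketch of what this comment asks for, assuming the cross-validation config is derived from the training config as in the snippet above: copy the config and switch augmentation off. The dict contents here are hypothetical stand-ins for the recipe's YAML; only the `speed_perturb` and `spec_aug` keys are taken from this thread.

```python
import copy

# Hypothetical training config; in the recipe this would come from the YAML file.
train_supervised_conf = {
    "speed_perturb": True,
    "spec_aug": True,
    "batch_size": 16,
}


def make_cv_conf(train_conf):
    """Return a deep copy of train_conf with data augmentation disabled."""
    cv_conf = copy.deepcopy(train_conf)
    cv_conf["speed_perturb"] = False
    cv_conf["spec_aug"] = False
    return cv_conf


cv_conf = make_cv_conf(train_supervised_conf)
```

The deep copy keeps the training config untouched, so augmentation stays enabled for training while validation runs on clean features.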
flake8-bugbear was updated from version 22.10.27 to 22.12.6, and they modified the check for the zip function.
Fixed in #1604, please rebase your commits.
It works now, thanks!
wenet/bin/train_nst.py
Outdated
pin_memory=args.pin_memory,
num_workers=args.num_workers,
prefetch_factor=args.prefetch)
executor = Executor_nst()
Better to name it ExecutorNst.
wenet/bin/train_nst.py
Outdated
from tensorboardX import SummaryWriter
from torch.utils.data import DataLoader
from wenet.dataset.dataset import Dataset
from wenet.transformer.asr_model import init_asr_model
What's your base commit? We refactored our model initialization in PR #1280; please call wenet.utils.init_model.init_model instead of wenet.transformer.asr_model.init_asr_model.
TODO: modify run_nst.sh, remove train_nst, executor_nst
modified README.md
Could you please review this PR, @robin1001 and @xingchensong? Thanks!
Please remove the files which are not required for the recipe.
wenet/utils/executor_nst.py
Outdated
def __init__(self):
    self.step = 0

def train(self, model, optimizer, scheduler, data_loader_aishell,
Better to name them data_loader_supervised and data_loader_unsupervised.
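To illustrate the two naming suggestions in this review (`ExecutorNst` and the `data_loader_supervised` / `data_loader_unsupervised` parameters), here is a toy sketch. It is not the WeNet executor: the `train_step` callback and the alternating batch order are assumptions made only to keep the example self-contained, and the model, optimizer, and scheduler arguments from the real signature are left out.

```python
from itertools import zip_longest

_EXHAUSTED = object()  # sentinel marking a loader that has run out of batches


class ExecutorNst:
    """Toy stand-in for the executor under review; not the WeNet implementation."""

    def __init__(self):
        self.step = 0

    def train(self, data_loader_supervised, data_loader_unsupervised, train_step):
        """Alternate batches from both loaders and call train_step on each one."""
        paired = zip_longest(data_loader_supervised,
                             data_loader_unsupervised,
                             fillvalue=_EXHAUSTED)
        for supervised_batch, unsupervised_batch in paired:
            for batch in (supervised_batch, unsupervised_batch):
                if batch is _EXHAUSTED:
                    continue  # the shorter loader is done; keep draining the other
                train_step(batch)
                self.step += 1


# Usage with plain lists standing in for DataLoader objects.
seen = []
executor = ExecutorNst()
executor.train(["sup_0", "sup_1"], ["unsup_0"], seen.append)
```

Naming the loaders by role rather than by corpus keeps the executor reusable when the supervised set is not AISHELL-1.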
--val_best
fi

# export model
export_jit is not required here.
@@ -0,0 +1,11 @@
data/train/shards/shards_000000000.tar
These example files can be deleted.
@@ -0,0 +1,143 @@
# network architecture
This file is not required, right?
@@ -0,0 +1,11 @@
data/train/shards/shards_000000000.tar
Please remove the data_example/train dir.
@@ -0,0 +1 @@
你好你好
Remove wav_dir.
@@ -0,0 +1,2 @@
data/train/wenet_1khr_tar//dir0_000000.tar
data/train/wenet_1khr_tar//dir0_000001.tar
Remove the file.
examples/aishell/NST/run_nst.sh
Outdated
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ] && [ ${enable_nst} -eq 0 ]; then
echo "********step 3 start time : $now ********"
python split_data_list.py \
python local/split_data_list.py
Great job, LGTM.
The data example directory, train_nst.py, and executor.py have been removed; the problems you mentioned should be solved.
Here we provide a recipe for running Noisy Student Training (NST) with the LM filter strategy from our paper, using AISHELL-1 as supervised data and WenetSpeech as unsupervised data.
The example code is stored under examples/aishell/NST, with a detailed guide and results in readme.md. We mainly modified the "run.sh" script and added "executor_nst.py" and "train_nst.py", as well as some auxiliary code and examples under the local directory.
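As a rough illustration of the pseudo-label filtering idea, the sketch below keeps an unsupervised utterance only when its hypotheses decoded with and without a language model agree closely. This is one plausible reading of an "LM filter", not the recipe's actual code: the function names, the `max_cer` threshold, and the choice to keep the with-LM hypothesis as the pseudo-label are all assumptions for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: strings of characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (r != h)))    # substitution or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of hyp measured against ref."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def lm_filter(utts, max_cer=0.1):
    """Keep (key, pseudo_label) pairs whose two hypotheses differ by at most max_cer.

    utts: iterable of (key, hyp_without_lm, hyp_with_lm) tuples.
    The with-LM hypothesis is kept as the pseudo-label (an assumption).
    """
    return [(key, with_lm)
            for key, without_lm, with_lm in utts
            if cer(with_lm, without_lm) <= max_cer]


# Hypothetical decoding results for two unlabeled utterances.
decoded = [
    ("utt_a", "abcd", "abcd"),  # hypotheses agree, kept
    ("utt_b", "abxd", "abcd"),  # one of four characters differs, dropped
]
selected = lm_filter(decoded, max_cer=0.1)
```

A stricter threshold keeps fewer but cleaner pseudo-labels; the right value depends on the data and would need to be tuned.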