WIP: Clean version of semi-supervised PR #14
Conversation
…sfer-learning-wsj-rm Conflicts: egs/wsj/s5/steps/nnet3/xconfig_to_configs.py
…vised Travis was failing to compile (not sure why) -- I used the "Update Branch" button
@hhadian Could you please review this PR?

Sure, I will do it.
# Copyright 2017 Vimal Manohar
# Apache 2.0

# This is fisher chain recipe for training a model on a subset of around 100 hours.
Does this script use 100 hours of supervised training data? Add a better description, e.g. "this script uses 100 hours of supervised data".
exp=exp/semisup_100k
gmm=tri4a
xent_regularize=0.1
hidden_dim=725
Isn't that large? We use 625 for the 300-hour swbd data.
num_epochs=4
remove_egs=false
common_egs_dir=
minibatch_size=128
Fix it in the script.
set -e

# This is an oracle experiment using oracle transcription of 250 hours of
# unsupervised data, along with 100 hours of supervised data.
I think you can easily use run_tdnn_100k_a.sh with the new combined dataset; I am not sure why you need two separate scripts.
I agree. In the initial PR, there was only one TDNN recipe, which was called with different training data sets during semi-supervised training. Separating the scripts might be clearer, but it adds too many very similar scripts.
exp=exp/semisup_15k
gmm=tri3
xent_regularize=0.1
hidden_dim=500
Did you try reducing it to a smaller size, or reducing the number of layers?
I guess I tried smaller sizes but it did not help much.
I think the Fisher dev and test sets are more similar to the training data than eval2000 and swbd are, so a larger network might be OK.
# Semi-supervised options
comb_affix=comb1am  # affix for new chain-model directory trained on the combined supervised+unsupervised subsets
supervision_weights=1.0,1.0
Add a description.
# Neural network opts
apply_deriv_weights=true
xent_regularize=0.1
hidden_dim=725
Did you try to tune it? My guess is that the hidden dim is too large.
apply_deriv_weights=true
xent_regularize=0.1
hidden_dim=725
minibatch_size="150=128/300=64"
Is it better than using minibatch_size=150,300?
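For anyone reading along, my understanding of the difference between the two notations (an assumption based on the general nnet3 minibatch-size spec, not something stated in this PR):

# Assumed reading of the rule-based form: each "length=size" pair maps egs of
# that length (in frames) to a minibatch size, so egs of different lengths
# (e.g. from the supervised and unsupervised subsets) are merged with
# different minibatch sizes.
minibatch_size="150=128/300=64"

# Assumed reading of the plain list form: a set of allowed minibatch sizes
# that applies to all egs regardless of their length.
minibatch_size="150,300"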
relu-batchnorm-layer name=prefinal-xent input=tdnn6 dim=$hidden_dim target-rms=0.5
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5

output name=output-0 input=output.affine skip-in-init=true
Why do you have two separate output nodes? Do you use weighted training? The supervision weights were the same in the script.
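For context on the two-output question, a hedged illustration (the first line is from the quoted config above; the second output name is my assumption of how the combined setup typically looks): both outputs read from the shared output.affine component, so the supervised and unsupervised egs update the same parameters while their objectives can be scaled separately (cf. the supervision_weights variable quoted earlier).

# Illustrative xconfig lines, not copied verbatim from this PR: two trivial
# outputs tied to the same underlying affine component, one per egs source.
output name=output-0 input=output.affine skip-in-init=true
output name=output-1 input=output.affine skip-in-init=true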
What is the skip-in-init option? You could add a single-line comment about this output in the config file.
skip-in-init is added to prevent the output line (a trivial output layer) from being printed in init.config. Do you know why the trivial output layers are needed in init.config?
I think we just use init.config for training the LDA matrix, so we don't need to add the other outputs. Probably we can modify the xconfig code to not print trivial outputs in init.config (we just need to print output-node name=output).
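For reference, a rough sketch of what init.config usually contains (node names follow the usual convention, dims are made up; the real file is generated by xconfig_to_configs.py). It only exists so that stats for the LDA-like preconditioning of the input can be accumulated, which is why a single output-node suffices and the extra trivial outputs are not needed there.

# Illustrative init.config contents (dims are placeholders):
input-node name=input dim=40
input-node name=ivector dim=100
output-node name=output input=Append(Offset(input,-1), input, Offset(input,1), ReplaceIndex(ivector, t, 0))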
--lattice-prune-beam "$lattice_prune_beam" \
--phone-insertion-penalty "$phone_insertion_penalty" \
--deriv-weights-scp $chaindir/best_path_${unsupervised_set}${decode_affix}/weights.scp \
--online-ivector-dir $exp/nnet3${nnet3_affix}/ivectors_${semisup_train_set}_sp_hires \
Do you use supervised data for ivector training?
hhadian left a comment:
I was under the impression that this PR could be wrapped up in 20-30 changed files (new or modified). I'm not sure, but I feel 70 changed files is too many.
train_lm.sh --arpa --lmtype 3gram-mincount $dir || exit 1;

train_lm.sh --arpa --lmtype 4gram-mincount $dir || exit 1;
If you are using pocolm, it might be better to leave this script unchanged (to make the PR smaller)
OK, makes sense.
@@ -0,0 +1,201 @@
#!/bin/bash
I guess it would be nicer to keep only one version of everything (even though it's in tuning), so no _i, _a, etc., because there are too many files in this PR.
std::string wav_rspecifier = po.GetArg(1);
std::string wav_wspecifier = po.GetArg(2);
if (ClassifyRspecifier(po.GetArg(1), NULL, NULL) != kNoRspecifier) {
This change was for perturb_to_allowed_lengths.py. I guess you are not using that script (i.e. non-split training), so it might be better to leave this file unchanged.
if (token == "<DW>")
  ReadVectorAsChar(is, binary, &deriv_weights);
else
  deriv_weights.Read(is, binary);
How is <DW2> different from <DW>?
<DW> reads the weights only as 0s and 1s (via ReadVectorAsChar); <DW2> reads and writes them as floats, which is needed here.
BTW, I noticed your …