
Conversation

@vimalmanohar (Owner)

No description provided.

vimalmanohar and others added 30 commits March 24, 2016 21:37
…sfer-learning-wsj-rm

Conflicts:
	egs/wsj/s5/steps/nnet3/xconfig_to_configs.py
…vised

Travis was failing to compile (not sure why) -- I used the "Update Branch" button
@vimalmanohar (Owner Author)

@hhadian Could you please review this PR?

@hhadian commented Dec 7, 2017

Sure, I will do it.

# Copyright 2017 Vimal Manohar
# Apache 2.0

# This is a Fisher chain recipe for training a model on a subset of around 100 hours.

Does this script use 100 hours of supervised training data? Add a better description, e.g. "This script uses 100 hours of supervised data."

exp=exp/semisup_100k
gmm=tri4a
xent_regularize=0.1
hidden_dim=725

Isn't it large? We use 625 for the 300-hour swbd data.

num_epochs=4
remove_egs=false
common_egs_dir=
minibatch_size=128

fix it in the script.

set -e

# This is an oracle experiment using oracle transcription of 250 hours of
# unsupervised data, along with 100 hours of supervised data.

I think you can easily use run_tdnn_100k_a.sh with the new combined dataset; I am not sure why you need two separate scripts.


I agree. In the initial PR, there was only one TDNN recipe, which was called with different training data sets during semi-supervised training.
Separating the scripts might be clearer, but it would add too many very similar scripts.
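
As a purely illustrative sketch of that idea (the script name and options below are hypothetical, not the ones in this PR): a single TDNN recipe can take the training set and GMM as arguments, so the supervised-only and oracle runs differ only in how it is invoked.

# supervised-only baseline (100 hours)
local/semisup/chain/run_tdnn.sh --train-set train_sup --gmm tri4a --exp exp/semisup_100k
# oracle run: same recipe, pointed at the combined 100h supervised + 250h oracle-transcribed set
local/semisup/chain/run_tdnn.sh --train-set train_sup_plus_oracle250 --gmm tri4a --exp exp/semisup_100k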

exp=exp/semisup_15k
gmm=tri3
xent_regularize=0.1
hidden_dim=500

Did you try reducing it to a smaller size, or reducing the number of layers?


I guess I tried smaller sizes but it did not help much.

@vimalmanohar (Owner Author)

I think the Fisher dev and test sets are more similar to the training data than eval2000 and swbd are, so a larger network might be OK here.


# Semi-supervised options
comb_affix=comb1am # affix for new chain-model directory trained on the combined supervised+unsupervised subsets
supervision_weights=1.0,1.0

add description
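
For illustration, a hedged sketch of the kind of description that could go here; it reflects one reading of these options (that the weights scale the supervised and unsupervised egs when the two egs directories are combined) and should be checked against the recipe:

# comb_affix: affix for the new chain-model directory trained on the
#   combined supervised+unsupervised subsets.
comb_affix=comb1am
# supervision_weights: comma-separated weights applied to the supervised and
#   unsupervised egs (in that order) when the egs dirs are combined;
#   1.0,1.0 weights both sources equally.
supervision_weights=1.0,1.0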

# Neural network opts
apply_deriv_weights=true
xent_regularize=0.1
hidden_dim=725

Did you try to tune it? My guess is that this hidden dim is too large.

apply_deriv_weights=true
xent_regularize=0.1
hidden_dim=725
minibatch_size="150=128/300=64"

Is it better than using minibatch_size=150,300?
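
For context, a sketch of how the two forms differ, based on one reading of the nnet3 minibatch-size rule syntax (worth double-checking against the training scripts):

# "length=size" rules condition the minibatch size on the eg length (frames per eg):
minibatch_size="150=128/300=64"   # egs of length 150 -> minibatches of 128; length 300 -> 64
# a plain comma-separated list allows any of the listed sizes regardless of eg length:
minibatch_size="128,64"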

relu-batchnorm-layer name=prefinal-xent input=tdnn6 dim=$hidden_dim target-rms=0.5
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5

output name=output-0 input=output.affine skip-in-init=true

Why do you have two separate output nodes? Do you use weighted training? The supervision weights were the same in the script.


What is the skip-in-init option? You could add a single-line comment about this output in the config file.

@vimalmanohar (Owner Author)

skip-in-init is added to prevent the output line (a trivial output layer) from being printed in init.config.
Do you know why trivial output layers are needed in init.config?


I think we just use init.config for training the LDA matrix, and we don't need to add the other outputs there. Probably we can modify xconfig to not print trivial outputs in init.config (we just need to print "output-node name=output").
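
To make the two-output setup concrete, here is a rough bash sketch of the extra output nodes being appended to the network xconfig. Only output-0 is taken from the snippet above; the path in $dir, output-1, and the -xent counterparts are my assumptions about how the supervised and unsupervised egs each get an output node sharing the same underlying parameters.

cat <<EOF >> $dir/configs/network.xconfig
# one output node per egs source, both reading the shared output.affine;
# skip-in-init keeps these trivial outputs out of init.config.
output name=output-0 input=output.affine skip-in-init=true
output name=output-1 input=output.affine skip-in-init=true
# matching cross-entropy outputs sharing the output-xent branch
output name=output-0-xent input=output-xent.log-softmax skip-in-init=true
output name=output-1-xent input=output-xent.log-softmax skip-in-init=true
EOF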

--lattice-prune-beam "$lattice_prune_beam" \
--phone-insertion-penalty "$phone_insertion_penalty" \
--deriv-weights-scp $chaindir/best_path_${unsupervised_set}${decode_affix}/weights.scp \
--online-ivector-dir $exp/nnet3${nnet3_affix}/ivectors_${semisup_train_set}_sp_hires \

Do you use supervised data for ivector training?

@hhadian left a comment

I was under the impression that this PR could be wrapped up in 20-30 changed files (new or modified).
I'm not sure, but I feel 70 changed files is too many.


train_lm.sh --arpa --lmtype 3gram-mincount $dir || exit 1;

train_lm.sh --arpa --lmtype 4gram-mincount $dir || exit 1;

If you are using pocolm, it might be better to leave this script unchanged (to make the PR smaller)

@vimalmanohar (Owner Author)

Ok makes sense.


@@ -0,0 +1,201 @@
#!/bin/bash

I guess it would be nicer to keep only one version of everything (even though it's in tuning) so no _i, _a, etc. That's because there are too many files in this PR.


std::string wav_rspecifier = po.GetArg(1);
std::string wav_wspecifier = po.GetArg(2);
if (ClassifyRspecifier(po.GetArg(1), NULL, NULL) != kNoRspecifier) {

This change was for perturb_to_allowed_lengths.py. I guess you are not using
that script (i.e. non-split training) so it might be better to leave this file unchanged.

if (token == "<DW>")
  ReadVectorAsChar(is, binary, &deriv_weights);
else
  deriv_weights.Read(is, binary);

How is <DW2> different from <DW>?

@vimalmanohar (Owner Author)

<DW> reads the weights only as 0s and 1s. <DW2> reads and writes them as floats, which is needed here.

@hhadian commented Dec 8, 2017

BTW, I noticed your master is not up to date with upstream Kaldi.
Some of the changes in this PR might already be in Kaldi master.
