Skip to content

Conversation

@vimalmanohar
Copy link
Contributor

A simple version of semi-supervised training using lattice-free MMI on subset of Fisher English.

Moved from vimalmanohar#14.

hhadian and others added 30 commits June 2, 2017 22:42
…vised

Travis was failing to compile(not sure why)-- I used the "Update Branch" button
Conflicts:
	src/chain/chain-denominator-smbr.cc
Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized I had some pending comments on this that I had not submitted.
There is a conflict too.

nj=15 # This should be set to the maximum number of jobs you are
# comfortable to run in parallel; you can increase it if your disk
# speed is greater and you have more machines.
max_jobs_run=15 # This should be set to the maximum number of jobs you are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're ever using the --nj option, fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change jobs -> nnet3-chain-get-egs jobs.

@@ -0,0 +1,21 @@
#!/bin/bash

# Copyright 2017 Vimal Manohar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reference to this script or any other script, in the run.sh.
If you don't put a commented-out reference to this in the run.sh, it's not obvious in which order things should be called. This needs to be made much clearer than it is now.

If it's more than just a couple of lines, you could introduce an intermediate script, something like local/semisup/run_semisupervised_50k.sh and local/semisup/run_semisupervised_100k.sh, which you'd invoke in a comment from run.sh. But these scripts shouldn't have any variables; they should just be lists of concrete invocations of other scripts (like local/chain/tuning/run_tdnn_1a.sh and local/semisup/blah/blah) with concrete arguments. I don't want people to think of it as anything more than a piece of documentation saying in what order to call things.

# The output directory has the format of an alignment directory.
# It can optionally read alignments from a directory, in which case,
# the script gets frame-level posteriors of the pdf corresponding to those
# alignments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make clear that the weights are output as weights.scp

# LM for decoding unsupervised data: 4gram
# Supervision: Naive split lattices

# train_set train_sup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment explaining which is output-0 vs output-1; same if similar things appear elsewhere.

supervised_set=train_sup
unsupervised_set=train_unsup100k_250k

sup_chain_dir=exp/semisup_100k/chain/tdnn_1a_sp # supervised chain system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if you could explain in a comment which of these are inputs and which are outputs.

@@ -0,0 +1,201 @@
#!/bin/bash

# Copyright 2017 Vimal Manohar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be a reference to this and run_100k.sh in the run.sh, commented out, showing at what point to run them.
And in the scripts in local/ that this calls, I think there should be a note that local/semisup/run_50k.sh shows how to call this. (Same for 100k). This was not very discoverable to me.

# which is different from run_50k.sh, which uses combined supervised +
# unsupervised set.

. ./cmd.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this script is very simple and linear and configuration-free, but I think adding a --stage option would be helpful to users.

@vimalmanohar
Copy link
Contributor Author

I made the changes.

Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're making progress.
Some more comments after looking through the code a bit more carefully.
I'm asking you to merge Hossein's PR, solve the integration issues, and re-test; that will kill two birds with one stone.

# This script rescores non-compact, (possibly) undeterminized lattices with the
# ConstArpaLm format language model.
# This is similar to steps/lmrescore_const_arpa.sh, but expects
# non-compact lattices as input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this text:
If you use the option "--write compact false" it outputs non-compact lattices; the purpose is to add
in LM scores while leaving the frame-by-frame acoustic scores in the same position that they were in
in the input, undeterminized lattices. This is important in our 'chain' semi-supervised training recipes,
where it helps us to split lattices while keeping the scores at the edges of the split points correct.

this_frame_subsampling_factor=$(cat $this_alidir/frame_subsampling_factor)
fi

if (( $frame_subsampling_factor % $this_frame_subsampling_factor != 0 )); then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested a construct like this, it doesn't work because 0 and 1 are != "true" or "false".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that it works. Double parenthesis returns true or false.


if [ $stage -le -1 ]; then
# Convert the alignments to the new tree. Note: we likely will not use these
# converted alignments in the CTC system directly, but they could be useful
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change CTC->chain (more lava flow).

nj=15 # This should be set to the maximum number of jobs you are
# comfortable to run in parallel; you can increase it if your disk
# speed is greater and you have more machines.
max_jobs_run=15 # This should be set to the maximum number of jobs you are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change jobs -> nnet3-chain-get-egs jobs.

# it doesn't make sense to use different options than were used as input to the
# LDA transform). This is used to turn off CMVN in the online-nnet experiments.
lattice_lm_scale= # If supplied, the graph/lm weight of the lattices will be
# used (with this scale) in generating supervisions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specify that this would normally be 0 for conventional supervised training, but may be close to 1
for the unsupervised part of the data in semi-supervised training

///
/// @param [in] lat Input lattice. Expected to be top-sorted. Otherwise the
/// function will crash.
/// @param [out] acoustic_scores Pointer to a map where the mapping from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation doesn't seem to be consistent with the function signature: you say it is a map
to acoustic score, but it returns a pair.

/// ComputeAcousticScoresMap into the lattice.
///
/// @param [in] acoustic_scores A map from the pair (frame-index,
//pdf-id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix this. And you say it's a map to acoustic score: why is it a pair?

po.Register("project-input", &project_input,
"Project to input labels (transition-ids); applicable only "
"when --read-compact=false");
po.Register("project-output", &project_output,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean word-ids. But I don't think this option makes sense. The lattice would be mostly epsilons. Can you remove it if it's not necessary.
Is project-input needed? Please simplify to only what you need for this PR.

fst::VectorFst<StdArc> fst;
{
Lattice lat;
ConvertLattice(clat, &lat); // convert to non-compact form.. won't introduce
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that these multi-line comments have different indentation on different lines.

stats_.PrintStats();
}

void ScaleFst(BaseFloat scale, fst::StdVectorFst *fst) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function that does this already exists somewhere in fstext-utils.h. You can remove this from the header too.

Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small comments.

# Final train prob (xent) -1.9246 -1.5926 -1.6454
# Final valid prob (xent) -2.1873 -1.7990 -1.7107

# train_set semisup15k_100k_250k semisup50k_100k_250k semisup100k_250k
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are all of these results still part of the scripts? remove any that are not.
If these lines came from compare_wer.sh it would be nice if you could show the corresponding command line.
and an explanation of the naming convention would be nice too.... in semisupX_Y_Z, it's not clear what X, Y and Z are.


echo "$0: generating egs from the supervised data"
steps/nnet3/chain/get_egs.sh --cmd "$decode_cmd" \
--left-context $egs_left_context --right-context $egs_right_context \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix this indentation level

# Unsupervised weight: 1.0
# Weights for phone LM (supervised, unsupervised): 3,2
# LM for decoding unsupervised data: 4gram
# Supervision: Naive split lattices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accurate, that you are using naive splitting? You seem to be also using it for the other example scripts; but the paper seems to say that "smart splitting" is generally better. Can you clarify which options control the type of splitting?

@vimalmanohar
Copy link
Contributor Author

vimalmanohar commented Mar 5, 2018 via email

@danpovey
Copy link
Contributor

danpovey commented Mar 5, 2018 via email

@danpovey
Copy link
Contributor

@vimalmanohar, sorry, there are conflicts now. Please resolve and once you confirm it's good to merge I'll merge.

@danpovey
Copy link
Contributor

@vimalmanohar, don't forget about this.

@vimalmanohar
Copy link
Contributor Author

vimalmanohar commented Mar 20, 2018 via email

@vimalmanohar
Copy link
Contributor Author

I fixed all issues and conflicts.

Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed some small things...

// Compute a map from each (t, tid) to (sum_of_acoustic_scores, count)
unordered_map<std::pair<int32,int32>, std::pair<BaseFloat, int32>,
PairHasher<int32> > acoustic_scores;
if (!write_compact)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this acoustic-scores-map thing is necessary here because composition with an FST that only has weights on the graph side of the scores, will leave the acoustic scores where they were.

this_deriv_weights(i) = (*deriv_weights)(t);
}
KALDI_ASSERT(output_weights.Dim() == num_frames_subsampled);
this_deriv_weights.MulElements(output_weights);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't let this issue hold up merging this PR, but you might want to measure the effect of, in the case where 'deriv_weights' are supplied, just ignoring the 'output_weights' here instead of multiplying them by them. Based on my previous experience, this would improve the results. Don't add options though-- too much complexity

"and input frames.");
po.Register("deriv-weights-rspecifier", &deriv_weights_rspecifier,
"Per-frame weights that scales a frame's gradient during "
"backpropagation."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a space here after the ".".

}


void ComputeAcousticScoresMap(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it turns out you don't need this code after looking into the composition, you can remove it.

@vimalmanohar
Copy link
Contributor Author

vimalmanohar commented Mar 28, 2018 via email

@danpovey danpovey merged commit 191b39a into kaldi-asr:master Mar 28, 2018
danpovey added a commit that referenced this pull request Apr 5, 2018
LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018
…sr#2140)

Conflicts:
	egs/wsj/s5/steps/libs/nnet3/train/chain_objf/acoustic_model.py
	egs/wsj/s5/steps/nnet3/chain/train.py
LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018
…aldi-asr#2140; un-support --transform-dir. Thx: @aaror8 (kaldi-asr#2334)

Conflicts:
	egs/wsj/s5/steps/nnet3/get_egs.sh
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants