Semi-supervised training on Fisher English #2140

vimalmanohar · 2018-01-10T15:09:45Z

A simple version of semi-supervised training using lattice-free MMI on a subset of Fisher English.

Moved from vimalmanohar#14.


danpovey

I just realized I had some pending comments on this that I had not submitted.
There is a conflict too.

danpovey · 2018-02-24T18:29:59Z

egs/wsj/s5/steps/nnet3/chain/get_egs.sh

-nj=15         # This should be set to the maximum number of jobs you are
-              # comfortable to run in parallel; you can increase it if your disk
-              # speed is greater and you have more machines.
+max_jobs_run=15         # This should be set to the maximum number of jobs you are


if we're ever using the --nj option, fix it.

change jobs -> nnet3-chain-get-egs jobs.

danpovey · 2018-02-24T18:30:48Z

egs/fisher_english/s5/local/run_unk_model.sh

@@ -0,0 +1,21 @@
+#!/bin/bash
+
+# Copyright 2017  Vimal Manohar


I don't see a reference to this script or any other script, in the run.sh.
If you don't put a commented-out reference to this in the run.sh, it's not obvious in which order things should be called. This needs to be made much clearer than it is now.

If it's more than just a couple of lines, you could introduce an intermediate script, something like local/semisup/run_semisupervised_50k.sh and local/semisup/run_semisupervised_100k.sh, which you'd invoke in a comment from run.sh. But these scripts shouldn't have any variables; they should just be lists of concrete invocations of other scripts (like local/chain/tuning/run_tdnn_1a.sh and local/semisup/blah/blah) with concrete arguments. I don't want people to think of it as anything more than a piece of documentation saying in what order to call things.

danpovey · 2018-02-24T18:34:29Z

egs/wsj/s5/steps/best_path_weights.sh

+# The output directory has the format of an alignment directory.
+# It can optionally read alignments from a directory, in which case,
+# the script gets frame-level posteriors of the pdf corresponding to those
+# alignments.


make clear that the weights are output as weights.scp

danpovey · 2018-02-24T18:46:03Z

egs/fisher_english/s5/local/semisup/chain/tuning/run_tdnn_100k_semisupervised_1a.sh

+# LM for decoding unsupervised data: 4gram
+# Supervision: Naive split lattices
+
+# train_set                           train_sup


add comment explaining which is output-0 vs output-1; same if similar things appear elsewhere.

danpovey · 2018-02-24T18:47:24Z

egs/fisher_english/s5/local/semisup/chain/tuning/run_tdnn_100k_semisupervised_1a.sh

+supervised_set=train_sup
+unsupervised_set=train_unsup100k_250k
+
+sup_chain_dir=exp/semisup_100k/chain/tdnn_1a_sp  # supervised chain system


it would be nice if you could explain in a comment which of these are inputs and which are outputs.

danpovey · 2018-02-24T18:51:22Z

egs/fisher_english/s5/local/semisup/run_50k.sh

@@ -0,0 +1,201 @@
+#!/bin/bash
+
+# Copyright 2017  Vimal Manohar


There needs to be a reference to this and run_100k.sh in the run.sh, commented out, showing at what point to run them.
And in the scripts in local/ that this calls, I think there should be a note that local/semisup/run_50k.sh shows how to call this. (Same for 100k). This was not very discoverable to me.

danpovey · 2018-02-24T18:52:49Z

egs/fisher_english/s5/local/semisup/run_100k.sh

+# which is different from run_50k.sh, which uses combined supervised + 
+# unsupervised set.
+
+. ./cmd.sh


I like that this script is very simple and linear and configuration-free, but I think adding a --stage option would be helpful to users.
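A `--stage` option of the kind suggested here is commonly implemented by guarding each step with a comparison against a `stage` variable, the same `[ $stage -le N ]` pattern that appears in other scripts in this PR. The following is only a self-contained sketch: the step names are placeholders, and the hand-rolled argument handling is an assumption standing in for whatever option parsing the real script uses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of stage gating; the step names are placeholders,
# not the actual steps of local/semisup/run_100k.sh.

run_from_stage() {
  local stage=0
  # Minimal stand-in for real option parsing: accept "--stage N".
  if [ "${1:-}" = "--stage" ]; then
    stage=${2:-0}
  fi

  local steps_run=""
  if [ "$stage" -le 0 ]; then
    steps_run="$steps_run prepare_data"      # e.g. data preparation
  fi
  if [ "$stage" -le 1 ]; then
    steps_run="$steps_run train_seed_model"  # e.g. supervised seed system
  fi
  if [ "$stage" -le 2 ]; then
    steps_run="$steps_run train_semisup"     # e.g. semi-supervised training
  fi
  # Print the steps that would run, without the leading space.
  echo "${steps_run# }"
}

run_from_stage             # runs every step
run_from_stage --stage 2   # resumes at the last step only
```

Because each guard only compares against a single integer, the script stays linear and configuration-free; the option merely lets a user skip steps that have already completed.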

vimalmanohar · 2018-03-01T20:56:17Z

I made the changes.


danpovey

We're making progress.
Some more comments after looking through the code a bit more carefully.
I'm asking you to merge Hossein's PR, solve the integration issues, and re-test; that will kill two birds with one stone.

danpovey · 2018-03-03T19:50:43Z

egs/wsj/s5/steps/lmrescore_const_arpa_undeterminized.sh

+# This script rescores non-compact, (possibly) undeterminized lattices with the 
+# ConstArpaLm format language model.
+# This is similar to steps/lmrescore_const_arpa.sh, but expects 
+# non-compact lattices as input.


Please add this text:
If you use the option "--write-compact=false" it outputs non-compact lattices; the purpose is to add
in LM scores while leaving the frame-by-frame acoustic scores in the same positions that they had
in the input, undeterminized lattices. This is important in our 'chain' semi-supervised training recipes,
where it helps us to split lattices while keeping the scores at the edges of the split points correct.

danpovey · 2018-03-03T19:59:05Z

egs/wsj/s5/steps/nnet3/chain/build_tree_multiple_sources.sh

+      this_frame_subsampling_factor=$(cat $this_alidir/frame_subsampling_factor)
+    fi
+
+    if (( $frame_subsampling_factor % $this_frame_subsampling_factor != 0 )); then


I tested a construct like this, it doesn't work because 0 and 1 are != "true" or "false".

I checked that it works. Double parenthesis returns true or false.
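The disagreement above comes down to the exit status of bash's arithmetic command. As a minimal sketch (variable names borrowed from the diff, values made up for illustration): `(( expr ))` exits with status 0, i.e. success, when the expression evaluates to a non-zero number, and with status 1 when it evaluates to 0, so it can serve directly as an `if` condition.

```shell
#!/usr/bin/env bash
# Illustrative values only; the real script reads these from files.
frame_subsampling_factor=3
this_frame_subsampling_factor=2

# (( expr )) succeeds (exit status 0) when expr evaluates to non-zero.
if (( frame_subsampling_factor % this_frame_subsampling_factor != 0 )); then
  check_result="not divisible"
else
  check_result="divisible"
fi
echo "$check_result"

# The exit status can also be captured directly.
# The ||/&& forms keep this safe even when the script runs under "set -e".
(( 6 % 3 != 0 )) || status_when_false=$?   # comparison is false -> status 1
(( 7 % 3 != 0 )) && status_when_true=$?    # comparison is true  -> status 0
echo "false -> $status_when_false, true -> $status_when_true"
```

Note that this is specific to bash (and other shells supporting `(( ))`); a plain POSIX `sh` would need `[ $(( a % b )) -ne 0 ]` instead.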

danpovey · 2018-03-03T20:02:32Z

egs/wsj/s5/steps/nnet3/chain/build_tree_multiple_sources.sh

+
+if [ $stage -le -1 ]; then
+  # Convert the alignments to the new tree.  Note: we likely will not use these
+  # converted alignments in the CTC system directly, but they could be useful


change CTC->chain (more lava flow).

danpovey · 2018-03-03T20:04:51Z

egs/wsj/s5/steps/nnet3/chain/get_egs.sh

-nj=15         # This should be set to the maximum number of jobs you are
-              # comfortable to run in parallel; you can increase it if your disk
-              # speed is greater and you have more machines.
+max_jobs_run=15         # This should be set to the maximum number of jobs you are


change jobs -> nnet3-chain-get-egs jobs.

danpovey · 2018-03-03T20:05:40Z

egs/wsj/s5/steps/nnet3/chain/get_egs.sh

            # it doesn't make sense to use different options than were used as input to the
            # LDA transform).  This is used to turn off CMVN in the online-nnet experiments.
+lattice_lm_scale=     # If supplied, the graph/lm weight of the lattices will be
+                      # used (with this scale) in generating supervisions


specify that this would normally be 0 for conventional supervised training, but may be close to 1
for the unsupervised part of the data in semi-supervised training

danpovey · 2018-03-03T21:13:22Z

src/lat/lattice-functions.h

+///
+///   @param [in] lat   Input lattice. Expected to be top-sorted. Otherwise the 
+///                     function will crash. 
+///   @param [out] acoustic_scores  Pointer to a map where the mapping from the


The documentation doesn't seem to be consistent with the function signature: you say it is a map
to acoustic score, but it returns a pair.

danpovey · 2018-03-03T21:13:35Z

src/lat/lattice-functions.h

+/// ComputeAcousticScoresMap into the lattice.
+///
+///   @param [in] acoustic_scores  A map from the pair (frame-index,
+//pdf-id)


fix this. And you say it's a map to acoustic score: why is it a pair?

danpovey · 2018-03-03T21:18:46Z

src/latbin/lattice-to-fst.cc

+    po.Register("project-input", &project_input,
+                "Project to input labels (transition-ids); applicable only "
+                "when --read-compact=false");
+    po.Register("project-output", &project_output,


I think you mean word-ids. But I don't think this option makes sense. The lattice would be mostly epsilons. Can you remove it if it's not necessary?
Is project-input needed? Please simplify to only what you need for this PR.

danpovey · 2018-03-03T21:19:13Z

src/latbin/lattice-to-fst.cc

+        fst::VectorFst<StdArc> fst;
+        {
+          Lattice lat;
+          ConvertLattice(clat, &lat); // convert to non-compact form.. won't introduce


I don't like that these multi-line comments have different indentation on different lines.

danpovey · 2018-03-03T21:21:36Z

src/nnet3/nnet-example-utils.cc

  stats_.PrintStats();
 }

+void ScaleFst(BaseFloat scale, fst::StdVectorFst *fst) {


A function that does this already exists somewhere in fstext-utils.h. You can remove this from the header too.

danpovey

some small comments.

danpovey · 2018-03-05T00:08:48Z

egs/fisher_english/s5/local/semisup/chain/tuning/run_tdnn_1a.sh

+# Final train prob (xent)   -1.9246                -1.5926                -1.6454
+# Final valid prob (xent)   -2.1873                -1.7990                -1.7107
+
+# train_set                           semisup15k_100k_250k    semisup50k_100k_250k    semisup100k_250k


are all of these results still part of the scripts? remove any that are not.
If these lines came from compare_wer.sh it would be nice if you could show the corresponding command line.
and an explanation of the naming convention would be nice too.... in semisupX_Y_Z, it's not clear what X, Y and Z are.

danpovey · 2018-03-05T00:18:53Z

egs/fisher_english/s5/local/semisup/chain/tuning/run_tdnn_100k_semisupervised_1a.sh

+
+    echo "$0: generating egs from the supervised data"
+    steps/nnet3/chain/get_egs.sh --cmd "$decode_cmd" \
+               --left-context $egs_left_context --right-context $egs_right_context \


fix this indentation level

danpovey · 2018-03-05T00:20:49Z

egs/fisher_english/s5/local/semisup/chain/tuning/run_tdnn_100k_semisupervised_1a.sh

+# Unsupervised weight: 1.0
+# Weights for phone LM (supervised, unsupervised): 3,2
+# LM for decoding unsupervised data: 4gram
+# Supervision: Naive split lattices


Is this accurate, that you are using naive splitting? You seem to be also using it for the other example scripts; but the paper seems to say that "smart splitting" is generally better. Can you clarify which options control the type of splitting?

vimalmanohar · 2018-03-05T00:39:13Z

Smart splitting is not committed in this pull request. That requires other binaries to be added.


danpovey · 2018-03-05T00:42:19Z

ok, fine; minimal is good.


danpovey · 2018-03-15T04:15:59Z

@vimalmanohar, sorry, there are conflicts now. Please resolve and once you confirm it's good to merge I'll merge.

danpovey · 2018-03-20T04:37:36Z

@vimalmanohar, don't forget about this.

vimalmanohar · 2018-03-20T18:55:50Z

I'm still testing it one more time.


vimalmanohar · 2018-03-26T21:59:51Z

I fixed all issues and conflicts.

danpovey

Noticed some small things...

danpovey · 2018-03-27T20:16:30Z

src/latbin/lattice-compose.cc

+        // Compute a map from each (t, tid) to (sum_of_acoustic_scores, count)
+        unordered_map<std::pair<int32,int32>, std::pair<BaseFloat, int32>,
+                                            PairHasher<int32> > acoustic_scores;
+        if (!write_compact)


I don't see why this acoustic-scores-map thing is necessary here because composition with an FST that only has weights on the graph side of the scores, will leave the acoustic scores where they were.

danpovey · 2018-03-27T20:26:23Z

src/chainbin/nnet3-chain-get-egs.cc

+          this_deriv_weights(i) = (*deriv_weights)(t);
+      }
+      KALDI_ASSERT(output_weights.Dim() == num_frames_subsampled);
+      this_deriv_weights.MulElements(output_weights);


I won't let this issue hold up merging this PR, but you might want to measure the effect of simply ignoring the 'output_weights' here, in the case where 'deriv_weights' are supplied, instead of multiplying by them. Based on my previous experience, this would improve the results. Don't add options though-- too much complexity
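To make the two alternatives concrete, here is a toy sketch in shell and awk. It is not the C++ from `nnet3-chain-get-egs.cc`; the function name `combine_weights`, its `mode` argument, and the weight values are all made up for illustration. The "multiply" mode mirrors the element-wise `MulElements` call in the diff, while the "ignore" mode keeps the supplied deriv weights unchanged.

```shell
#!/usr/bin/env bash
# Toy illustration only: combine_weights and its "mode" argument are
# hypothetical helpers, not part of Kaldi.

combine_weights() {
  # $1: space-separated deriv weights, $2: space-separated output weights,
  # $3: "multiply" (element-wise product) or "ignore" (deriv weights only).
  awk -v d="$1" -v o="$2" -v mode="$3" 'BEGIN {
    n = split(d, dw, " "); split(o, ow, " ");
    for (i = 1; i <= n; i++) {
      w = (mode == "multiply") ? dw[i] * ow[i] : dw[i];
      printf "%s%g", (i > 1 ? " " : ""), w;
    }
    printf "\n";
  }'
}

deriv_weights="0.9 0.5 1.0"    # made-up per-frame weights
output_weights="1.0 1.0 0.0"   # made-up per-frame output weights

combine_weights "$deriv_weights" "$output_weights" multiply
combine_weights "$deriv_weights" "$output_weights" ignore
```

With these values the two modes differ only on the last frame, where the output weight is zero; measuring the effect on real data is the experiment the comment suggests.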

danpovey · 2018-03-27T20:26:59Z

src/chainbin/nnet3-chain-get-egs.cc

+                "and input frames.");
+    po.Register("deriv-weights-rspecifier", &deriv_weights_rspecifier,
+                "Per-frame weights that scales a frame's gradient during "
+                "backpropagation."


need a space here after the ".".

danpovey · 2018-03-27T20:28:41Z

src/lat/lattice-functions.cc

 }

+
+void ComputeAcousticScoresMap(


if it turns out you don't need this code after looking into the composition, you can remove it.

vimalmanohar · 2018-03-28T02:03:38Z

Ok, I removed it from that binary but it is required elsewhere.



hhadian and others added 30 commits June 2, 2017 22:42

Merge branch 'master' into semi_supervised

c6ffb15

Merge branch 'master' of github.com:kaldi-asr/kaldi

7b01bb0

Add nnet3, chain, and semi_sepervised scripts for fisher english

403e3e2

Merge remote-tracking branch 'origin/semi_supervised' into semi_super…

0c8974e

…vised Travis was failing to compile(not sure why)-- I used the "Update Branch" button

Merge branch 'master' of github.com:kaldi-asr/kaldi

e1de4e4

Merge branch 'master' of github.com:kaldi-asr/kaldi

41952cd

Merge branch 'master' of github.com:kaldi-asr/kaldi

2e2b3d1

Merge branch 'master' of github.com:kaldi-asr/kaldi

51c32f7

Merge branch 'master' of github.com:kaldi-asr/kaldi

232397e

Merge branch 'master' of github.com:kaldi-asr/kaldi

1414f6f

Merge branch 'master' of github.com:kaldi-asr/kaldi

c65ef65

Merge branch 'master' of github.com:vimalmanohar/kaldi into chain-smbr

9677175

Merge branch 'master' of github.com:kaldi-asr/kaldi

20cf238

Merge branch 'master' of github.com:vimalmanohar/kaldi into chain-smbr

ae1cfe1

Merge branch 'master' of github.com:kaldi-asr/kaldi

bf56938

SMBR chain

0bacc83

chain-smbr: Bug fixes

2c43456

Chain SMBR fixes

6adc948

Conflicts: src/chain/chain-denominator-smbr.cc

chain-smbr: Bug fixes

2959279

Merge branch 'master' of github.com:kaldi-asr/kaldi

51ec051

chain-smbr: Bug fix

758e9a4

Merge branch 'master' of github.com:kaldi-asr/kaldi

d364040

Merge branch 'master' of github.com:kaldi-asr/kaldi

2f15292

Merge branch 'master' of github.com:kaldi-asr/kaldi

d8db02d

Merge branch 'master' of github.com:kaldi-asr/kaldi

9d97243

temp

57d1016

smbr-dash

a03b401

smbr without leaky

0682618

chain-smbr: Fix bugs in chain smbr

62da39a

smbr training

5b7879d

Make block-size fixed

7a39bdb

danpovey reviewed Feb 28, 2018


vimalmanohar added 2 commits March 1, 2018 12:28

Small change for merging

abae1a9

semisup: Fixing based on comments

2bae581

Show some info + warning + flush all the remaining partial blocks to …

a455f03

…the last archive

danpovey reviewed Mar 3, 2018


vimalmanohar added 2 commits March 4, 2018 14:49

Some changes based on the comments

17a703f

Merging new multilingual script

2bbfd07

danpovey reviewed Mar 5, 2018


Various bug fixes

6fefecb

vimalmanohar added 2 commits March 26, 2018 17:50

Fixed few bugs and tested

0460f06

Merging kaldi master

812b8c8

danpovey reviewed Mar 27, 2018


Fixed minor issues

c729b4c

danpovey merged commit 191b39a into kaldi-asr:master Mar 28, 2018

danpovey added a commit to danpovey/kaldi that referenced this pull request Apr 3, 2018

[scripts] Fix to per-utt issue after changes to chain/get_egs.sh in k…

54cefc5

…aldi-asr#2140.

danpovey added a commit that referenced this pull request Apr 5, 2018

[scripts] Fix to nnet3 bug RE per-utt splitting that appeared after #…

5294666

…2140; un-support --transform-dir. Thx: @aaror8 (#2334)

LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018

[src,scripts,egs] Semi-supervised training on Fisher English (kaldi-a…

35005fb

…sr#2140) Conflicts: egs/wsj/s5/steps/libs/nnet3/train/chain_objf/acoustic_model.py egs/wsj/s5/steps/nnet3/chain/train.py

LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018

[scripts] Fix to nnet3 bug RE per-utt splitting that appeared after k…

b07356e

…aldi-asr#2140; un-support --transform-dir. Thx: @aaror8 (kaldi-asr#2334) Conflicts: egs/wsj/s5/steps/nnet3/get_egs.sh

hhadian mentioned this pull request Apr 26, 2018

[WIP] Semi-supervised training using chain models #1657

Closed

Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018

[src,scripts,egs] Semi-supervised training on Fisher English (kaldi-a…

9490ef9

…sr#2140)

Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018

[scripts] Fix to nnet3 bug RE per-utt splitting that appeared after k…

0f272c3

…aldi-asr#2140; un-support --transform-dir. Thx: @aaror8 (kaldi-asr#2334)
