
WIP: Added babel_multilang example directory for multilingual setup using babel languages. #1027

Closed

pegahgh wants to merge 21 commits into kaldi-asr:master from pegahgh:multilingual

Conversation

@pegahgh
Contributor

@pegahgh commented Sep 14, 2016

No description provided.

@danpovey
Contributor

Vijay, would you mind doing a first pass of review on this?


@vijayaditya
Contributor

OK, on it.

--Vijay


Contributor

@vijayaditya left a comment

Added comments on run_common_langs.sh.

@@ -0,0 +1,121 @@
#!/bin/bash

Contributor

Please add a comment describing this script.

train_stage=-10
generate_alignments=true # false if doing ctc training
speed_perturb=true
use_flp=false
Contributor

@pegahgh many of these variables are different from a normal nnet3 script. Please describe what they mean.

# Although the nnet will be trained on high-resolution data, we still have to perturb the normal data to get the alignments
# _sp stands for speed-perturbed
for datadir in train; do
utils/perturb_data_dir_speed.sh 0.9 data/$L/${datadir} data/$L/temp1
Contributor

Use the *perturb_3way.sh script. Please add volume perturbation, which can also be done using a script like utils/data/volume_perturb.sh. Please check for the exact names.
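
A minimal sketch of the suggested pipeline, using the script names that appear elsewhere in this thread (data-directory layout assumed; verify the exact names under utils/data/):

# 3-way speed perturbation (0.9x/1.0x/1.1x) followed by volume perturbation
for datadir in train; do
  utils/data/perturb_data_dir_speed_3way.sh data/$lang/${datadir} data/$lang/${datadir}_sp
  utils/data/perturb_data_dir_volume.sh data/$lang/${datadir}_sp || exit 1;
done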

Contributor Author

The only concern is that we would have to redo PLP+pitch extraction for the original data when using this script!

. ./utils/parse_options.sh


L=$1
Contributor

Use a better variable name; L is uninformative.

Contributor Author

Changed L to lang!

done
fi

train_set=train_sp
Contributor

When checking for .done files, print a statement telling the user what stage is being skipped, why, and what they can do to force the script to run that stage.

You could probably also check if the data/*_sp directories are done and skip the feature extraction.
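
A sketch of the suggested pattern (path and variable names assumed):

if [ -f data/$lang/${train_set}_hires/.done ]; then
  echo "$0: skipping hires feature extraction for data/$lang/${train_set}_hires as it is already done;"
  echo "$0: remove data/$lang/${train_set}_hires/.done to force this stage to rerun."
else
  # ... run the feature-extraction stage ...
  touch data/$lang/${train_set}_hires/.done
fi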


utils/data/perturb_data_dir_volume.sh $data_dir || exit 1 ;

steps/make_mfcc.sh --nj 70 --mfcc-config conf/mfcc_hires.conf \
Contributor

With the new make_mfcc.sh script you need to specify the log and ark directories.

Contributor Author

It is on the next line!


if [ $stage -le 4 ]; then
if [[ "$use_pitch" == "true" ]]; then
echo use_pitch = $use_pitch
Contributor

Echo a user-understandable message, like:
$0: Generating pitch features for <data-dir-name(s)> as use_pitch=$use_pitch

steps/make_pitch.sh --nj 70 --pitch-config $pitch_conf \
--cmd "$train_cmd" data/$L/${dataset}_pitch exp/$L/make_pitch/${dataset} $pitchdir;
fi
aux_suffix=_pitch
Contributor

aux_suffix was a bit confusing for me and I had to search for its meaning. Do you think feat_suffix or pitch_suffix might be better?

Contributor Author

Changed to feat_suffix!

fi

if [ ! -f data/$L/${dataset}_mfcc${aux_suffix}/feats.scp ]; then
steps/append_feats.sh --nj 16 --cmd "$train_cmd" data/$L/${dataset} \
Contributor

Please check if append_feats.sh can support different types of argument lists like make_mfcc.sh. If yes, drop the log-dir and ark-dir specification. If not, could you please add this feature to append_feats.sh, just for the sake of consistency across scripts?
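
For reference, steps/make_mfcc.sh accepts both argument lists; the log and ark directories are optional and default to subdirectories of the data directory (paths here are illustrative; check the usage message in your checkout):

# older style: <data-dir> <log-dir> <mfcc-dir>
steps/make_mfcc.sh --nj 70 --mfcc-config conf/mfcc_hires.conf \
  data/$lang/${train_set}_hires exp/$lang/make_hires/${train_set} mfcc_hires
# newer style: <data-dir> only; logs and archives go under the data dir
steps/make_mfcc.sh --nj 70 --mfcc-config conf/mfcc_hires.conf data/$lang/${train_set}_hires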

Contributor Author

I am not sure if Dan agrees, since lots of scripts use append_feats.sh! We can do that later!

Contributor

I agree with Vijay that it would be a good idea to extend it in the same way-- now is a good opportunity to test the changes, since you'll have to rerun your stuff to test it after other changes anyway.

Contributor

@vijayaditya left a comment

I will continue the review over the weekend, as I have to work on some paper submissions.

@@ -0,0 +1,51 @@
#!/bin/bash

Contributor

Once again, add a comment describing what this script does.

set -e
stage=1
train_stage=-10
generate_alignments=true # false if doing ctc training
Contributor

Please describe the variables.

train_set=train_sp
fi

extractor=$global_extractor
Contributor

Why is the extractor variable even required? Could you not just use global_extractor at line 45?


. ./utils/parse_options.sh

L=$1
Contributor

Once again, change the name L.

@@ -0,0 +1,241 @@
#!/bin/bash

# This is a crosslingual training setup where there are no shared phones.
Contributor

Minor grammatical mistakes in the comment. Please check.

echo "$0: creating neural net config for multilingual setups"
# create the config files for nnet initialization
$cmd $dir/log/make_config.log \
python steps/nnet3/multi/make_configs.py \
Contributor

I would suggest that you call this script steps/nnet3/multilingual/make_tdnn_configs.py, as you might want to add support for other architectures later on.

In other make_configs.py scripts there is support for specifying either the {feat,ivector}-dim and num-targets or providing the corresponding directories from which the script extracts the required values. This makes your top level scripts less verbose. I would recommend adding this support to this config generator script, at least for the sake of consistency.
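
For comparison, a sketch of the two invocation styles (flag names follow steps/nnet3/tdnn/make_configs.py; whether this script uses the same names is an assumption):

# either pass the dimensions explicitly ...
steps/nnet3/multi/make_configs.py --feat-dim 43 --ivector-dim 100 \
  --num-targets 3000 ... $dir/configs
# ... or provide directories and let the script derive the values
steps/nnet3/multi/make_configs.py --feat-dir data/$lang/train_hires \
  --ivector-dir $ivector_dir --ali-dir $ali_dir ... $dir/configs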

Contributor Author

It already supports either the num-targets option or a directory from which to extract this value.
Do you mean to provide an option to read multiple ali-dirs, one per language, and extract num-targets per language?

Contributor

@pegahgh I don't see the point of extracting feat_dim in this script if your config generator already supports providing a feat-dir. Same for ivector-dim.

# create the config files for nnet initialization
$cmd $dir/log/make_config.log \
python steps/nnet3/multi/make_configs.py \
--splice-indexes "$splice_indexes" \
Contributor

I would recommend removing variables in this script which are not used more than once. This makes for easier reading.

nnet3-init --srand=-2 $dir/configs/init.config $dir/init.raw || exit 1;
fi

. $dir/configs/vars || exit 1;
Contributor

Describe which variables are being sourced here.


. $dir/configs/vars || exit 1;

if [ $stage -le 10 ]; then
Contributor

@vijayaditya Sep 15, 2016

Are you sure you want to do all this in a top-level script in local/? I would recommend moving these steps to a script like steps/nnet3/multilingual/get_egs.sh.

echo print-interval = $print_interval
if [ $stage -le 11 ]; then
echo "$0: training mutilingual model."
steps/nnet3/multi/train_tdnn.sh --cmd "$train_cmd" \
Contributor

Is this script specific to TDNNs? I think most of our training steps are similar for all feed-forward DNNs. I recommend renaming it to steps/nnet3/multilingual/train_dnn.sh.

Contributor

Agreed, please rename.

speed_perturb=true
multidir=exp/nnet3/multi_bnf_10_close_lang_plus_grg
global_extractor=exp/multi/nnet3/extractor
lang_list=(GRG LIT MONG TUR KAZ KUR PSH SWA TOK IGBO DHO)
Contributor

Make it explicit here that this list of languages has been selected for GRG.

Contributor

As suggested before, please describe each variable which is not usually found in a typical nnet3 recipe.

. ./utils/parse_options.sh


L=$1
Contributor

Once again, a better variable name, please.

if $speed_perturb; then
suffix=_sp
fi
exp_dir=exp/$L
Contributor

This directory structure ({data,exp}/<lang-name>/) is not used in other Babel recipes. Better to make it consistent with the other scripts.

Contributor Author

This is the main structure in this multilingual egs directory!

./local/nnet3/run_tdnn_joint_babel_sp_bnf.sh --dir $multidir \
--avg-num-archives $num_archives \
--global-extractor $global_extractor \
--init-lrate $bnf_init_lrate \
Contributor

Unless you anticipate that users will tune these variables a lot, please eliminate any variable which is not used more than once.

Contributor

As suggested before, any time you skip a stage because it is already done, please let the user know what you are skipping and how they can force that stage to run.

Contributor

@vijayaditya left a comment

Completed review of a few more scripts.

if [ ! -f $data_bnf_dir/.done ]; then
mkdir -p $dump_bnf_dir
# put the archives in ${dump_bnf_dir}/.
steps/nnet3/dump_bottleneck_features.sh --use-gpu true --nj $train_nj --cmd "$train_cmd" \
Contributor

I would recommend adding support for different argument lists similar to steps/make_mfcc.sh.

touch $exp_dir/tri5b/.done
fi

if [ ! $exp_dir/tri6/.done -nt $exp_dir/tri5b/.done ]; then
Contributor

Do you actually use this GMM-HMM system anywhere? If not, you could probably skip this training.
@jtrmal any comments?

Contributor Author

I thought some people may want to use a GMM-HMM trained on top of BN features!
Any comments?

Contributor

If this is something that's not immediately needed, but you think may be needed in future, you could just put it inside if false; then .... fi.
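
i.e. something along these lines (the tri6 stage and its arguments are illustrative, not the PR's actual commands):

if false; then  # disabled for now; flip to true to train a GMM-HMM on BN features
  steps/train_sat.sh --cmd "$train_cmd" 5000 100000 \
    $data_bnf_dir/train data/$lang/lang $exp_dir/tri5b_ali $exp_dir/tri6
  touch $exp_dir/tri6/.done
fi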

@@ -0,0 +1,471 @@
#!/bin/bash
Contributor

I am not familiar with all the possible use-cases for this script. @jtrmal might be better suited to review this script.

set -e
stage=1
train_stage=-10
generate_alignments=true # false if doing ctc training
Contributor

Any references to CTC are outdated and should be removed, including in whatever script you copied this from.

@@ -0,0 +1,119 @@
#!/bin/bash
# -v3 is as -v2, but it just consists of the 10 closest fullLP langs to GRG, plus GRG.
Contributor

I think these comments may be out of date. Comments at the top of this script should say what the script does... it's not clear to me right now.

Contributor

Pegah, please update the comments on this script... I think you may have also missed my other comments.


// Returns a random integer sampled from a discrete probability distribution;
// it returns i with probability prob(i).
Contributor

Document that 'prob' must sum to one.

return -1;
}

void Nnet::RenameNode(int32 node_index, const std::string &new_node_name) {
Contributor

I think this function should be called SetNodeName for consistency with GetNodeName(), and
document that this can be used, for example, for renaming output nodes.
But you should do some more checking here.
Do you require that after this renaming, the nnet should still be valid? I would expect so.
If so, then call Check() at the end of this function. You should probably also verify that IsValidName(new_node_name).

if (!rename_node_names.empty()) {
std::vector<string> orig_names,
new_names;
//GetRenameNodeNames(rename_node_names, &orig_names, new_names);
Contributor

I think making this a function (appropriately documented!) in nnet-utils.h called
void RenameNodes(std::string &rename_node_names, Nnet *nnet);
would make this much nicer.

// or crashes if it is not there.
int32 NumOutputIndexes(const NnetExample &eg) {
for (size_t i = 0; i < eg.io.size(); i++)
if (eg.io[i].name == "output")
Contributor

You can't just change the implementation of functions without re-doing their documentation, like this!
Think about what would be the right way to do this.

Contributor

Actually this is probably OK, I see that it is in a binary, not in nnet3/.

Contributor

The way this is set up right now, the training is going to be extremely inefficient because graph compilation will happen on every single minibatch. This is because the minibatches have randomized structure.
The way I think you should solve this is to modify nnet3-merge-egs so that it batches together inputs that have the same structure (where by "the same structure" I mean the set of input and output names). It may be best to write a class to do the merging. Also the order of the NnetIo objects in the eg is not necessarily well defined, and should be ignored, so bear that in mind.

typedef kaldi::int64 int64;

const char *usage =
"Copy examples (single frames or fixed-size groups of frames) for neural\n"
Contributor

Update the Usage message.

po.Read(argc, argv);
po.Register("rename-node-names", &rename_node_names, "Comma-separated list of noed names need to be modified"
" and their new name. e.g. 'affine0/affine0-lang1,affine1/affine1-lang1'");
po.Register("add-output-nodes", &add_output_nodes, "Comma-separated list of output node name and its input"
Contributor

Comma-separated list of output node names and their corresponding input descriptors, to be added to the nnet config.

nj=4
cmd=run.pl
use_gpu=false
bnf_name=renorm4
Contributor

@vijayaditya Sep 19, 2016

Bnf_name seems to be a very critical parameter in this script. I think it is unwise to assign it by default, as we frequently vary the neural network architecture in our models.

[ -f path.sh ] && . ./path.sh # source the path.
. parse_options.sh || exit 1;

if [ $# != 5 ]; then
Contributor

@vijayaditya Sep 19, 2016

As I had suggested in other scripts, it might be better to support multiple argument lists like steps/make_mfcc.sh for the sake of consistency. By default you could drop the arkdir and logdir arguments.

Contributor

I agree that it's better to use the new style of putting these things in subdirectories of the destination directory.

*) echo "Invalid feature type $feat_type" && exit 1;
esac

if [ ! -z "$transform_dir" ]; then
Contributor

@vijayaditya Sep 19, 2016

We are not using fMLLR/lda features in the nnet3 recipes. You could simplify your script by dropping support for these features.

N0=$(cat $data/feats.scp | wc -l)
N1=$(cat $archivedir/raw_bnfeat_$name.*.scp | wc -l)
if [[ "$N0" != "$N1" ]]; then
echo "Error happens when generating BNF for $name (Original:$N0 BNF:$N1)"
Contributor

echo "$0 : An error occurred...."

# Concatenate feats.scp into bnf_data
for n in $(seq $nj); do cat $archivedir/raw_bnfeat_$name.$n.scp; done > $bnf_data/feats.scp

for f in segments spk2utt text utt2spk wav.scp char.stm glm kws reco2file_and_channel stm; do
Contributor

You forgot utt2uniq.

@@ -0,0 +1,147 @@
#!/bin/bash
Contributor

For the sake of consistency with naming convention used in Kaldi, use the name
steps/nnet3/make_bottleneck_features.sh.

if [ $stage -le 1 ]; then
echo "Making BNF scp and ark."
echo output-node name=output input=$bnf_name > output.config
modified_bnf_nnet="nnet3-copy --rename-node-names=output/output-bkp $bnf_nnet - | nnet3-init - output.config - |"
Contributor

What happens if there already exists a node with the name output-bkp in your network?

Contributor

Let's just assume there was no node named output-bkp. But about this: please see Issue #1040; eventually it will be possible to rename the node via config files using nnet3-init or nnet3-copy. I'm not sure if we should finish that issue before merging this, but anyway, proceed as if we were merging this first.



if [ $stage -le 1 ]; then
echo "Making BNF scp and ark."
Contributor

@vijayaditya Sep 19, 2016

echo "$0: Generating bottle-neck features"

@@ -0,0 +1,678 @@
#!/usr/bin/env python
Contributor

Do you plan to support chain training with the multilingual training scripts? If not, you could probably eliminate a few options and simplify the script significantly.


def GetArgs():
# we add compulsory arguments as named arguments for readability
parser = argparse.ArgumentParser(description="Writes config files and variables "
Contributor

Modify the description.

help="iVector dir, which will be used to derive the ivector-dim ", default=None)

num_target_group = parser.add_mutually_exclusive_group(required = True)
num_target_group.add_argument("--num-targets", type=int,
Contributor

This mutually exclusive group does not make sense to me. Please rewrite this script rather than modifying the existing script.

Contributor Author

Why do you think it doesn't make sense? You can provide either num-targets or num-multiple-targets!

Contributor

Why do you still need the options corresponding to uni-lingual networks (--num-targets, --ali-dir and --tree-dir)?
There are several other parts of this script which are not relevant in the context of multi-lingual networks. You could cull all these unnecessary parts to simplify this script substantially. In its current form this script is able to generate networks for uni-lingual xent training, uni-lingual chain training and multi-lingual xent training. I would strongly suggest that you keep the config generator scripts simple.


return args

def AddPerDimAffineLayer(config_lines, name, input, input_window):
Contributor

This config generator script seems to have been copied from an older TDNN config generator. The new script eliminates a lot of these options.

exit_stage=-100 # you can set this to terminate the training early. Exits before running this stage

# count space-separated fields in splice_indexes to get num-hidden-layers.
splice_indexes="-4,-3,-2,-1,0,1,2,3,4 0 -2,2 0 -4,4 0"
Contributor

Please use your current best splice_indexes as the default.

unset args[${#args[@]}-1]
num_lang=$[${#args[@]}/3]

for i in `seq 0 $[$num_lang-1]`; do
Contributor

I see a large potential for argument offset errors.

parser.add_argument("--relu-dim", type=int,
help="dimension of ReLU nonlinearities")

parser.add_argument("--self-repair-scale-nonlinearity", type=float,
Contributor

You are undoing recent changes committed by @freewym.

@@ -104,10 +104,16 @@ def GetArgs():
parser.add_argument("--relu-dim", type=int,
Contributor

@pegahgh Your code is undoing most of the recent changes. Please update your branch.

@vijayaditya
Contributor

@pegahgh I will continue review after the requested changes have been made. Please remember to update the usage messages and documentation in the C++ code.

@pegahgh
Contributor Author

pegahgh commented Sep 21, 2016

@danpovey @vijayaditya Thanks for the comments.
I have fixed almost all the issues, and I will push the changes tonight!

@danpovey changed the title "Added babel_multilang example directory for multilingual setup using babel languages." to "WIP: Added babel_multilang example directory for multilingual setup using babel languages." Sep 21, 2016
@danpovey
Contributor

As Pegah, Vijay and I discussed offline, this is going to be reworked with a slightly different design geared towards greater I/O efficiency, so I'm marking it as WIP.

. parse_options.sh || exit 1;

if [ $# -lt 1 ]; then
echo "Usage: $0 [opts] <data> <lang> <ali-dir> <exp-dir>"
Contributor

@danpovey Sep 21, 2016

The usage message needs to be accurate.


exit 1;
fi
#data=$1
Contributor

Remove these old comments.

# Set off jobs doing some diagnostics, in the background.
# Use the egs dir from the previous iteration for the diagnostics
for i in `seq 0 $[$num_lang-1]`;do
rename_io_names="output-$i/output"
Contributor

For computing the diagnostics probabilities, I don't see that it's necessary or desirable to do this renaming of outputs.
nnet3-compute-prob already, I believe, computes the output for each of the output nodes that is defined, and separately prints those stats per output layer. All that might be needed is to make sure that it doesn't die because of the 'IsSimpleNnet()' check.

if [ $x -gt 0 ]; then
rename_io_names="output-0/output"

$cmd $dir/log/progress.$x.log \
Contributor

For the progress logs, I would advise just to do it once, and to leave off the last argument (the egs). The aspects of that program that require the egs are relatively unimportant.
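
i.e. a single call without the egs argument, roughly as follows (model filenames assumed):

$cmd $dir/log/progress.$x.log \
  nnet3-show-progress --use-gpu=no $dir/$[$x-1].raw $dir/$x.raw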


use_ivector=false
initial_effective_lrate=0.01
final_effective_lrate=0.001
pnorm_input_dim=3000
Contributor

I believe this variable is unused; please check for other unused variables.


lang_id=${range[0]}
start_egs=${range[1]}
end_egs=$[$start_egs+${range[2]}]
awk -v s="$start_egs" -v e="$end_egs" 'NR >= s && NR < e' ${multi_egs_dirs[$lang_id]}/egs.scp >> $scp_file;
Contributor

This looks like a super-inefficient way to do it-- it would take time quadratic in the amount of input, not linear.
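
A linear-time sketch: if each language's ranges are sorted by start line and non-overlapping (the per-language "start count" ranges file assumed here is not the PR's actual format), each egs.scp can be read exactly once:

# extract all requested line ranges from one egs.scp in a single pass
awk 'NR==FNR { n++; s[n]=$1; e[n]=$1+$2; next }    # load "start count" ranges
     r==0 || (r<=n && FNR>=e[r]) { if (r<n) r++ }  # advance to the next range
     FNR>=s[r] && FNR<e[r]' \
  ranges_for_lang ${multi_egs_dirs[$lang_id]}/egs.scp >> $scp_file

This emits lines in scp order, so interleaving ranges across languages as the original loop does would still need a reassembly step; a streaming implementation (as suggested below) avoids that.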

# in multilingual training.

if [ $# -lt 4 ]; then
echo "$0: Usage: $0 num-langs [<egs-dir-lang-1> .. <egs-dir-lang-n>] <ranges.1> <scp-1>"
Contributor

This usage message is very unclear: it doesn't explain what the "ranges" or "scp" formats are, and things like <ranges.1> are not clear.
In any case, I think there might be a deeper design problem here, making it super slow.

if true; then
echo "$0: Generate egs for ${lang_list[$lang]}"
if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $egs_dir/storage ]; then
utils/create_split_dir.pl \
Contributor

This create_split_dir stuff is not the kind of thing that should appear in scripts inside steps/.

. parse_options.sh || exit 1;

if [ $# -lt 4 ]; then
echo "Usage: $0 [opts] num-input-langs <data-dir-per-lang> <ali-dir-per-lang> <egs-dir-per-lang> <multilingual-egs-dir>"
Contributor

I think it would be better to just take as input the egs dirs from all the languages, assuming they already exist; and move the responsibility for dumping those egs to a further-out level.

@@ -0,0 +1,158 @@
#!/bin/bash
Contributor Author

@danpovey
I added steps/nnet3/multilingual/get_egs.sh, which generates egs.*.scp, outputs.*.scp and weights.*.scp for multilingual training using a Python script.
The details of egs.*.scp generation are described in the comments.
If you agree with the whole pipeline for generating egs, I will start on the next steps by making the following changes:

  1. Modify nnet3-copy-egs to accept output and weight
  2. Add steps/nnet3/train_raw.py and modify it to be compatible with the multilingual training setup.
  3. Modify all local/nnet3/run_* w.r.t. the new pipeline.

@@ -0,0 +1,209 @@
#!/usr/bin/env python
Contributor Author

@danpovey
This Python script reads lang2len as input; it randomly selects the next language w.r.t. the probability of remaining examples in each language, and it outputs ranges.* with format

and then the bash script get_egs.sh generates egs.*.scp w.r.t. the generated ranges.* in parallel.
The reason I didn't read the scp files in Python was that I would have to load the whole scp file for each language into memory, which is not efficient!
Should I change the process to stream the scp files in Python line by line (a batch of minibatch-size each time)?

@danpovey
Contributor

Doing it in parallel will just overwhelm the NFS server. Implement it in a
streaming way in python.
Dan


@pegahgh
Contributor Author

pegahgh commented Oct 3, 2016

@vijayaditya
I added the new multilingual setup, which manages multilingual examples in a more I/O-efficient way.

@vijayaditya
Contributor

OK, will review today.


Contributor

@vijayaditya left a comment

Started the review. Will continue later...

@@ -0,0 +1 @@
../../../babel/s5c/conf/common.fullLP No newline at end of file
Contributor

We don't encourage soft-linking files across egs, as the source recipes can change anytime.


exit 1;
fi
cmd=run.pl
Contributor

Why are you resetting cmd?

if [ $# -lt 4 ]; then
echo "Usage: $0 [opts] num-input-langs <data-dir-per-lang> <ali-dir-per-lang> <egs-dir-per-lang> <multilingual-egs-dir>"
echo " e.g.: $0 2 data/lang1/train data/lang2/train "
" exp/lang1/tri5_ali exp/lang2/tri5_ali exp/lang1/nnet3/lang1 exp/lang2/nnet3/lang2 exp/multi/egs"
Contributor

exp/lang1/nnet3/lang1: is this supposed to be exp/lang1/nnet3/egs?

stage=0
left_context=13
right_context=9
online_multi_ivector_dir= # list of iVector dir for all languages
Contributor

Give an example for this variable.

ali_dir=${multi_ali_dirs[$lang]}
egs_dir=${multi_egs_dirs[$lang]}
online_ivector_dir=
if [ ! -z "$multi_ivector_dirs" ]; then
Contributor

Variable name mismatch.

for datadir in train; do
./utils/data/perturb_data_dir_speed_3way.sh data/$lang/${datadir} data/$lang/${datadir}_sp

# Extract Plp+pitch feature for perturbed data.
Contributor

This feature extraction needs to be done only if you want to generate alignments. Or are you assuming some other top-level script needs these features?

Contributor Author

Actually, this feature extraction is inside the speed-perturb condition, where you need to regenerate alignments for the perturbed data.

utils/create_split_dir.pl /export/b0{1,2,3,4}/$USER/kaldi-data/egs/$lang-$date/s5c/$mfccdir/storage $mfccdir/storage
fi

# the 100k_nodup directory is copied seperately, as
Contributor

These comments are irrelevant.

@@ -0,0 +1,48 @@
#!/bin/bash
# This script generates iVectors using a global iVector extractor
Contributor

Usually run_ivector_common*.sh also trains the ivector extractor. Please rename this script.


mkdir -p nnet3
# perturbed data preparation
train_set=train
Contributor

Couldn't you just take these variables as input?

speed_perturb=true
multidir=exp/nnet3/multi_bnf_10_close_lang_plus_grg
global_extractor=exp/multi/nnet3/extractor
lang_list_for_grg=(GRG LIT MONG TUR KAZ KUR PSH SWA TOK IGBO DHO)
Contributor

Better to describe the language codes; not everyone is aware of them.

Contributor

Instead of inventing your own codes, I'd suggest using the IDs from Babel, which is an established practice, i.e. 404-georgian, 304-lithuanian and so on.
It does not help anything if you make the codes short; you only obfuscate things.

Contributor

@vijayaditya left a comment

Reviewed a few more files. Will continue again when I can find time.



[ ! -d $dump_bnf_dir ] && mkdir -p $dump_bnf_dir
if [ ! -f $data_bnf_dir/.done ]; then
Contributor

As I requested in the previous review, whenever you are skipping a stage because it is already done, print a statement saying that you are skipping it, and also add a comment describing how users can force it to run, i.e. by deleting the .done files.

@@ -0,0 +1,698 @@
#!/usr/bin/env python
Contributor

Do you want to add this script as part of this PR? Or do you want to assume that Vimal's PR is going to be merged before your PR?

Contributor Author

I prefer to assume Vimal's PR is merged first!

}

# The function signature of MakeConfigs is changed frequently as it is intended for local use in this script.
def MakeConfigs(config_dir, splice_indexes_string,
Contributor

@vijayaditya Oct 8, 2016

Better to move the common parts of tdnn/make_configs.py and tdnn/multi_lingual/make_configs.py (e.g. splice string parsing, CNN specification, common argument parsing) into a shared module and include it in both scripts. See how Vimal is doing this in his PR #1066.

# This script uses separate input egs directory for each language as input,
# to generate egs.*.scp files in multilingual egs directory
# where the scp line points to the original archive for each egs directory.
# $megs/egs.*.scp is randomized w.r.t language id.
Contributor

Adding snippets of these scp files would be helpful.

# Begin configuration section.
cmd=run.pl
minibatch_size=512 # multiple of minibatch used during training.
num_jobs=10 # helps for better randomness across languages
Contributor

Did you forget to update the comment beside this variable?

# they are discarded.
if lang_len[lang_id] < args.minibatch_size:
lang_len[lang_id] = 0
print("Run out of data for language {0}".format(lang_id))
Contributor

Use warnings.warn to print this statement: "Ran out of data..."

Contributor Author

This is not really a warning!! I will reword the statement!


# check files before writing.
if f is None:
sys.exit("Error opening file " + args.egs_dir + "/temp/" + args.prefix + "ranges." + str(job + 1))
Contributor

Prefer using raise Exception.

feats="ark,s,cs:apply-cmvn $cmvn_opts --utt2spk=ark:$sdata/JOB/utt2spk scp:$sdata/JOB/cmvn.scp scp:$sdata/JOB/feats.scp ark:- |"
ivec_feats="scp:utils/filter_scp.pl $sdata/JOB/utt2spk $ivector_dir/ivector_online.scp |"

if [ ! -z "$transform_dir" ]; then
Contributor

We haven't been using fMLLR features in nnet3, even the new training scripts do not support this option. I am eliminating this option in the new get_egs.py script, so you could safely eliminate this code and simplify your script.

nj=4
cmd=run.pl
use_gpu=false
bnf_name=Tdnn_Bottleneck_renorm
Contributor

As I pointed out in my previous review, it is better not to use bnf_name as an optional argument. Please consider making it compulsory.

Contributor Author

The whole point is that this script can be used to dump the output of different layers, not only the bottleneck. That was the reason I didn't fix the name!

ivector_opts="--online-ivector-period=$ivec_period --online-ivectors='$ivec_feats'"
fi
$cmd $compute_queue_opt JOB=1:$nj $dir/log/make_bnf_$name.JOB.log \
nnet3-compute $compute_gpu_opt $ivector_opts "$modified_bnf_nnet" "$feats" ark:- \| \
Contributor

I think you might want to make your BNF extractor architecture-agnostic. If you are using an RNN, you should support specification of extra-left-context, extra-right-context and chunk width.
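
e.g. something along these lines (these options exist on nnet3-compute; the variables and their defaults would be new to this script):

nnet3-compute --use-gpu=no \
  --extra-left-context=$extra_left_context \
  --extra-right-context=$extra_right_context \
  --frames-per-chunk=$frames_per_chunk \
  "$modified_bnf_nnet" "$feats" ark:-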

@danpovey
Contributor

node_name might be a better name than bnf_name.


@danpovey closed this May 24, 2017