
Conversation

@wonkyuml
Contributor

  • It uses 51 hours of Korean audio data and a pre-trained LM (built from an external text corpus).
  • It includes the recent TDNN training recipe.

@danpovey danpovey left a comment

This feels like you haven't done much quality control.

@@ -0,0 +1,10 @@
# Default configuration
Contributor

Don't include this; this is something specific to your queue.

@@ -0,0 +1,299 @@
#!/bin/bash

Contributor

names should start from 1a.

@@ -0,0 +1,302 @@
#!/bin/bash

set -e -o pipefail
Contributor

again: names should be consecutive, starting from 1a. And you should include results in a comment at the top of the script, preferably the output of some script like local/chain/compare_wer.sh.

# Check list before start
# 1. locale setup
# 2. pre-installed package: awscli, Morfessor-2.0.1, flac, sox, same cuda library, unzip
# 3. pre-install or symbolic link for easy going: rirs_noises.zip (takes pretty long time)
Contributor

again--very vague.

# 1. locale setup
# 2. pre-installed package: awscli, Morfessor-2.0.1, flac, sox, same cuda library, unzip
# 3. pre-install or symbolic link for easy going: rirs_noises.zip (takes pretty long time)
# 4. parameters: nCPU, num_jobs_initial, num_jobs_final, --max-noises-per-minute
Contributor

not sure what this means.

# it takes long time and do this again after computing silence prob.
# you can do comment out here this time

#utils/build_const_arpa_lm.sh data/local/lm/zeroth.lm.tg.arpa.gz \
Contributor

If this is part of the script it shouldn't be commented out. You can add a --stage option in run.sh to enable resuming after a partial run.
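For reference, a minimal sketch of the usual stage-gating pattern; the stage number and the lang-directory names below are assumptions, not taken from this recipe:

  stage=0
  . ./cmd.sh
  . ./path.sh
  . utils/parse_options.sh    # makes e.g. "./run.sh --stage 8" work

  if [ $stage -le 7 ]; then
    # build the const-arpa LM for rescoring; skipped when resuming past this stage
    utils/build_const_arpa_lm.sh data/local/lm/zeroth.lm.tg.arpa.gz \
      data/lang data/lang_test_tglarge
  fi

After a partial run, the same script can then be resumed with "./run.sh --stage 8".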

data/$test exp/tri4b/decode_{tgsmall,fglarge}_$test
done

# align train_clean_100 using the tri4b model
Contributor

comment is outdated

# --splice-indexes "layer0/-1:0:1 layer1/-2:1 layer2/-4:2" \
# --num-hidden-layers 3 \
# --splice-indexes "layer0/-4:-3:-2:-1:0:1:2:3:4 layer2/-5:-1:3" \
steps/nnet2/train_multisplice_accel2.sh --stage $train_stage \
Contributor

I don't know why you'd include old nnet2 recipes in this PR. The nnet3 recipes must be substantially better by now. In any case you should be including results.

data/lang exp/nnet2_online/extractor "$dir" ${dir}_online || exit 1;
fi

#if [ $stage -le 11 ]; then
Contributor

There is a lot of commented stuff here. It feels like you have just added whatever was lying around in a directory to a PR, without really checking it.

fi
cp $trans $trans".old"
awk '{print $1}' $trans".old" > $trans"_tmp_index"
cut -d' ' -f2- $trans".old" |\
Contributor

Before using external scripts like Morfessor you should add a script somewhere like tools/extras/ that can automatically install it, and make it so your script can use it from that location.
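As a rough illustration only (the file name, location, and the pip/venv-based install are assumptions; the actual installer went in via the separate PR mentioned below), such a helper might look like:

  #!/bin/bash
  # tools/extras/install_morfessor.sh (hypothetical): install Morfessor 2.0 under tools/
  set -e
  cd "$(dirname "$0")"/..            # move into tools/
  if [ ! -d morfessor-venv ]; then
    python3 -m venv morfessor-venv   # isolated environment so the system python is untouched
  fi
  morfessor-venv/bin/pip install morfessor
  echo "$0: installed Morfessor; add $(pwd)/morfessor-venv/bin to PATH in the recipe's path.sh"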

Contributor

I added a new pull request with the Morfessor installation script, Dan.
Please check PR #2299.

@danpovey
Contributor

danpovey commented Jul 3, 2018

Is this ready to merge, as far as you know?

wonkyuml added 3 commits July 9, 2018 20:01
- simplified the script
- delete unnecessary scripts and comments
@danpovey
Contributor

Let me know when you think this is ready to review again.

@wonkyuml
Contributor Author

Ok. It's now ready to review.

linear-component name=tdnn5l dim=256 $linear_opts
relu-batchnorm-layer name=tdnn5 $opts dim=1280 input=Append(tdnn5l, tdnn3l)
linear-component name=tdnn6l dim=256 $linear_opts input=Append(-3,0)
relu-batchnorm-layer name=tdnn6 $opts input=Append(0,3) dim=1280
Contributor

This model is probably too large, given that you only have 51 hours of data.
Can you please try to base your recipe on the current TDNN recipe from WSJ?

Contributor Author

I changed the recipe to be based on the current WSJ TDNN-F recipe. It uses many fewer parameters, as you suggested, and the results got better, which is good.

@wonkyuml
Contributor Author

wonkyuml commented Jul 11, 2018 via email

@danpovey danpovey left a comment

Some more comments.

@@ -0,0 +1,277 @@
#!/bin/bash

Contributor

These recipes should have results at the top in a comment. Please create a script like compare_wer.sh that prints out the results and diagnostics; you can copy and modify it from another setup.
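For illustration, a minimal compare_wer.sh along those lines might just collect the best WER from each system's decode directories (the directory layout and naming here are assumptions):

  #!/bin/bash
  # local/chain/compare_wer.sh (sketch)
  # usage: local/chain/compare_wer.sh exp/chain/tdnn1a_sp exp/chain/tdnn_opgru1a_sp
  . ./path.sh
  for d in "$@"; do
    echo "# === $(basename "$d") ==="
    for dec in "$d"/decode_*; do
      [ -d "$dec" ] || continue
      # best WER over all LM-weight / word-insertion-penalty combinations
      echo -n "# $(basename "$dec"): "
      grep WER "$dec"/wer_* 2>/dev/null | utils/best_wer.sh
    done
  done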

Contributor Author

Both the tdnn and tdnn_opgru recipes have results and diagnostics now. Thanks for the suggestion. Also, I created compare_wer.sh for this.

@@ -0,0 +1,63 @@
#!/bin/bash
Contributor

Please add a file s5/README.txt that explains something about the data: what type of data, how much of it, how you can obtain it, what the license is, things like that.

Contributor Author

I added README.txt

chunk_width=150,110,100

# training options
num_jobs_initial=3
Contributor

Apart from remove_egs, I prefer that you put the rest of these (num-jobs, num-epochs, minibatch-size, learning rates) directly as args to the script; there's no need for variables. The same goes for chunk_width, max_param_change and xent_regularize (xent_regularize should probably be zero).

Contributor Author

Yup, I took those parameters out of variables and pass them directly now. xent_regularize was 0.1 in the script I referred to. Do you still think zero makes sense here?

Contributor

oh sorry I was mixing it up. xent_regularize should be 0.1. It's chain-regularize that we are now making zero (in setups with l2).

num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
opts="l2-regularize=0.002"
linear_opts="orthonormal-constraint=1.0"
Contributor

when you adjust the recipe, be sure to take all of these variables from where you are copying the recipe from. And don't forget the dropout schedule and the corresponding option to train.py.

Contributor Author

OK. I made sure to copy all of the variables from the baseline recipe. The TDNN now has a dropout schedule.

--chain.lm-opts="--num-extra-lm-states=2000" \
--trainer.max-param-change $max_param_change \
--trainer.num-epochs $num_epochs \
--trainer.frames-per-iter 1500000 \
Contributor

I suggest 2 million for the frames-per-iter; the 5 million in the WSJ example is probably too large.

Contributor Author

Yes. I based this on the TDNN-F recipe and changed frames-per-iter to 2 million.
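Putting the tuning points from this review together, a hedged sketch of the chain train.py call with those values passed directly as arguments; the num-jobs-final, num-epochs and dropout-schedule values are placeholders borrowed from typical WSJ-style TDNN-F setups, not necessarily what was merged here:

  steps/nnet3/chain/train.py --stage $train_stage \
    --cmd "$decode_cmd" \
    --feat.online-ivector-dir $train_ivector_dir \
    --chain.xent-regularize 0.1 \
    --chain.l2-regularize 0.0 \
    --chain.lm-opts="--num-extra-lm-states=2000" \
    --trainer.dropout-schedule "0,0@0.20,0.5@0.50,0" \
    --trainer.num-epochs 10 \
    --trainer.frames-per-iter 2000000 \
    --trainer.optimization.num-jobs-initial 3 \
    --trainer.optimization.num-jobs-final 8 \
    --egs.chunk-width 150,110,100 \
    --tree-dir $tree_dir \
    --feat-dir $train_data_dir \
    --lat-dir $lat_dir \
    --dir $dir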


mkdir exp -p exp/nnet3

steps/train_lda_mllt.sh --cmd "$train_cmd" --num-iters 13 \
Contributor

Modern scripts use PCA instead of LDA+MLLT. It's a substantial simplification and doesn't affect the results. Perhaps you copied this script from somewhere not fully up-to-date, like tedlium s5_r2.
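For reference, the up-to-date alternative looks roughly like this (a sketch; the data/exp directory names are assumptions):

  # estimate a PCA transform on the hires MFCCs instead of training an LDA+MLLT system
  steps/online/nnet2/get_pca_transform.sh --cmd "$train_cmd" \
    --splice-opts "--left-context=3 --right-context=3" \
    --max-utts 10000 --subsample 2 \
    data/${train_set}_sp_hires exp/nnet3/pca_transform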

Contributor

changed it

awk '{for (i=2; i<=NF; ++i) { print $i; gsub(/[0-9]/, "", $i); print $i}}' $lexicon_raw_nosil |\
sort -u |\
perl -e 'while(<>){
chop; m:^([^\d]+)(\d*)$: || die "Bad phone $_";
Contributor

please indent these a bit. they read like bash commands if they start at the very left.

Contributor

indentation done

@@ -0,0 +1,36 @@
#!/bin/bash
Contributor

please call this update_segmentation.sh, for consistency in naming.

Contributor

changed it

export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=ko_KR.UTF-8
Contributor

This is a problem. LC_ALL should always be set to C. Most people may not even have the Korean locale installed.
Why is this even needed?
If you have scripts that depend on the locale, you should change them so they don't. With python3, it's possible to rework the I/O so that it treats input streams using a certain encoder/decoder (such as "utf-8").

Contributor Author

OK. @jty016 will think about this.

Contributor

I removed this dependency by using Morfessor's -e (encoding) option.
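Roughly what that resolution looks like (a sketch; the morfessor-train/morfessor-segment invocations and file names are assumptions based on the Morfessor 2.0 command-line tools):

  # in path.sh: keep Kaldi's standard locale instead of ko_KR.UTF-8
  export LC_ALL=C

  # pass the text encoding to Morfessor explicitly rather than relying on the locale
  morfessor-train -e utf-8 -s $dir/morfessor.model $dir/word_counts.txt
  morfessor-segment -e utf-8 -l $dir/morfessor.model $dir/words.txt > $dir/segmented.txt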


if [ $stage -le 11 ]; then

echo "#### SAT again on train_clean ###########"
Contributor

all echo statements should start with "$0:" so it's clear which script produced the output.
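For example (illustrative only), the line quoted above would become something like:

  echo "$0: SAT again on train_clean"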

Contributor

now all echo statements start with "$0"

@wonkyuml
Contributor Author

It still needs more work. I will ping you when it's fully ready to review again. Thanks for the review.

@lucas-jo
Contributor

@wonkyuml Sorry for the late update. Please check whether it is okay now.

@wonkyuml
Contributor Author

Thanks! I will review this week.

@wonkyuml
Contributor Author

@danpovey could you review this again?

@danpovey
Contributor

OK, will try to do it tomorrow.

@danpovey danpovey left a comment

Thanks. It looks very good generally; there are some very small issues. The better-tuned TDNN-F experiments are not necessary for me to merge it, though; you can do them later if you have time.

done
exit 0

# tdnn_1a is a kind of factorized TDNN, with skip connections.
Contributor

These systems seem to be underfitting-- I don't see any train/valid difference. Therefore you might want to choose a larger config for this TDNN-F system. (the one with OPGRU is already quite large but you could try more epochs).


# the first splicing is moved before the lda layer, so no splicing here
relu-batchnorm-dropout-layer name=tdnn1 $tdnn_opts dim=1024
tdnnf-layer name=tdnnf2 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
Contributor

If I were you I would try a slightly larger system here... e.g. increase 1024 -> 1280, 192->256 and 128->160. It's not necessary for it to be merged though; you can do it later.
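Concretely, the resize being suggested would turn xconfig lines like the ones above into something like this (purely illustrative of the 1024 -> 1280, 128 -> 160 and 192 -> 256 substitutions; not the final merged configuration):

  relu-batchnorm-dropout-layer name=tdnn1 $tdnn_opts dim=1280
  tdnnf-layer name=tdnnf2 $tdnnf_opts dim=1280 bottleneck-dim=160 time-stride=1
  ...
  linear-component name=prefinal-l dim=256 $linear_opts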

Contributor Author

OK. I am training a larger system. If the larger system is better than this one, I will update the results and the recipe.

Contributor Author

As you suggested, increasing the parameter count helped! I am trying to increase it a bit more.

exit 1
fi

spk_file=$src/../AUDIO_INFO
Contributor

I don't very much like how you are going one level up from "$src". If one-level-up is the real source, you should give that directory. But make sure that entire directory is downloaded from one place. If it's multiple downloads then make them 2 separate arguments to this script.

Contributor Author

The whole dataset is downloaded to one place. The script was trying to refer to metadata while preparing the data directory. Anyway, I fixed it.

test=${src_dir}_test_${lm_suffix}
mkdir -p $test
cp -r ${src_dir}/* $test
gunzip -c $lm_dir/zeroth.lm.${lm_suffix}.arpa.gz | \
Contributor

I wonder if you pruned one of these a bit too aggressively-- what is the size of the graph? The pre/post-rescore WERs here differ more than I would normally expect.

Contributor Author

HCLG.fst is not that small: 676M for the chain graph and 1.8G for the tri-phone GMM system. The rescoring LM is even bigger, though. Actually the vocabulary is pretty huge (500k) since Korean is an agglutinative language.
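One quick way to check this kind of thing (a sketch; the graph directory name is an assumption) is to look at the graph's size and state/arc counts:

  du -sh exp/chain/tdnn1a_sp/graph_tgsmall
  fstinfo exp/chain/tdnn1a_sp/graph_tgsmall/HCLG.fst | grep -E "# of (states|arcs)"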

steps/compute_cmvn_stats.sh data/${datadir}_hires || exit 1;
done

# We need to build a small system just because we need the LDA+MLLT transform
Contributor

these comments are out of date, we are using PCA not LDA+MLLT.

Contributor Author

yes. I corrected it.

@danpovey
Contributor

@hhadian... since Korean is an agglutinative language (and I believe it's phonetically spelled), I wonder whether a BPE-based recipe might do much better on this. I don't recall right now the steps required to do this. I wonder if you could advise what steps might be required to test this out? Have we done this on ASR recipes yet, or only on OCR?

@danpovey
Contributor

also @DongjiGao or @hainan-xv might have some experience with this.

@wonkyuml
Contributor Author

Yes, it's phonetically spelled, although the pronunciation changes slightly depending on the preceding character. Any advice would be appreciated!

@hhadian
Contributor

hhadian commented Aug 31, 2018

No we have not yet tried it on ASR.
There are basically 4 steps:

  • train the BPE model (as in stage 2 of egs/iam/v2/run_end2end.sh)
  • convert the training (and optionally test) transcriptions to BPE (also as in stage 2 of egs/iam/v2/run_end2end.sh)
  • prepare the lang and add final optional silence as in stage 4 of egs/iam/v2/run_end2end.sh
  • add a proper local/wer_hyp_filter file (or a local/wer_output_filter if you also converted test transcriptions) so that scoring works correctly. See egs/iam/v2/local/wer_output_filter.
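A very rough shell sketch of those steps, with script names borrowed from the IAM v2 setup; the paths, options, and the 700-symbol vocabulary below are assumptions to be checked against egs/iam/v2/run_end2end.sh:

  # 1) learn a BPE model on the training transcriptions (drop the utterance ids first)
  cut -d' ' -f2- data/train/text | utils/lang/bpe/learn_bpe.py -s 700 > data/local/bpe.txt

  # 2) apply it to the training (and optionally test) transcriptions
  cut -d' ' -f2- data/train/text | utils/lang/bpe/apply_bpe.py -c data/local/bpe.txt \
    > data/local/train_text_bpe.txt

  # 3) prepare the dict/lang from the BPE units and add the final optional silence
  #    (as in stage 4 of egs/iam/v2/run_end2end.sh)

  # 4) add local/wer_hyp_filter (or local/wer_output_filter) so that scoring glues the
  #    BPE pieces back into words, as in egs/iam/v2/local/wer_output_filter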

@wonkyuml
Contributor Author

Thanks @hhadian, I will try BPE later and will create a separate PR if it works.

@wonkyuml
Contributor Author

TDNN-F looks good as of now.

@danpovey If you're OK with it, it's ready to be merged.

@danpovey danpovey merged commit 66145ea into kaldi-asr:master Aug 31, 2018
@danpovey
Contributor

thanks!
