Skip to content

Conversation

@dresen
Copy link
Contributor

@dresen dresen commented Feb 7, 2017

No description provided.

Merging to resolve conflict
…nd run_ivector_common.sh to this setup. Achieves new state-of-the-art on dev set (~11% WER).
… and added the --tries 100 flag to wget in sprak_data_prep.sh so the download does not break due to broken connections
… scripts and removed python3 check+install from sprak_data_prep.sh because the script is compatible with python2.
@danpovey
Copy link
Contributor

danpovey commented Feb 7, 2017 via email

@galv
Copy link
Contributor

galv commented Feb 8, 2017

I noticed that you have a lot of early commits that merge the kaldi master branch. I recommend you get rid of those. In the future you can rebase your work on top of the kaldi master branch. But for now, you can cherry-pick your commits on top of a fresh master branch.

i.e., if you make a branch my-branc which is a copy of master:

git cherry-pick dbca511
git cherry-pick d3d4e41

and so on. git will tell you if there are conflicts but I doubt this will be the case for a new recipe.

@danpovey
Copy link
Contributor

danpovey commented Feb 8, 2017 via email

Copy link
Contributor

@galv galv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with grep that all the removed files are indeed unused files from a long time ago, so that part looks good to me.

I haven't reviewed anything inside the tuning/ directory yet.

%WER 20.82 [ 2251 / 10811, 399 ins, 471 del, 1381 sub ] exp/tri4b/decode_3g_test1k.si/wer_13
%WER 17.53 [ 1895 / 10811, 403 ins, 375 del, 1117 sub ] exp/tri4b/decode_4g_test1k/wer_13
%WER 20.99 [ 2269 / 10811, 438 ins, 436 del, 1395 sub ] exp/tri4b/decode_4g_test1k.si/wer_11
%WER 22.87 [ 24286 / 106172, 3577 ins, 5321 del, 15388 sub ] exp/tri1/decode_fg_dev/wer_12_0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you removed the results of existing recipes. Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those results are based on a subset of the whole test set that was used to do "fast testing". This ended up being more of a development set which is not acceptable for a subset of the test set, so I redid all GMM experiments with tuning on the actual dev set and rather than test1k and proceeded to the same for nnet3 and chain recipes.

@@ -0,0 +1 @@
# configuration file for apply-cmvn-online, used in the script ../local/run_online_decoding.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local/run_online_decoding.sh does not exist for this recipe. Maybe you accidentally copied this file from tedlium.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is unused, delete it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the next thing I want to work on.

@@ -0,0 +1,64 @@
#!/bin/bash

echo $0 $*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be sure to document what arguments are appropriate for $. Looks like it's simply the names of the exp/ directories you want to compare. You should print this help message out if $ is empty.

prob=$(grep Overall $x/log/compute_prob_valid.final.log | grep -w xent | awk '{printf("%.4f", $8)}')
printf "% 10s" $prob
done
echo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing about this file specific to the sprakbanken dataset. It should therefore belong in utils/ instead. Probably the best place to put it is utils/nnet3 but that doesn't exist, so maybe steps/nnet3 is a better place (Especially since there are other "utility"-like scripts there, like the reporting scripts and nnet3_to_dot.sh

Another maintainer should comment on whether this script is redundant with some other script and whether it should be included.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this script may not have been properly updated to reflect the characteristics of the Sprakbanken database. It is supposed to print out a little table for comparing WER and objective values between different training runs. But this seems to be a copy of some other script. It does contain some corpus-specific things-- the set of names of decoded directories, and what type of scoring they have (Kaldi vs. sclite).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fully new script according to git, so it's either newly created or copy-pasted from elsewhere, actually. It clearly works for @dresen if it's here now. Regardless, if it's not robust and clearly-documented, my guess is that we shouldn't keep it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten to remove this script. I copied it over from tedlium beecause I wanted to get it to work, but as I understand score_sclite.sh, I need a 'glm' or 'slm' file, which is not part of the språkbanken data

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could easily get it to work, compare with the Switchboard setup where for the dev set it uses Kaldi scoring and not sclite scoring. But either get it to work or remove it.

echo $0 $*

echo -n "System "
for x in $*; do printf "% 10s" " $(basename $x)"; done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change $* to "$@"

It's important in case someone has an experimental directory that contains a space. That's probably never going to happen, but you never know.

@@ -0,0 +1 @@
tuning/run_tdnn_1b.sh No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter, but best to use #!/bin/bash at the top here. Same for run_tdnn_lstm.sh

@@ -0,0 +1 @@
tuning/run_tdnn_1b.sh No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you have run_tdnn_lstm.sh and run_tdnn.sh, but you're missing local/chain/run_lstm.sh Any reason why you don't have the last file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't in the tedlium folder I copied, but I'll add it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no need to have the 'just-lstm' number, generally TDNN+LSTM will work better.

# note, if you have already run the corresponding non-chain nnet3 system
# (local/nnet3/run_tdnn.sh), you may want to run with --stage 14.

set -e -o pipefail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danpovey @vijayaditya Do we consider these settings okay? I feel like I remember being asked to remove these in one of my earlier pull requests, but I'm not sure. At the every least, I think I was asked to remove -e because you all prefer being explicit with || exit 1 clauses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current situation is many scripts do have the 'set -e', and many do not. Officially, either is OK right now. It's not an ideal situation but that's the way it is.

#local/sprak_run_sgmm2.sh dev

# Run neural network setups based in the TEDLIUM recipe
if [ $stage -le 11 ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danpovey Is it typical to include nnet3 recipes inside run.sh? I feel that leaving in the lstm recipes is not good since those have the potential to run for a very long time without someone realizing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I'm not sure. I'd probably prefer to include the command (or commands) but commented out, with a comment saying that this is the command to run [such-and-such.]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll comment it out

@danpovey
Copy link
Contributor

danpovey commented Feb 8, 2017 via email

@danpovey
Copy link
Contributor

danpovey commented Feb 9, 2017

Looks good. I'll merge and if any issues are found in future we'll address them then.

@danpovey danpovey merged commit bcc71b6 into kaldi-asr:master Feb 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants