Skip to content

AMI TDNN Results Update#1545

Merged
danpovey merged 6 commits intokaldi-asr:masterfrom
david-ryan-snyder:ivector-pca2
Apr 14, 2017
Merged

AMI TDNN Results Update#1545
danpovey merged 6 commits intokaldi-asr:masterfrom
david-ryan-snyder:ivector-pca2

Conversation

@david-ryan-snyder
Copy link
Contributor

@david-ryan-snyder david-ryan-snyder commented Apr 14, 2017

This is a continuation of Issue #1123 (which can't be closed yet). The main point of this is to update more of the results as we continue to switch to using PCA (instead of LDA) for the ivector transform.

  • Update to the TDNN results in RESULTS_{ihm,sdm,mdm}. [All of the results improve slightly.]

  • Change to egs/ami/s5b/local/chain/tuning/run_tdnn_1{a,b,c,d}.sh so that the file $dir/egs/.nodelete is not created by default. User should now run with --cleanup false to prevent the egs from being deleted after training.

  • Bug fix in egs/ami/s5b/run.sh. Previously, if BeamformIt was not installed, the user was instructed to go to ../../../tools/ and run make beamformit. However, this rule doesn't exist in the Makefile. I replaced the instructions with ../../../tools/; extras/install_beamformit.sh; cd -. [Alternatively, we could add just add a rule to the Makefile to do that, but I didn't see many rules involving extras/ in the Makefile.]

David Snyder and others added 5 commits April 1, 2017 21:03
…ectors used in ASR. Results are reported in the default TDNN recipe in AMI. Updating steps/online/nnet2/{train_diag_ubm.sh,train_ivector_extractor.sh} so that they now backup the contents of their destination directory if it already exists.
… (tdnn1d). Fixing minor bug in egs/ami/s5b/local/chain/tuning/run_tdnn_*.sh scripts.
fi

touch $dir/egs/.nodelete # keep egs around when that run dies.
mkdir -p $dir/egs && touch $dir/egs/.nodelete # keep egs around when that run dies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, would you mind just deleting these lines? The 'proper' way now to stop the egs being removed is to run with --cleanup false. Anyway it isn't necessarily a good default to keep the egs around.

@danpovey
Copy link
Contributor

@david-ryan-snyder (and this isn't specifically about this commit but about your changes to how the iVector extraction is done in the example scripts),
What kind of WER differences, if any, are you seeing by changing LDA->PCA for the i-vector space?
What I am wondering is whether they are substantial enough that we should ask people like @GaofengCheng and @tomkocse who may be adding new results in the near future, to run the script from the start so that iVectors get re-computed.

@david-ryan-snyder
Copy link
Contributor Author

david-ryan-snyder commented Apr 14, 2017

@danpovey,

I don't think they need to. The difference is usually very small (or no difference) and the recipes they are working on are probably still using LDA by default anyway.

At the moment, the only recipe that uses PCA instead of LDA features by default is egs/ami/s5b/local/chain/run_tdnn.sh . All of recipes in AMI, and all other examples continue to use the LDA transform as before (at least until we update them, which we are in the process of).

An apples-to-apples comparison in AMI ihm gives a WER of 22.0/22.2 for PCA and 21.9/22.3 for LDA (dev/eval). So they're basically the same. Looking at some of my older results for this (which did not get committed yet), it was slightly worse for Tedlium (a difference of about 0.08% absolute), and slightly better on WSJ.

@danpovey danpovey merged commit 8891750 into kaldi-asr:master Apr 14, 2017
@david-ryan-snyder
Copy link
Contributor Author

@danpovey, also I removed the "$dir/egs/.nodelete" line.

@david-ryan-snyder
Copy link
Contributor Author

Thanks!

@danpovey
Copy link
Contributor

OK, merged.
My feeling is it's probably best to be consistent and do this everywhere-- the results changes are probably just noise.

@david-ryan-snyder
Copy link
Contributor Author

david-ryan-snyder commented Apr 14, 2017

@danpovey,

I can make this the default wherever possible. But, I think it will still require targeted updates for each recipe, since different recipes use different ivector extractor training scripts. So, there's an aim for consistency, but I think in the short term there will have to settle for some discrepancies...

The problem with making all of the changes simultaneously in one PR is that by the time I test all the recipes, there will be updates to at least a few of them that make the new changes out of date...

@danpovey
Copy link
Contributor

danpovey commented Apr 14, 2017 via email

@david-ryan-snyder
Copy link
Contributor Author

Oh, got it. Yeah, that's the plan.

kronos-cm added a commit to kronos-cm/kaldi that referenced this pull request Apr 28, 2017
* 'master' of https://github.com/kaldi-asr/kaldi: (21 commits)
  [egs] bug-fix in egs/ami/s5/run_ihm.sh (kaldi-asr#1577)
  [src] Minor bug-fixes in compute-wer-bootci and WSJ run.sh.  Thanks: @osadjadi
  [egs] Add soft link for mini-librispeech setup
  [egs] adding results and cleanup in mini-librispeech
  [egs] Add mini-librispeech example scripts [intended as a sanity-checker/tutorial setup] (kaldi-asr#1566)
  [src] Fix to testing code signal-test.cc, change threshold to resolve failure (kaldi-asr#1565)
  [src] Add documentation for dropout function.
  [src,scripts,egs]  Add dropout for nnet3 LSTMs, with recipes. (kaldi-asr#1537)
  [src] nnet3 online silence weighting - adding frame subsampling factor (kaldi-asr#1559)
  [doc] Small edit to hmm.dox, clarifying something
  [egs] Added check for kaldi_lm being installed in fisher_swbd recipe. (kaldi-asr#1558)
  Update travis.yml so PRs to kaldi_52 are built
  [srcipts] steps/nnet3/report/generate_plots.py: plot 5,50,95th percentile of value and derivative instead of mean+-stddev (kaldi-asr#1472)
  [egs] AMI TDNN Results Update (kaldi-asr#1545)
  [src] add template instantiations for ConvertStringToReal, address issue kaldi-asr#1544
  [egs,scripts,src] SID and LID tools and scripts: cosmetic improvements, better error-handling, and various minor fixes; results unchanged. (kaldi-asr#1543)
  [src] Change ConvertStringToReal to be locale-independent (i.e. always-US).  Fixes android issue. (kaldi-asr#1513)
  [scripts] nnet3 : fix issue where LDA estimation failed for LSTMs with label delay (kaldi-asr#1540)
  [scripts] fix to get_egs_targets.sh (thanks: David Pye)
  [src] Fix copy-feats for using the --write-num-frames and --compress true flags at the same time (kaldi-asr#1541)
  ...
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018
* [scripts,egs] Adding options for using PCA instead of LDA+MLLT for ivectors used in ASR. Results are reported in the default TDNN recipe in AMI. Updating steps/online/nnet2/{train_diag_ubm.sh,train_ivector_extractor.sh} so that they now backup the contents of their destination directory if it already exists.

* [egs,scripts] Updating AMI TDNN results to reflect the current recipe (tdnn1d). Fixing minor bug in egs/ami/s5b/local/chain/tuning/run_tdnn_*.sh scripts.

* [egs] Updating chain scripts in AMI so that they do not default to keeping egs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants