
Conversation

@jyhnnhyj
Contributor

@danpovey
Contributor

OK, but you need to add the script that it points to as well. I'll wait till you have WERs for that, though.

@jyhnnhyj
Contributor Author

jyhnnhyj commented Feb 26, 2019 via email

@danpovey
Contributor

danpovey commented Feb 26, 2019 via email

@jyhnnhyj
Contributor Author

jyhnnhyj commented Feb 27, 2019 via email

@jyhnnhyj
Contributor Author

A couple of questions:
local/run_cleanup_segmentation.sh fails on a fresh setup, using the latest Kaldi commit as of Feb 26 (bf33f1fb13ee8ddfe4cd0df3d73656e1b491ef01). Here is the error message:

# utils/sym2int.pl --map-oov 38 -f 2- data/lang/words.txt exp/tri3_cleaned_work/graphs/texts/text.1 | steps/cleanup/make_biased_lms.py --min-words-per-graph=100 "--lm-opts=--word-disambig-symbol=152214 --ngram-order=4 --min-lm-state-count=10 --discounting-constant=0.3 --top-words=exp/tri3_cleaned_work/graphs/top_words.int" exp/tri3_cleaned_work/graphs/fsts/utt2group.1 | compile-train-graphs-fsts --transition-scale=1.0 --self-loop-scale=0.1 --read-disambig-syms=data/lang/phones/disambig.int /mnt/data1/Projects/kaldi/egs/tedlium/s5_r3/exp/tri3_cleaned_work/tree /mnt/data1/Projects/kaldi/egs/tedlium/s5_r3/exp/tri3_cleaned_work/final.mdl data/lang/L_disambig.fst ark:- ark,scp:exp/tri3_cleaned_work/graphs/fsts/HCLG.fsts.1.ark,exp/tri3_cleaned_work/graphs/fsts/HCLG.fsts.1.scp
# Started at Thu Feb 28 10:07:38 UTC 2019
#
compile-train-graphs-fsts --transition-scale=1.0 --self-loop-scale=0.1 --read-disambig-syms=data/lang/phones/disambig.int /mnt/data1/Projects/kaldi/egs/tedlium/s5_r3/exp/tri3_cleaned_work/tree /mnt/data1/Projects/kaldi/egs/tedlium/s5_r3/exp/tri3_cleaned_work/final.mdl data/lang/L_disambig.fst ark:- ark,scp:exp/tri3_cleaned_work/graphs/fsts/HCLG.fsts.1.ark,exp/tri3_cleaned_work/graphs/fsts/HCLG.fsts.1.scp
sym2int.pl: replacing 11th with 38
sym2int.pl: replacing aak with 38
sym2int.pl: replacing 70s with 38
make_biased_lms.py: error calling subprocess, command was: steps/cleanup/internal/make_one_biased_lm.py --word-disambig-symbol=152214 --ngram-order=4 --min-lm-state-count=10 --discounting-constant=0.3 --top-words=exp/tri3_cleaned_work/graphs/top_words.int, error was : a bytes-like object is required, not 'str'
sym2int.pl: replacing say' with 38
sym2int.pl: replacing crazy' with 38
sym2int.pl: replacing 12th with 38
sym2int.pl: replacing geni with 38
sym2int.pl: replacing removed's with 38
sym2int.pl: replacing removed's with 38
sym2int.pl: replacing = with 38
sym2int.pl: replacing rsvp'd with 38
sym2int.pl: replacing calam with 38
sym2int.pl: replacing calam with 38
make_one_biased_lm.py: processed 0 lines of input
Traceback (most recent call last):
  File "steps/cleanup/internal/make_one_biased_lm.py", line 310, in <module>
    ngram_counts.PrintAsFst(args.word_disambig_symbol)
  File "steps/cleanup/internal/make_one_biased_lm.py", line 276, in PrintAsFst
    this_cost = -math.log(self.GetProb(hist, word, total_count_map))
  File "steps/cleanup/internal/make_one_biased_lm.py", line 246, in GetProb
    prob = float(word_to_count[word]) / total_count
ZeroDivisionError: float division by zero
ASSERTION_FAILED (compile-train-graphs-fsts[5.5.210~1-bf33f]:CompileGraphs():training-graph-compiler.cc:186) : 'phone2word_fst.Start() != kNoStateId && "Perhaps you have words missing in your lexicon?"'

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::FatalMessageLogger::~FatalMessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::TrainingGraphCompiler::CompileGraphs(std::vector<fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl<float> >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > > const*, std::allocator<fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl<float> >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > > const*> > const&, std::vector<fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl<float> >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > >*, std::allocator<fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl<float> >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > >*> >*)
main
__libc_start_main
_start

bash: line 1: 28783 Broken pipe             utils/sym2int.pl --map-oov 38 -f 2- data/lang/words.txt exp/tri3_cleaned_work/graphs/texts/text.1
     28787 Exit 1                  | steps/cleanup/make_biased_lms.py --min-words-per-graph=100 "--lm-opts=--word-disambig-symbol=152214 --ngram-order=4 --min-lm-state-count=10 --discounting-constant=0.3 --top-words=exp/tri3_cleaned_work/graphs/top_words.int" exp/tri3_cleaned_work/graphs/fsts/utt2group.1
     28791 Aborted                 (core dumped) | compile-train-graphs-fsts --transition-scale=1.0 --self-loop-scale=0.1 --read-disambig-syms=data/lang/phones/disambig.int /mnt/data1/Projects/kaldi/egs/tedlium/s5_r3/exp/tri3_cleaned_work/tree /mnt/data1/Projects/kaldi/egs/tedlium/s5_r3/exp/tri3_cleaned_work/final.mdl data/lang/L_disambig.fst ark:- ark,scp:exp/tri3_cleaned_work/graphs/fsts/HCLG.fsts.1.ark,exp/tri3_cleaned_work/graphs/fsts/HCLG.fsts.1.scp
# Accounting: time=2 threads=1
# Ended (code 134) at Thu Feb 28 10:07:40 UTC 2019, elapsed time 2 seconds

If I skip that step and change local/chain/run_tdnn.sh to not use the cleaned data, that one also fails with this error:

$ local/chain/run_tdnn.sh
local/chain/run_tdnn.sh
local/nnet3/run_ivector_common.sh: invalid option --min-seg-len

which looks like a parameter mismatch. I can try to fix these, but I just wanted to double-check whether I should be using a different set of scripts...

@danpovey
Contributor

There are two problems here.
One, you probably have python3 installed as your default python. This breaks the cleanup scripts; Vimal has a PR (#3054) to fix it, which I will merge soon after it's tested. Please merge with Vimal's PR; that will also be useful for verifying that it works.
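
For context on the first problem: the "a bytes-like object is required, not 'str'" message in the log above is the usual Python 3 pitfall of writing a str to a subprocess pipe opened in binary mode, which is also why make_one_biased_lm.py processed 0 lines of input and then hit the ZeroDivisionError. A minimal illustrative sketch of the pitfall and two common fixes (not the actual change in #3054):

import subprocess

# Pipe opened in binary mode: under Python 3, stdin expects bytes, not str.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
line = "hello world\n"
try:
    proc.stdin.write(line)  # Python 3 raises: a bytes-like object is required, not 'str'
except TypeError:
    proc.stdin.write(line.encode("utf-8"))  # fix 1: encode explicitly
proc.stdin.close()
print(proc.stdout.read().decode("utf-8"))
proc.wait()

# Fix 2: open the pipes in text mode so str can be written directly.
proc2 = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
out, _ = proc2.communicate("hello again\n")
print(out)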

Secondly, that run_tdnn.sh script, if just copied from s5_r2, may not be fully compatible with this setup. You will have to remove the --min-seg-len option, do a diff against the existing run_tdnn.sh script, figure out which differences are due to things like the change in directory setup between s5_r3 and s5_r2 (or other local changes), and apply those as needed.

@jyhnnhyj
Contributor Author

jyhnnhyj commented Mar 1, 2019

Re Vimal's fix: I merged with it and can confirm it solves the problem.
(Re Python3: that's right, but the Kaldi setup creates a link for Python 2.7, and I was expecting it to pick that one.) Anyway, this is now solved. I'll continue with the rest and post progress/issues here.

@danpovey
Contributor

danpovey commented Mar 9, 2019 via email

@jyhnnhyj
Contributor Author

Sorry for not updating earlier; I was distracted by a couple of other deadlines and plan to resume this on Monday. I already ran run_cleanup_segmentation.sh last week and it worked as expected; I will start training on Monday.

@jyhnnhyj
Contributor Author

A quick update: I just started running the training script.

@jyhnnhyj
Contributor Author

So the ivector part ran successfully, but after that step, when it was validating the files, it failed with:
local/chain/run_tdnn.sh: expected file data/train_cleaned_sp_hires_comb/feats.scp to exist
I tried to understand what this _comb suffix is about, but couldn't trace where that file should have been created. Any suggestions?

@danpovey
Contributor

danpovey commented Mar 13, 2019 via email

@jyhnnhyj
Contributor Author

After those _comb changes it progressed, but now it fails complaining about the config files. Here is the error message:

Traceback (most recent call last):
  File "steps/nnet3/chain/train.py", line 625, in main
    train(args, run_opts)
  File "steps/nnet3/chain/train.py", line 302, in train
    variables = common_train_lib.parse_generic_config_vars_file(var_file)
  File "steps/libs/nnet3/train/common.py", line 352, in parse_generic_config_vars_file
    "i.e. xconfig_to_configs.py.".format(field_value))
Exception: You have num_hidden_layers=7 (real meaning: your config files are intended to do discriminative pretraining).  Since Kaldi 5.2, this is no longer supported --> use newer config-creation scripts, i.e. xconfig_to_configs.py.

I checked librispeech and noticed how the new xconfig file is generated; there is also another one in tedlium's local/chain/run_tdnnf.sh. Which network config should I use?

@danpovey
Contributor

danpovey commented Mar 14, 2019 via email

@jyhnnhyj
Contributor Author

You're right; I'm sorry, somehow while trying to fix some of the issues I re-used that old script.

@jyhnnhyj
Contributor Author

A quick update on the training progress: it's at iteration 90/227.

@jyhnnhyj
Contributor Author

All done; here are the results:

dev:          %WER 8.03 [ 1428 / 17783, 255 ins, 274 del, 899 sub ]
dev_rescore:  %WER 7.44 [ 1323 / 17783, 242 ins, 267 del, 814 sub ]
test:         %WER 10.11 [ 2780 / 27500, 252 ins, 1083 del, 1445 sub ]
test_rescore: %WER 7.85 [ 2158 / 27500, 323 ins, 560 del, 1275 sub ]

I'm not sure how these compare to the previous results. In the header of run_tdnn.sh, I can see:

# System                tdnn1f_sp_bi tdnn1g_sp
# WER on dev(orig)            8.9       7.9
# WER on dev(rescored)        8.1       7.3
# WER on test(orig)           9.1       8.0
# WER on test(rescored)       8.6       7.6

So the new results seem to be slightly worse than tdnn1g_sp, but better than tdnn1f_sp_bi?
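
The %WER figures above are just the bracketed error count divided by the number of reference words, so they can be sanity-checked directly; a small illustrative Python snippet, with the numbers taken from the results above:

# %WER = 100 * total errors / number of reference words
results = {
    "dev":          (1428, 17783),
    "dev_rescore":  (1323, 17783),
    "test":         (2780, 27500),
    "test_rescore": (2158, 27500),
}
for name, (errors, ref_words) in results.items():
    print("%-13s %.2f%%" % (name, 100.0 * errors / ref_words))
# prints 8.03, 7.44, 10.11 and 7.85, matching the lines above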

@danpovey
Contributor

danpovey commented Mar 18, 2019 via email

@jyhnnhyj
Contributor Author

sure

s5_r3> steps/info/chain_dir_info.pl exp/chain_cleaned/tdnn1g_sp_bi/
exp/chain_cleaned/tdnn1g_sp_bi/: num-iters=228 nj=3..12 num-params=9.5M dim=40+100->3664 combine=-0.068->-0.068 (over 4) xent:train/valid[151,227,final]=(-1.15,-0.967,-0.960/-1.25,-1.09,-1.08) logprob:train/valid[151,227,final]=(-0.090,-0.068,-0.067/-0.102,-0.085,-0.084)

@jyhnnhyj
Contributor Author

I made the changes, but somehow messed up my kaldi fork; I had to delete it and now can't push to this PR anymore.
I created another PR, #3149, which includes all these changes.
Should we merge that one, or can I push my changes to this orphaned PR? (sorry, my git sucks)

@danpovey
Contributor

OK, we will discuss on #3149.
git rebase might help. GitHub is showing some diffs it shouldn't be showing, but it may be a GitHub issue; I've seen it before.

@danpovey closed this Mar 20, 2019