Merged
110 commits
f9c3443
Initial commit for Kaldi Discophone branch
pzelasko Jun 18, 2020
9cd6625
Missing training parameters
pzelasko Jun 18, 2020
43a4632
Flip stage sign
pzelasko Jun 18, 2020
dd35b63
Missing lang_name
pzelasko Jun 18, 2020
89ef0c1
Fix GP data paths
pzelasko Jun 18, 2020
e49ef00
some fixes
pzelasko Jun 18, 2020
f4cdc14
Fix processing lexicon lines without transcripts
pzelasko Jun 18, 2020
d02a563
Fix empty line in extra_questions
pzelasko Jun 18, 2020
d271409
move <unk> to nonsilence phones
pzelasko Jun 18, 2020
2dead61
Remove lines without transcription from lexicon
pzelasko Jun 18, 2020
e9c42d4
Merge branch 'discophone' of https://github.com/pzelasko/kaldi into d…
pzelasko Jun 18, 2020
f130c1a
Add unk to lexicon when missing
pzelasko Jun 18, 2020
b19546e
Merge branch 'discophone' of https://github.com/pzelasko/kaldi into d…
pzelasko Jun 18, 2020
6221317
Remove some forbidden symbols
pzelasko Jun 18, 2020
d9b57bc
Merge branch 'discophone' of https://github.com/pzelasko/kaldi into d…
pzelasko Jun 18, 2020
e1d1bf9
Use normal text in Kaldi recipe
pzelasko Jun 18, 2020
0660c74
Merge branch 'discophone' of https://github.com/pzelasko/kaldi into d…
pzelasko Jun 18, 2020
6c54511
Fix paths
pzelasko Jun 18, 2020
27e4fde
Merge branch 'discophone' of https://github.com/pzelasko/kaldi into d…
pzelasko Jun 18, 2020
3c0a2ce
Add sleep
pzelasko Jun 18, 2020
b102798
Merge branch 'discophone' of https://github.com/pzelasko/kaldi into d…
pzelasko Jun 18, 2020
d666dc7
Fix in make_mfcc validation for non-ascii chars
pzelasko Jun 18, 2020
c556075
Change some corpora paths after CLSP data loss
pzelasko Jul 7, 2020
f77f08d
Merge remote-tracking branch 'fork/discophone' into discophone
pzelasko Jul 7, 2020
f02edcd
Add shorten installation and path
pzelasko Jul 7, 2020
67f480a
Merge branch 'discophone' of https://github.com/pzelasko/kaldi into d…
pzelasko Jul 7, 2020
24e2dfb
Fix directories used for feature extraction and output them to console
pzelasko Jul 7, 2020
46e5b2b
Change default language composition
pzelasko Jul 7, 2020
0104871
Add missing directory
pzelasko Jul 7, 2020
8f472e5
Safe-guard for corpora with low number of utterances
pzelasko Jul 7, 2020
3ea5d37
Fix symlinks in subset dirs
pzelasko Jul 7, 2020
a77a5de
Fix alignment arguments and stage comparison
pzelasko Jul 7, 2020
087ef99
Add data prep files from ESPnet discophone recipe
pzelasko Jul 8, 2020
45a4683
Add exec permissions
pzelasko Jul 8, 2020
a26b661
Add missing files for data prep + fix paths for Babel
pzelasko Jul 8, 2020
b43f909
Add missing lang.conf
pzelasko Jul 8, 2020
0f12a93
Fixing mboshi
pzelasko Jul 8, 2020
66feae4
fix
pzelasko Jul 8, 2020
b6932a9
Fix perl file getting accidentally reformatted
pzelasko Jul 9, 2020
8693191
Merge remote-tracking branch 'fork/discophone' into discophone
pzelasko Jul 9, 2020
8811f28
Fix a number of issues with BABEL
pzelasko Jul 9, 2020
fca40a8
Fix data getting mixed up
pzelasko Jul 9, 2020
9d0de1e
Remove the stdout/stderr files
pzelasko Jul 9, 2020
8559b6b
Merge remote-tracking branch 'fork/discophone' into discophone
pzelasko Jul 9, 2020
beaf75d
Start creating a nnet3 discophone recipe
pzelasko Jul 10, 2020
ff0e983
Add missing reestimate langp
pzelasko Jul 29, 2020
1888c02
Merge branch 'discophone' of https://github.com/pzelasko/kaldi into d…
pzelasko Jul 29, 2020
4359a0b
Create run_ivector_common.sh
syfengcuhk Aug 7, 2020
a7de571
Create run_tdnn_1b.sh
syfengcuhk Aug 7, 2020
e12ac65
Add files via upload
syfengcuhk Aug 7, 2020
ad533e2
Add files via upload
syfengcuhk Aug 7, 2020
bc617bd
Add files via upload
syfengcuhk Aug 7, 2020
2b0cfef
Update to Piotr's version
syfengcuhk Aug 31, 2020
a67b4e9
Update path.sh
syfengcuhk Aug 31, 2020
393fa08
Update cmd.sh
syfengcuhk Aug 31, 2020
602f9bf
Delete run_gp.sh
syfengcuhk Aug 31, 2020
9a2b2a8
Delete run_tdnnf.sh
syfengcuhk Aug 31, 2020
f2fa994
uncomment run_ivector_common.sh
syfengcuhk Aug 31, 2020
261d95b
uncomment run_ivector_common.sh
syfengcuhk Aug 31, 2020
036002d
comment common_egs_dir
syfengcuhk Aug 31, 2020
91794c5
Create run_tdnn_1g.sh
syfengcuhk Aug 31, 2020
3b9ee60
Update run_tdnn_1g.sh
syfengcuhk Aug 31, 2020
31fbd16
Create run_ivector_common.sh
syfengcuhk Aug 7, 2020
9d1715b
Create run_tdnn_1b.sh
syfengcuhk Aug 7, 2020
e7f6740
Add files via upload
syfengcuhk Aug 7, 2020
5047291
Add files via upload
syfengcuhk Aug 7, 2020
cc33aa8
Add files via upload
syfengcuhk Aug 7, 2020
a3716b2
Update to Piotr's version
syfengcuhk Aug 31, 2020
d83bc4a
Update path.sh
syfengcuhk Aug 31, 2020
427c580
Update cmd.sh
syfengcuhk Aug 31, 2020
ad966b5
Delete run_gp.sh
syfengcuhk Aug 31, 2020
f8365be
Delete run_tdnnf.sh
syfengcuhk Aug 31, 2020
b1158cb
uncomment run_ivector_common.sh
syfengcuhk Aug 31, 2020
0e906da
uncomment run_ivector_common.sh
syfengcuhk Aug 31, 2020
a0967c5
comment common_egs_dir
syfengcuhk Aug 31, 2020
1d2bc4b
Create run_tdnn_1g.sh
syfengcuhk Aug 31, 2020
d9446ce
Update run_tdnn_1g.sh
syfengcuhk Aug 31, 2020
24f9daa
Siyuan's mono-lingual TDNN recipe - fixed with git rebase
pzelasko Sep 8, 2020
17b50ec
Merge branch 'master' into discophone
pzelasko Sep 10, 2020
023c87a
Full support for phone tokens/phones switching
pzelasko Sep 21, 2020
3c4f350
Support for using either phone tokens or phones
pzelasko Sep 21, 2020
7e44457
Create run.sh
syfengcuhk Sep 22, 2020
e2ea9d1
Create Readme.txt
syfengcuhk Sep 22, 2020
b8509b0
Update Readme.txt
syfengcuhk Sep 22, 2020
feeef6c
Update run.sh
syfengcuhk Sep 22, 2020
b71b17e
Update run.sh
syfengcuhk Sep 22, 2020
44afbe7
Create cmd.sh
syfengcuhk Sep 22, 2020
60a183e
Create path.sh
syfengcuhk Sep 22, 2020
95989a2
Create setup_languages.sh
syfengcuhk Sep 22, 2020
2324edc
Add files via upload
syfengcuhk Sep 22, 2020
5bf00f6
Add run.sh and local/
syfengcuhk Sep 22, 2020
d1ae1c5
Create Readme.txt
syfengcuhk Sep 22, 2020
7b98f57
move path.sh and cmd.sh to ./
syfengcuhk Sep 22, 2020
0e2faf7
Add files
syfengcuhk Sep 22, 2020
1ae378b
Update .DS_Store
syfengcuhk Sep 22, 2020
a3fd97a
removed redundant $data_aug_suffix in $train_data_dir
syfengcuhk Sep 23, 2020
d62287a
Create run_ivector_common.sh
syfengcuhk Sep 23, 2020
4ec1e9a
Added run_tdnn_1g.sh
syfengcuhk Sep 23, 2020
b327c86
Update .DS_Store
syfengcuhk Sep 25, 2020
9502e11
deleted ./conf/.svn/
syfengcuhk Sep 25, 2020
00800ec
Update cmd.sh
syfengcuhk Sep 25, 2020
34a6c19
Update run.sh
syfengcuhk Sep 25, 2020
b3894bc
Update run.sh
syfengcuhk Sep 25, 2020
3fd8142
Update v1_multilang/Readme.txt
syfengcuhk Sep 25, 2020
5b6e8cc
Merge branch 'discophone' of https://github.com/syfengcuhk/kaldi into…
syfengcuhk Sep 25, 2020
edfb0b4
removed ./conf
syfengcuhk Sep 25, 2020
efc6f14
removed v1
syfengcuhk Sep 25, 2020
523bbe9
Merge pull request #4 from syfengcuhk/discophone
syfengcuhk Sep 25, 2020
e05e2a5
Revert "Discophone"
pzelasko Sep 25, 2020
feb6d3e
Merge pull request #6 from pzelasko/revert-4-discophone
pzelasko Sep 25, 2020
9 changes: 9 additions & 0 deletions egs/babel/s5d/conf/corpora_paths.sh
@@ -0,0 +1,9 @@
+BABEL_ROOT="/export/corpora5/Babel"
+CANTONESE_ROOT="${BABEL_ROOT}/IARPA_BABEL_BP_101"
+BENGALI_ROOT="${BABEL_ROOT}/BABEL_OP1_103"
+VIETNAMESE_ROOT="${BABEL_ROOT}/BABEL_BP_107"
+LAO_ROOT="${BABEL_ROOT}/IARPA_Babel_203"
+ZULU_ROOT="${BABEL_ROOT}/IARPA_BABEL_OP1_206"
+AMHARIC_ROOT="${BABEL_ROOT}/IARPA-babel307b-v1.0b-build"
+JAVANESE_ROOT="${BABEL_ROOT}/IARPA-babel402b-v1.0b-build"
+GEORGIAN_ROOT="${BABEL_ROOT}/IARPA-babel404b-v1.0a-build"
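Note: because every language config now depends on the `*_ROOT` variables defined in this file, it may help to check that they resolve to real directories before starting a long run. The following is a minimal sketch, not part of the PR; `check_corpora_roots` is a hypothetical helper name, and the only assumption is that the file consists of plain `NAME_ROOT=...` assignments as shown above.

```shell
#!/bin/sh
# Hypothetical helper (not in the recipe): source a corpora-paths file and
# report every *_ROOT variable whose value is not an existing directory.
# Pass a path containing a slash (e.g. conf/corpora_paths.sh), since "."
# may otherwise search PATH for the file.
check_corpora_roots() {
  (
    # Subshell: the sourced variables do not leak into the caller.
    . "$1" || exit 2
    status=0
    # Collect the variable names assigned at the start of a line.
    for var in $(grep -oE '^[A-Za-z_][A-Za-z0-9_]*_ROOT' "$1"); do
      eval "dir=\${$var}"
      if [ ! -d "$dir" ]; then
        echo "missing: ${var}=${dir}"
        status=1
      fi
    done
    exit "$status"
  )
}
```

Calling `check_corpora_roots conf/corpora_paths.sh` from `egs/babel/s5d` would print one `missing:` line per unresolved root and return a non-zero status if any root is absent.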
15 changes: 8 additions & 7 deletions egs/babel/s5d/conf/lang/101-cantonese.FLP.official.conf
@@ -4,16 +4,17 @@

# include common settings for fullLP systems.
. conf/common.fullLP || exit 1;
+. conf/corpora_paths.sh


#speech corpora files location
-train_data_dir=/export/babel/data/101-cantonese/release-current/conversational/training
+train_data_dir=${CANTONESE_ROOT}/conversational/training
train_data_list=./conf/lists/101-cantonese/train.FullLP.list
train_nj=32


#Radical reduced DEV corpora files location
-dev2h_data_dir=/export/babel/data/101-cantonese/release-current/conversational/dev
+dev2h_data_dir=${CANTONESE_ROOT}/conversational/dev
dev2h_data_list=./conf/lists/101-cantonese/dev.list
dev2h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel101b-v0.4c_conv-dev/IARPA-babel101b-v0.4c_conv-dev.mitllfa3.rttm
dev2h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel101b-v0.4c_conv-dev/IARPA-babel101b-v0.4c_conv-dev.scoring.ecf.xml
@@ -27,7 +28,7 @@ dev2h_subset_ecf=true


#Official DEV corpora files location
-dev10h_data_dir=/export/babel/data/101-cantonese/release-current/conversational/dev
+dev10h_data_dir=${CANTONESE_ROOT}/conversational/dev
dev10h_data_list=./conf/lists/101-cantonese/dev.list
dev10h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel101b-v0.4c_conv-dev/IARPA-babel101b-v0.4c_conv-dev.mitllfa3.rttm
dev10h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel101b-v0.4c_conv-dev/IARPA-babel101b-v0.4c_conv-dev.scoring.ecf.xml
@@ -40,7 +41,7 @@ dev10h_nj=32


#Official EVAL period evaluation data files
-eval_data_dir=/export/babel/data/101-cantonese/release-babel101b-v0.4c-eval/BABEL_BP_101/conversational/eval
+eval_data_dir=${CANTONESE_ROOT}/conversational/eval
eval_data_list=./conf/lists/101-cantonese/eval.list
eval_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel101b-v0.4c_conv-eval.ecf.xml
eval_kwlists=(
@@ -51,7 +52,7 @@ eval_nj=32


#Official post-EVAL period data files
-evalpart1_data_dir=/export/babel/data/101-cantonese/release-babel101b-v0.4c-eval/BABEL_BP_101/conversational/eval
+evalpart1_data_dir=${CANTONESE_ROOT}/conversational/eval
evalpart1_data_list=./conf/lists/101-cantonese/evalpart1.list
evalpart1_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel101b-v0.4c_conv-evalpart1/IARPA-babel101b-v0.4c_conv-evalpart1.mitllfa3.rttm
evalpart1_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel101b-v0.4c_conv-evalpart1/IARPA-babel101b-v0.4c_conv-evalpart1.scoring.ecf.xml
@@ -65,7 +66,7 @@ evalpart1_nj=32

#Shadow data files
shadow_data_dir=(
-/export/babel/data/101-cantonese/release-current/conversational/dev
+${CANTONESE_ROOT}/conversational/dev
/export/babel/data/101-cantonese/release-babel101b-v0.4c-eval/BABEL_BP_101/conversational/eval
) # shadow_data_dir
shadow_data_list=(
@@ -83,7 +84,7 @@ shadow_nj=32


#Unsupervised training set file (./conf/lists/101-cantonese/untranscribed-training.list) not found.
-lexicon_file=/export/babel/data/101-cantonese/release-current/conversational/reference_materials/lexicon.txt
+lexicon_file=${CANTONESE_ROOT}/conversational/reference_materials/lexicon.txt
lexiconFlags="--romanized --oov <unk>"
cer=1

17 changes: 9 additions & 8 deletions egs/babel/s5d/conf/lang/103-bengali.FLP.official.conf
@@ -4,16 +4,17 @@

# include common settings for fullLP systems.
. conf/common.fullLP || exit 1;
+. conf/corpora_paths.sh


#speech corpora files location
-train_data_dir=/export/babel/data/103-bengali/release-current/conversational/training
+train_data_dir=${BENGALI_ROOT}/conversational/training
train_data_list=./conf/lists/103-bengali/train.FullLP.list
train_nj=32


#Radical reduced DEV corpora files location
-dev2h_data_dir=/export/babel/data/103-bengali/release-current/conversational/dev
+dev2h_data_dir=${BENGALI_ROOT}/conversational/dev
dev2h_data_list=./conf/lists/103-bengali/dev.list
dev2h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel103b-v0.4b_conv-dev/IARPA-babel103b-v0.4b_conv-dev.mitllfa3.rttm
dev2h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel103b-v0.4b_conv-dev/IARPA-babel103b-v0.4b_conv-dev.scoring.ecf.xml
@@ -29,7 +30,7 @@ dev2h_subset_ecf=true


#Official DEV corpora files location
-dev10h_data_dir=/export/babel/data/103-bengali/release-current/conversational/dev
+dev10h_data_dir=${BENGALI_ROOT}/conversational/dev
dev10h_data_list=./conf/lists/103-bengali/dev.list
dev10h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel103b-v0.4b_conv-dev/IARPA-babel103b-v0.4b_conv-dev.mitllfa3.rttm
dev10h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel103b-v0.4b_conv-dev/IARPA-babel103b-v0.4b_conv-dev.scoring.ecf.xml
@@ -44,7 +45,7 @@ dev10h_nj=32


#Official EVAL period evaluation data files
-eval_data_dir=/export/babel/data/103-bengali/release-current/conversational/eval
+eval_data_dir=${BENGALI_ROOT}/conversational/eval
eval_data_list=./conf/lists/103-bengali/eval.list
eval_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel103b-v0.4b_conv-eval.ecf.xml
eval_kwlists=(
@@ -57,7 +58,7 @@ eval_nj=32


#Official post-EVAL period data files
-evalpart1_data_dir=/export/babel/data/103-bengali/release-current/conversational/eval
+evalpart1_data_dir=${BENGALI_ROOT}/conversational/eval
evalpart1_data_list=./conf/lists/103-bengali/evalpart1.list
evalpart1_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel103b-v0.4b_conv-evalpart1/IARPA-babel103b-v0.4b_conv-evalpart1.mitllfa3.rttm
evalpart1_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel103b-v0.4b_conv-evalpart1/IARPA-babel103b-v0.4b_conv-evalpart1.scoring.ecf.xml
@@ -73,8 +74,8 @@ evalpart1_nj=32

#Shadow data files
shadow_data_dir=(
-/export/babel/data/103-bengali/release-current/conversational/dev
-/export/babel/data/103-bengali/release-current/conversational/eval
+${BENGALI_ROOT}/conversational/dev
+${BENGALI_ROOT}/conversational/eval
) # shadow_data_dir
shadow_data_list=(
./conf/lists/103-bengali/dev.list
@@ -93,7 +94,7 @@ shadow_nj=32


#Unsupervised training set file (./conf/lists/103-bengali/untranscribed-training.list) not found.
-lexicon_file=/export/babel/data/103-bengali/release-current/conversational/reference_materials/lexicon.txt
+lexicon_file=${BENGALI_ROOT}/conversational/reference_materials/lexicon.txt
lexiconFlags="--romanized --oov <unk>"


17 changes: 9 additions & 8 deletions egs/babel/s5d/conf/lang/107-vietnamese.FLP.official.conf
@@ -4,16 +4,17 @@

# include common settings for fullLP systems.
. conf/common.fullLP || exit 1;
+. conf/corpora_paths.sh


#speech corpora files location
-train_data_dir=/export/babel/data/107-vietnamese/release-current/conversational/training
+train_data_dir=${VIETNAMESE_ROOT}/conversational/training
train_data_list=./conf/lists/107-vietnamese/train.FullLP.list
train_nj=32


#Radical reduced DEV corpora files location
-dev2h_data_dir=/export/babel/data/107-vietnamese/release-current/conversational/dev
+dev2h_data_dir=${VIETNAMESE_ROOT}/conversational/dev
dev2h_data_list=./conf/lists/107-vietnamese/dev.list
dev2h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel107b-v0.7_conv-dev/IARPA-babel107b-v0.7_conv-dev.mitllfa3.rttm
dev2h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel107b-v0.7_conv-dev/IARPA-babel107b-v0.7_conv-dev.scoring.ecf.xml
@@ -27,7 +28,7 @@ dev2h_subset_ecf=true


#Official DEV corpora files location
-dev10h_data_dir=/export/babel/data/107-vietnamese/release-current/conversational/dev
+dev10h_data_dir=${VIETNAMESE_ROOT}/conversational/dev
dev10h_data_list=./conf/lists/107-vietnamese/dev.list
dev10h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel107b-v0.7_conv-dev/IARPA-babel107b-v0.7_conv-dev.mitllfa3.rttm
dev10h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel107b-v0.7_conv-dev/IARPA-babel107b-v0.7_conv-dev.scoring.ecf.xml
@@ -40,7 +41,7 @@ dev10h_nj=32


#Official EVAL period evaluation data files
-eval_data_dir=/export/babel/data/107-vietnamese/release-current/conversational/eval
+eval_data_dir=${VIETNAMESE_ROOT}/conversational/eval
eval_data_list=./conf/lists/107-vietnamese/eval.list
eval_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel107b-v0.7_conv-eval.ecf.xml
eval_kwlists=(
@@ -51,7 +52,7 @@ eval_nj=32


#Official post-EVAL period data files
-evalpart1_data_dir=/export/babel/data/107-vietnamese/release-current/conversational/eval
+evalpart1_data_dir=${VIETNAMESE_ROOT}/conversational/eval
evalpart1_data_list=./conf/lists/107-vietnamese/evalpart1.list
evalpart1_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel107b-v0.7_conv-evalpart1/IARPA-babel107b-v0.7_conv-evalpart1.mitllfa3.rttm
evalpart1_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel107b-v0.7_conv-evalpart1/IARPA-babel107b-v0.7_conv-evalpart1.scoring.ecf.xml
@@ -65,8 +66,8 @@ evalpart1_nj=32

#Shadow data files
shadow_data_dir=(
-/export/babel/data/107-vietnamese/release-current/conversational/dev
-/export/babel/data/107-vietnamese/release-current/conversational/eval
+${VIETNAMESE_ROOT}/conversational/dev
+${VIETNAMESE_ROOT}/conversational/eval
) # shadow_data_dir
shadow_data_list=(
./conf/lists/107-vietnamese/dev.list
@@ -83,7 +84,7 @@ shadow_nj=32


#Unsupervised training set file (./conf/lists/107-vietnamese/untranscribed-training.list) not found.
-lexicon_file=/export/babel/data/107-vietnamese/release-current/conversational/reference_materials/lexicon.txt
+lexicon_file=${VIETNAMESE_ROOT}/conversational/reference_materials/lexicon.txt



17 changes: 9 additions & 8 deletions egs/babel/s5d/conf/lang/203-lao.FLP.official.conf
@@ -4,16 +4,17 @@

# include common settings for fullLP systems.
. conf/common.fullLP || exit 1;
+. conf/corpora_paths.sh


#speech corpora files location
-train_data_dir=/export/babel/data/203-lao/release-current/conversational/training
+train_data_dir=${LAO_ROOT}/conversational/training
train_data_list=./conf/lists/203-lao/train.FullLP.list
train_nj=32


#Radical reduced DEV corpora files location
-dev2h_data_dir=/export/babel/data/203-lao/release-current/conversational/dev
+dev2h_data_dir=${LAO_ROOT}/conversational/dev
dev2h_data_list=./conf/lists/203-lao/dev.list
dev2h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel203b-v3.1a_conv-dev/IARPA-babel203b-v3.1a_conv-dev.mitllfa3.rttm
dev2h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel203b-v3.1a_conv-dev/IARPA-babel203b-v3.1a_conv-dev.scoring.ecf.xml
@@ -29,7 +30,7 @@ dev2h_subset_ecf=true


#Official DEV corpora files location
-dev10h_data_dir=/export/babel/data/203-lao/release-current/conversational/dev
+dev10h_data_dir=${LAO_ROOT}/conversational/dev
dev10h_data_list=./conf/lists/203-lao/dev.list
dev10h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel203b-v3.1a_conv-dev/IARPA-babel203b-v3.1a_conv-dev.mitllfa3.rttm
dev10h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel203b-v3.1a_conv-dev/IARPA-babel203b-v3.1a_conv-dev.scoring.ecf.xml
@@ -44,7 +45,7 @@ dev10h_nj=32


#Official EVAL period evaluation data files
-eval_data_dir=/export/babel/data/203-lao/release-current/conversational/eval
+eval_data_dir=${LAO_ROOT}/conversational/eval
eval_data_list=./conf/lists/203-lao/eval.list
eval_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel203b-v3.1a_conv-eval.ecf.xml
eval_kwlists=(
@@ -57,7 +58,7 @@ eval_nj=32


#Official post-EVAL period data files
-evalpart1_data_dir=/export/babel/data/203-lao/release-current/conversational/eval
+evalpart1_data_dir=${LAO_ROOT}/conversational/eval
evalpart1_data_list=./conf/lists/203-lao/evalpart1.list
evalpart1_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel203b-v3.1a_conv-evalpart1/IARPA-babel203b-v3.1a_conv-evalpart1.mitllfa3.rttm
evalpart1_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel203b-v3.1a_conv-evalpart1/IARPA-babel203b-v3.1a_conv-evalpart1.scoring.ecf.xml
@@ -73,8 +74,8 @@ evalpart1_nj=32

#Shadow data files
shadow_data_dir=(
-/export/babel/data/203-lao/release-current/conversational/dev
-/export/babel/data/203-lao/release-current/conversational/eval
+${LAO_ROOT}/conversational/dev
+${LAO_ROOT}/conversational/eval
) # shadow_data_dir
shadow_data_list=(
./conf/lists/203-lao/dev.list
@@ -93,7 +94,7 @@ shadow_nj=32


#Unsupervised training set file (./conf/lists/203-lao/untranscribed-training.list) not found.
-lexicon_file=/export/babel/data/203-lao/release-current/conversational/reference_materials/lexicon.txt
+lexicon_file=${LAO_ROOT}/conversational/reference_materials/lexicon.txt
lexiconFlags="--romanized --oov <unk>"


17 changes: 9 additions & 8 deletions egs/babel/s5d/conf/lang/206-zulu.FLP.official.conf
@@ -4,16 +4,17 @@

# include common settings for fullLP systems.
. conf/common.fullLP || exit 1;
+. conf/corpora_paths.sh


#speech corpora files location
-train_data_dir=/export/babel/data/206-zulu/release-current/conversational/training
+train_data_dir=${ZULU_ROOT}/conversational/training
train_data_list=./conf/lists/206-zulu/train.FullLP.list
train_nj=32


#Radical reduced DEV corpora files location
-dev2h_data_dir=/export/babel/data/206-zulu/release-current/conversational/dev
+dev2h_data_dir=${ZULU_ROOT}/conversational/dev
dev2h_data_list=./conf/lists/206-zulu/dev.list
dev2h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel206b-v0.1e_conv-dev/IARPA-babel206b-v0.1e_conv-dev.mitllfa3.rttm
dev2h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel206b-v0.1e_conv-dev/IARPA-babel206b-v0.1e_conv-dev.scoring.ecf.xml
@@ -29,7 +30,7 @@ dev2h_subset_ecf=true


#Official DEV corpora files location
-dev10h_data_dir=/export/babel/data/206-zulu/release-current/conversational/dev
+dev10h_data_dir=${ZULU_ROOT}/conversational/dev
dev10h_data_list=./conf/lists/206-zulu/dev.list
dev10h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel206b-v0.1e_conv-dev/IARPA-babel206b-v0.1e_conv-dev.mitllfa3.rttm
dev10h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel206b-v0.1e_conv-dev/IARPA-babel206b-v0.1e_conv-dev.scoring.ecf.xml
@@ -44,7 +45,7 @@ dev10h_nj=32


#Official EVAL period evaluation data files
-eval_data_dir=/export/babel/data/206-zulu/release-current/conversational/eval
+eval_data_dir=${ZULU_ROOT}/conversational/eval
eval_data_list=./conf/lists/206-zulu/eval.list
eval_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel206b-v0.1e_conv-eval.ecf.xml
eval_kwlists=(
@@ -57,7 +58,7 @@ eval_nj=32


#Official post-EVAL period data files
-evalpart1_data_dir=/export/babel/data/206-zulu/release-current/conversational/eval
+evalpart1_data_dir=${ZULU_ROOT}/conversational/eval
evalpart1_data_list=./conf/lists/206-zulu/evalpart1.list
evalpart1_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel206b-v0.1e_conv-evalpart1/IARPA-babel206b-v0.1e_conv-evalpart1.mitllfa3.rttm
evalpart1_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel206b-v0.1e_conv-evalpart1/IARPA-babel206b-v0.1e_conv-evalpart1.scoring.ecf.xml
@@ -73,8 +74,8 @@ evalpart1_nj=32

#Shadow data files
shadow_data_dir=(
-/export/babel/data/206-zulu/release-current/conversational/dev
-/export/babel/data/206-zulu/release-current/conversational/eval
+${ZULU_ROOT}/conversational/dev
+${ZULU_ROOT}/conversational/eval
) # shadow_data_dir
shadow_data_list=(
./conf/lists/206-zulu/dev.list
@@ -93,7 +94,7 @@ shadow_nj=32


#Unsupervised training set file (./conf/lists/206-zulu/untranscribed-training.list) not found.
-lexicon_file=/export/babel/data/206-zulu/release-current/conversational/reference_materials/lexicon.txt
+lexicon_file=${ZULU_ROOT}/conversational/reference_materials/lexicon.txt



13 changes: 7 additions & 6 deletions egs/babel/s5d/conf/lang/307-amharic.FLP.official.conf
@@ -4,16 +4,17 @@

# include common settings for fullLP systems.
. conf/common.fullLP || exit 1;
+. conf/corpora_paths.sh


#speech corpora files location
-train_data_dir=/export/babel/data/307-amharic/IARPA-babel307b-v1.0b-build/BABEL_OP3_307/conversational/training
+train_data_dir=${AMHARIC_ROOT}/conversational/training
train_data_list=./conf/lists/307-amharic/training.list
train_nj=32


#Radical reduced DEV corpora files location
-dev2h_data_dir=/export/babel/data/307-amharic/IARPA-babel307b-v1.0b-build/BABEL_OP3_307/conversational/dev
+dev2h_data_dir=${AMHARIC_ROOT}/conversational/dev
dev2h_data_list=./conf/lists/307-amharic/dev.2h.list
dev2h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel307b-v1.0b_conv-dev/IARPA-babel307b-v1.0b_conv-dev.mitllfa3.rttm
dev2h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel307b-v1.0b_conv-dev/IARPA-babel307b-v1.0b_conv-dev.scoring.ecf.xml
@@ -27,7 +28,7 @@ dev2h_subset_ecf=true


#Official DEV corpora files location
-dev10h_data_dir=/export/babel/data/307-amharic/IARPA-babel307b-v1.0b-build/BABEL_OP3_307/conversational/dev
+dev10h_data_dir=${AMHARIC_ROOT}/conversational/dev
dev10h_data_list=./conf/lists/307-amharic/dev.list
dev10h_rttm_file=/export/babel/data/scoring/IndusDB/IARPA-babel307b-v1.0b_conv-dev/IARPA-babel307b-v1.0b_conv-dev.mitllfa3.rttm
dev10h_ecf_file=/export/babel/data/scoring/IndusDB/IARPA-babel307b-v1.0b_conv-dev/IARPA-babel307b-v1.0b_conv-dev.scoring.ecf.xml
@@ -67,7 +68,7 @@ evalpart1_nj=32

#Shadow data files
shadow_data_dir=(
-/export/babel/data/307-amharic/IARPA-babel307b-v1.0b-build/BABEL_OP3_307/conversational/dev
+${AMHARIC_ROOT}/conversational/dev
/export/babel/data/307-amharic/IARPA-babel307b-v1.0b-eval/BABEL_OP3_307/conversational/eval
) # shadow_data_dir
shadow_data_list=(
@@ -85,12 +86,12 @@ shadow_nj=32


#Unsupervised dataset for FullLP condition
-unsup_data_dir=/export/babel/data/307-amharic/IARPA-babel307b-v1.0b-build/BABEL_OP3_307/conversational/untranscribed-training
+unsup_data_dir=${AMHARIC_ROOT}/conversational/untranscribed-training
unsup_data_list=./conf/lists/307-amharic/untranscribed-training.list
unsup_nj=32


-lexicon_file=/export/babel/data/307-amharic/IARPA-babel307b-v1.0b-build/BABEL_OP3_307/conversational/reference_materials/lexicon.txt
+lexicon_file=${AMHARIC_ROOT}/conversational/reference_materials/lexicon.txt
lexiconFlags="--romanized --oov <unk>"


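Note: the hunks above move the speech and lexicon paths to the `*_ROOT` variables, but some context lines (for example the `rttm`/`ecf` scoring files and the second `shadow_data_dir` entry for Cantonese and Amharic) still point at `/export/babel/data`. A rough sketch for listing what remains hard-coded follows; `list_hardcoded_paths` is a hypothetical helper, not part of the PR, and the default prefix is simply the old location seen in the removed lines.

```shell
#!/bin/sh
# Hypothetical helper (not in the recipe): print every non-comment line under
# a config directory that still contains a given hard-coded path prefix.
# Usage: list_hardcoded_paths DIR [PREFIX]
list_hardcoded_paths() {
  # -r: recurse, -n: show line numbers, -F: treat the prefix as a fixed string.
  # The second grep drops matches on lines that start with '#'.
  grep -rnF -- "${2:-/export/babel/data}" "$1" \
    | grep -v '^[^:]*:[0-9]*:[[:space:]]*#'
}
```

Run as `list_hardcoded_paths conf/lang` from `egs/babel/s5d`; the output is in `file:line:text` form, and an empty result (non-zero exit status) means no matching lines were found.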