Merged
57 commits
28364d8
update nnet/chain tdnn runner
Apr 17, 2018
8fb613e
fix ivec dir in chain system
Apr 17, 2018
12aeb50
nj mods on chain, proved to be ran
Apr 18, 2018
7ed676d
remove affix default option
Apr 18, 2018
620e6cf
create tuning directory
professorsearcher Apr 20, 2018
6a56fea
same for chain
professorsearcher Apr 20, 2018
1dcd3dc
minor fix on stage for run_ivector_common.sh
professorsearcher Apr 20, 2018
39df270
add exp 2b for chain: list struc for frames_per_eg
professorsearcher Apr 20, 2018
79fa7ae
fix typos and change ivector transform from lda to pca
professorsearcher Apr 20, 2018
ad4a0a9
changes in terms of arameters for training, referencing wsj
professorsearcher Apr 20, 2018
1080b34
add wer computation script
professorsearcher Apr 21, 2018
f108e2d
some fixes for wer script
professorsearcher Apr 21, 2018
57984fc
delete trial comments
professorsearcher Apr 22, 2018
2ad7775
add numbers, comments and minor changes on wer computing script
Apr 25, 2018
1a29c64
add results and more comments
Apr 25, 2018
c97a8c2
space
Apr 25, 2018
f9cd305
minor comment fix
Apr 25, 2018
e70aed9
delete scripts that are not useful
professorsearcher Apr 25, 2018
05abf10
add 2b runner script, which references 7p from swbd
professorsearcher Apr 25, 2018
1d8fd41
change dimension to suggested
professorsearcher Apr 25, 2018
b0f37f5
change number of jobs
professorsearcher Apr 25, 2018
ce82b67
Merge branch 'master' of https://github.com/kaldi-asr/kaldi
Apr 25, 2018
9ed6c27
intermediate changes for exp
Apr 26, 2018
1a7fa0e
remove copyrights in tdnn runners
Apr 26, 2018
5aac8e0
add separate wer calculation scripts for nnet3 and chain, for chain w…
Apr 27, 2018
887d8a4
style and grep fix
Apr 27, 2018
d9d7bc5
Merge branch 'master' of https://github.com/underdogliu/kaldi
Apr 27, 2018
5aed27c
update results from chain and change softlink, deleting files from chain
Apr 28, 2018
88d4937
delete 1a which has been deprecated from nnet3 directory
Apr 28, 2018
3380e4c
update manual in compre_wer.sh
Apr 28, 2018
bc1e9c1
minor change
Apr 28, 2018
baf1d60
Merge branch 'master' of https://github.com/kaldi-asr/kaldi
Apr 30, 2018
be9d0c1
Merge branch 'master' of https://github.com/kaldi-asr/kaldi
Jun 26, 2018
7c1d822
sync; mark all places that needs modification and cleaning
Jun 26, 2018
efa10dc
more modifications and style fixing
Jun 26, 2018
bc0545d
foo
Jun 26, 2018
b50dbe0
change mode name and update recipe with comments
Jun 26, 2018
32d311e
fix comments
Jun 26, 2018
e6608d5
more fixes on scripts&comments
Jun 28, 2018
cfb012b
add softlinks about scripts
Jun 28, 2018
09f4423
acquire mode name to chain script caller
Jun 28, 2018
2685ead
add RESULTS file. will have gmm results from two modes
Jun 28, 2018
a477b15
result updated
Jun 29, 2018
4dad44d
more comments
Jun 29, 2018
3e23ce0
exp name fix
Jun 30, 2018
67e5fe3
update results and add info from steps/info/chain_dir_info.pl
Jun 30, 2018
3b52cc0
small fix on dev&test nj
Jul 2, 2018
835132c
more foo fix
Jul 2, 2018
e672bc3
change 'set -euxo pipefail' to 'set -e'
Jul 4, 2018
fc742f7
remove mode options; change cmd to take advantage of gridengine
Jul 5, 2018
0012458
remove results from old scripts(simple mode)
Jul 6, 2018
5815dbc
change stage option
professorsearcher Jul 6, 2018
4ef0f93
merge
Jul 7, 2018
198b8cb
Merge branch 'aishell2_release' of https://github.com/underdogliu/kal…
Jul 7, 2018
085e88c
manual merge
professorsearcher Jul 27, 2018
0f89047
manual merge
professorsearcher Sep 15, 2018
873cbbd
updates on README
professorsearcher Sep 15, 2018
64 changes: 64 additions & 0 deletions egs/aishell2/README.md
@@ -0,0 +1,64 @@
# AISHELL-2

AISHELL-2 is by far the largest free speech corpus available for Mandarin ASR research.

## 1. DATA
### Training data
* 1000 hours of speech data (around 1 million utterances)
* 1991 speakers (845 male and 1146 female)
* clean recording environment (studio or quiet living room)
* read speech
* reading prompts from various domains: entertainment, finance, technology, sports, control commands, places of interest, etc.
* near-field recording via 3 parallel channels (iOS, Android, Microphone)
* iOS data is free for non-commercial research and education use (e.g. universities and non-commercial institutes)

### Evaluation data
We currently release AISHELL2-2018A-EVAL, containing:
* dev: 2500 utterances from 5 speakers
* test: 5000 utterances from 10 speakers

Both sets are available across the three channel conditions.

Interested users can download the sets [here](http://www.aishelltech.com/aishell_eval). We may release further evaluation sets on the website later, targeting different applications and scenarios.

## 2. RECIPE
Based on the standard Kaldi system, AISHELL-2 provides a self-contained Mandarin ASR recipe, with:
* a word segmentation module, which is a must-have component for Chinese ASR systems
* an open-source Mandarin lexicon (DaCiDian, available [here](https://github.com/aishell-foundation/DaCiDian))
* a simplified GMM training and alignment generation recipe (stopping at the speaker-independent stage)
* an LFMMI (lattice-free MMI) TDNN training and decoding recipe
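To illustrate what the word segmentation step listed above does, here is a minimal sketch that splits an unsegmented Mandarin transcript into lexicon words. It uses forward maximum matching, a classic dictionary-based method chosen purely for illustration; it is an assumption and not necessarily the algorithm used by the recipe's own segmentation module. The function name `fmm_segment` and the toy vocabulary are hypothetical; a real setup would load the word list from the lexicon (e.g. DaCiDian).

```python
"""Forward-maximum-matching word segmentation: a minimal sketch.

Assumption: this is NOT the AISHELL-2 recipe's segmenter; it only shows
how an unsegmented transcript can be split into words from a lexicon's
word list.
"""


def fmm_segment(text, vocab, max_len=None):
    """Greedily take the longest vocabulary word starting at each position.

    Characters that start no vocabulary word are emitted as single-character
    tokens, so the output always covers the whole input.
    """
    if max_len is None:
        max_len = max((len(w) for w in vocab), default=1)
    max_len = max(1, max_len)  # guard against empty words / bad input
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found;
        # a single character is always accepted as a fallback.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in vocab:
                words.append(candidate)
                i += n
                break
    return words


if __name__ == "__main__":
    # Hypothetical toy vocabulary; a real system would load the lexicon's words.
    vocab = {"语音", "识别", "语音识别", "中文", "系统"}
    # Space-joined words, the whitespace-separated form Kaldi transcripts use.
    print(" ".join(fmm_segment("中文语音识别系统", vocab)))
```

Greedy longest matching is simple and fast, but it can mis-segment ambiguous character sequences; statistical segmenters generally handle such cases better.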

# REFERENCE
We released a [paper on arXiv](https://arxiv.org/abs/1808.10583) that describes the corpus in more detail and reports some preliminary results. If you use AISHELL-2 in your experiments, please cite the paper as follows:
```
@ARTICLE{aishell2,
author = {{Du}, J. and {Na}, X. and {Liu}, X. and {Bu}, H.},
title = "{AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale}",
journal = {ArXiv},
eprint = {1808.10583},
primaryClass = "cs.CL",
year = 2018,
month = Aug,
}
```

# APPLY FOR DATA/CONTACT
The AISHELL foundation is a non-profit online organization with members from the speech industry and research institutes.

We hope the AISHELL-2 corpus and recipe will benefit the entire speech community.

Depending on your location and internet speed, we distribute the corpus in one of two ways:
* hard-disk delivery
* cloud-disk downloading

To apply for the AISHELL-2 corpus for free, fill in a short application form confirming that:
* university department / educational institute information has been fully provided
* the data is only for non-commercial research / education use

The AISHELL foundation covers all data distribution fees (including the corpus, hard-disk cost, etc.).

Redistributing the data within your university department is allowed for convenience. However, users should not redistribute the data to other universities or educational institutes.

To get the application form, or if you run into any problem with the recipe, contact us via:

[email protected]

50 changes: 0 additions & 50 deletions egs/aishell2/README.txt

This file was deleted.