Look in right location for new style subdirectories #1209
```diff
 ln -s $dir/$subdir data/local/data/links
 else
-new_style_subdir=$(echo $subdir | sed s/fe_03_p2_sph/fisher_eng_tr_sp_d/)
+new_style_subdir=$(echo $subdir | sed s/fe_03_p1_sph/fisher_eng_tr_sp_d/)
```
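This small sketch shows why the change from `p2` to `p1` matters. It applies both `sed` expressions quoted above to the Part 1 name `fe_03_p1_sph1`, which the script later reports as missing. The mapping from `fe_03_p1_sphN` to `fisher_eng_tr_sp_dN` is inferred from the `ls` and `find` output in this thread. The variable names `old` and `new` are only illustrative.

```bash
#!/usr/bin/env bash
# Old-style Part 1 subdirectory name that the recipe looks for.
subdir=fe_03_p1_sph1

# Original expression: it only rewrites Part 2 names, so a Part 1 name passes through unchanged.
old=$(echo "$subdir" | sed s/fe_03_p2_sph/fisher_eng_tr_sp_d/)
# Patched expression: it rewrites Part 1 names to the new-style parent directory name.
new=$(echo "$subdir" | sed s/fe_03_p1_sph/fisher_eng_tr_sp_d/)

echo "original: $old"   # original: fe_03_p1_sph1
echo "patched:  $new"   # patched:  fisher_eng_tr_sp_d1
```

With the original expression, the "new style" name is identical to the old one. The lookup under `fisher_eng_tr_sp_d1` therefore never happens.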
My experience with a different format of the releases was that the best way, i.e. using the corpora directory only as a starting point to find, say …
On Tue, Nov 22, 2016 at 5:31 PM, Daniel Povey wrote:
I mean the command …
On Tue, Nov 22, 2016 at 6:23 PM, Jan Trmal wrote:
I suspect that your patch is not right. That script takes multiple LDC …
On Tue, Nov 22, 2016 at 6:24 PM, jtrmal wrote:
@aevernon, look at the usage message of the script more carefully. It expects 4 different LDC corpora, or a single directory where the contents of all of them reside. Closing the PR.
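The script may receive four corpus directories or a single directory that holds all of them. Either way, it has to look for each audio subdirectory in every directory it is given. The sketch below shows one way such a lookup could handle both layouts that appear in this thread. The old-style path is `$dir/fe_03_p1_sph1`. The newer nested path is `$dir/fisher_eng_tr_sp_d1/fe_03_p1_sph1`.

The helper name `locate_subdir` and its structure are hypothetical. This is not the actual code of `local/fisher_data_prep.sh`. The throwaway directory tree only mimics the layout reported in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the actual code of local/fisher_data_prep.sh).
# Prints the first path holding $subdir. For each corpus directory it checks
# the old-style location first, then the new-style nested location
# (e.g. LDC2004S13/fisher_eng_tr_sp_d1/fe_03_p1_sph1).
locate_subdir() {
  local subdir=$1
  shift
  local new_style_subdir dir
  # Same mapping as the patched sed expression.
  new_style_subdir=$(echo "$subdir" | sed s/fe_03_p1_sph/fisher_eng_tr_sp_d/)
  for dir in "$@"; do
    if [ -d "$dir/$subdir" ]; then
      echo "$dir/$subdir"
      return 0
    elif [ -d "$dir/$new_style_subdir/$subdir" ]; then
      echo "$dir/$new_style_subdir/$subdir"
      return 0
    fi
  done
  echo "could not find the subdirectory $subdir in any of $*" >&2
  return 1
}

# Demo on a throwaway tree that mimics the layout reported in this thread.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/LDC2004S13/fisher_eng_tr_sp_d1/fe_03_p1_sph1" \
         "$root/LDC2005S13/fe_03_p2_sph1"
locate_subdir fe_03_p1_sph1 "$root/LDC2004S13" "$root/LDC2005S13"
locate_subdir fe_03_p2_sph1 "$root/LDC2004S13" "$root/LDC2005S13"
```

Checking only two fixed locations keeps the lookup cheap. The downside is that it breaks again if LDC changes its nesting.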
Oh OK. I did not realize you were using all 4 directories. I guess what you have is the 'newer' data, and we have the older data, which is why we didn't see a problem.
I'll merge.
On Mon, Nov 28, 2016 at 11:14 AM, Albert Vernon wrote:
Please consider re-opening this ticket. The details below make me think my
patch is correct.
Per the usage message, I pass the four directories to fisher_data_prep.sh:
Steps to Reproduce
```bash
cd /kaldi/egs/aspire/s5
. ./cmd.sh
. ./path.sh
mfccdir=`pwd`/mfcc
set -e
local/fisher_data_prep.sh /export/corpora3/LDC/LDC2004T19 /export/corpora3/LDC/LDC2005T19 \
  /export/corpora3/LDC/LDC2004S13 /export/corpora3/LDC/LDC2005S13
```
Observed Behavior
```
local/fisher_data_prep.sh: could not find the subdirectory fe_03_p1_sph1 in any of /export/corpora3/LDC/LDC2004T19 /export/corpora3/LDC/LDC2005T19 /export/corpora3/LDC/LDC2004S13 /export/corpora3/LDC/LDC2005S13
```
*Contents of my four corpora directories, to show I have extracted them correctly:*

```
$ ls /export/corpora3/LDC/LDC2004T19
fe_03_p1_tran
$ ls /export/corpora3/LDC/LDC2005T19
fe_03_p2_tran
$ ls /export/corpora3/LDC/LDC2004S13
fisher_eng_tr_sp_d1  fisher_eng_tr_sp_d3  fisher_eng_tr_sp_d5  fisher_eng_tr_sp_d7
fisher_eng_tr_sp_d2  fisher_eng_tr_sp_d4  fisher_eng_tr_sp_d6
$ ls /export/corpora3/LDC/LDC2005S13
fe_03_p2_sph1  fe_03_p2_sph2  fe_03_p2_sph3  fe_03_p2_sph4  fe_03_p2_sph5
fe_03_p2_sph6  fe_03_p2_sph7
```
*Determining correct location of fe_03_p1_sph1:*

```
$ find /export/corpora3 -name fe_03_p1_sph1
/export/corpora3/LDC/LDC2004S13/fisher_eng_tr_sp_d1/fe_03_p1_sph1
```
`find` shows that the patch is correct (at least for the LDC data that I downloaded this month).
I agree with @jtrmal that using `find` would be more robust. This patch was intended as a quick fix for others who might try to run this recipe.
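Below is a rough sketch of the `find`-based approach suggested in the thread. Each corpus directory is used only as a starting point for the search. The name `find_subdir` is hypothetical and not part of Kaldi. `-print -quit` stops at the first match and is supported by GNU find and current BSD/macOS find.

```bash
#!/usr/bin/env bash
# Hypothetical find-based lookup (not part of Kaldi). Each corpus directory is
# only a starting point, and the subdirectory is located by name wherever the
# release happens to nest it.
find_subdir() {
  local subdir=$1
  shift
  local hit
  # -type d: match directories only; -print -quit: stop at the first match.
  # Errors (e.g. a missing corpus directory) are silenced, and "|| true" keeps
  # a non-zero find status from aborting callers that run under "set -e".
  hit=$(find "$@" -type d -name "$subdir" -print -quit 2>/dev/null || true)
  if [ -z "$hit" ]; then
    echo "could not find the subdirectory $subdir in any of $*" >&2
    return 1
  fi
  echo "$hit"
}

# Demo on a throwaway tree that mimics the layout reported in this thread.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/LDC2004S13/fisher_eng_tr_sp_d1/fe_03_p1_sph1" \
         "$root/LDC2005S13/fe_03_p2_sph1"
find_subdir fe_03_p1_sph1 "$root/LDC2004S13" "$root/LDC2005S13"
```

Because `find` walks the whole tree, this handles differently nested releases without any name mapping. The cost is scanning large corpora. Adding a `-maxdepth` limit would bound that scan.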
reopening...

Oh, I can't reopen because you deleted the repo that the PR was pointing to. Is it possible to recreate that repo?

I've re-created the repository.

github still won't let me reopen. Would you mind creating a new PR? I'll merge right away.

Created as #1223.
I tested this patch against fisher_eng_tr_sp_LDC2004S13.zip, which I downloaded from LDC today. It looks like the original committer mistyped the part number, since LDC describes this dataset as "Fisher English Training Speech Data, Part 1."