@aevernon
Contributor

I tested this patch against fisher_eng_tr_sp_LDC2004S13.zip, which I downloaded from LDC today.

It looks like the original committer mistyped the part number, since LDC describes this dataset as "Fisher English Training Speech Data, Part 1."

ln -s $dir/$subdir data/local/data/links
else
- new_style_subdir=$(echo $subdir | sed s/fe_03_p2_sph/fisher_eng_tr_sp_d/)
+ new_style_subdir=$(echo $subdir | sed s/fe_03_p1_sph/fisher_eng_tr_sp_d/)
@sikoried does this look right?
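
For readers following the diff, the effect of the two substitutions can be checked in isolation. This is a minimal sketch that only runs the two `sed` expressions from the diff on the first subdirectory name in the loop; the variable names `before_patch` and `after_patch` are illustrative and do not appear in the script.

```shell
# First subdirectory name from the loop in fisher_data_prep.sh.
subdir=fe_03_p1_sph1

# Expression before the patch: it matches "p2", so a part 1 name is left unchanged.
before_patch=$(echo "$subdir" | sed s/fe_03_p2_sph/fisher_eng_tr_sp_d/)

# Expression after the patch: the part 1 prefix is rewritten to the new-style name.
after_patch=$(echo "$subdir" | sed s/fe_03_p1_sph/fisher_eng_tr_sp_d/)

echo "before patch: $before_patch"   # fe_03_p1_sph1
echo "after patch:  $after_patch"    # fisher_eng_tr_sp_d1
```

Whether the rewritten name is the right one to look for depends on how a given LDC release is laid out on disk, which is the question the rest of the thread works through.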

@jtrmal
Contributor

jtrmal commented Nov 22, 2016

my experience with a different format of the releases was that the best way
is to use find to either find the files or at least find the audio
directory or something...

i.e. using the corpora directory only as a starting point to find, say,
the directory "audio"
y.
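
As a rough illustration of that suggestion, the sketch below treats a corpus directory purely as a search root and lets `find` locate a directory named `audio` beneath it. The temporary tree and the `release_x/disc1` layout are made up so that the snippet runs anywhere; they do not reflect any real LDC release.

```shell
# Throwaway tree standing in for a corpus release (hypothetical layout).
corpus_root=$(mktemp -d)
mkdir -p "$corpus_root/release_x/disc1/audio"

# Use the corpus directory only as a starting point and search for "audio".
audio_dir=$(find "$corpus_root" -type d -name audio | head -n 1)

echo "audio directory: $audio_dir"
```

The temporary tree can be removed afterwards with `rm -rf "$corpus_root"`.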

@jtrmal
Contributor

jtrmal commented Nov 22, 2016

I mean the command find
y.

@danpovey
Contributor

I suspect that your patch is not right. That script takes multiple LDC
directories, and some do provide part 2 of the data. It's an old script
and has been used a lot. It might be better to describe the problem you
are trying to solve.

@danpovey
Contributor

@aevernon, look at the usage message of the script more carefully. It expects 4 different LDC corpora, or a single directory where the contents of all of them reside. Closing the PR.

@danpovey danpovey closed this Nov 22, 2016
@aevernon
Contributor Author

Please consider re-opening this ticket. The details below make me think my patch is correct.

Per the usage message, I pass the four directories to fisher_data_prep.sh:

Steps to Reproduce

cd /kaldi/egs/aspire/s5
. ./cmd.sh
. ./path.sh
mfccdir=`pwd`/mfcc
set -e
local/fisher_data_prep.sh /export/corpora3/LDC/LDC2004T19 /export/corpora3/LDC/LDC2005T19 \
   /export/corpora3/LDC/LDC2004S13 /export/corpora3/LDC/LDC2005S13

Observed Behavior

local/fisher_data_prep.sh: could not find the subdirectory fe_03_p1_sph1 in any of /export/corpora3/LDC/LDC2004T19 /export/corpora3/LDC/LDC2005T19 /export/corpora3/LDC/LDC2004S13 /export/corpora3/LDC/LDC2005S13

Contents of my four corpora directories to show I have extracted them correctly:

ls /export/corpora3/LDC/LDC2004T19

fe_03_p1_tran

ls /export/corpora3/LDC/LDC2005T19

fe_03_p2_tran

ls /export/corpora3/LDC/LDC2004S13

fisher_eng_tr_sp_d1 fisher_eng_tr_sp_d3 fisher_eng_tr_sp_d5 fisher_eng_tr_sp_d7
fisher_eng_tr_sp_d2 fisher_eng_tr_sp_d4 fisher_eng_tr_sp_d6

ls /export/corpora3/LDC/LDC2005S13

fe_03_p2_sph1 fe_03_p2_sph2 fe_03_p2_sph3 fe_03_p2_sph4 fe_03_p2_sph5 fe_03_p2_sph6 fe_03_p2_sph7

Determining correct location of fe_03_p1_sph1:

find /export/corpora3 -name fe_03_p1_sph1

/export/corpora3/LDC/LDC2004S13/fisher_eng_tr_sp_d1/fe_03_p1_sph1

find shows that the patch is correct (at least for the LDC data that I downloaded this month).

I agree with @jtrmal that using find would be more robust. The intention of this patch was to be a quick fix for others who might try to run this recipe.
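
A `find`-based lookup along those lines might look like the sketch below. It is not the code in `fisher_data_prep.sh`; it assumes each expected subdirectory sits at most two levels below one of the corpus directories, and it uses a temporary mock tree (mirroring part of the listings above) in place of the real `/export/corpora3` paths. The `links_dir` variable, the `missing` counter, and the `-maxdepth 2` limit are choices made for this example.

```shell
# Mock tree mirroring the two speech releases listed above (no real data).
root=$(mktemp -d)
mkdir -p "$root/LDC2004S13/fisher_eng_tr_sp_d1/fe_03_p1_sph1"
mkdir -p "$root/LDC2005S13/fe_03_p2_sph1"

# Hypothetical stand-in for data/local/data/links.
links_dir="$root/links"
mkdir -p "$links_dir"

missing=0
for subdir in fe_03_p1_sph1 fe_03_p2_sph1; do
  # Search every corpus directory, whatever its release layout, and take the first hit.
  match=$(find "$root/LDC2004S13" "$root/LDC2005S13" -maxdepth 2 -type d -name "$subdir" | head -n 1)
  if [ -n "$match" ]; then
    ln -s "$match" "$links_dir/$subdir"
  else
    echo "could not find the subdirectory $subdir" >&2
    missing=$((missing + 1))
  fi
done

ls "$links_dir"
```

Searching by name means the same loop handles both the old layout, where `fe_03_p2_sph1` sits directly under the corpus directory, and the newer one, where `fe_03_p1_sph1` is nested inside `fisher_eng_tr_sp_d1`.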

@danpovey
Contributor

danpovey commented Nov 28, 2016 via email

@danpovey
Contributor

reopening...

@danpovey
Contributor

Oh, I can't reopen because you deleted the repo that the PR was pointing to. Is it possible to recreate that repo?

@aevernon
Contributor Author

aevernon commented Nov 28, 2016

I've re-created the repository.

@danpovey
Contributor

github still won't let me reopen. Would you mind creating a new PR? I'll merge right away.

@aevernon
Contributor Author

Created as #1223.
