changes for GALE mandarin setup #1207

jtrmal · 2016-11-22T16:04:24Z

I made some changes so that I can train on all the data.
It is up to @naxingyu and @danpovey whether to merge this, or whether a better strategy would be to cherry-pick only some of the files or changes.
y.

danpovey · 2016-11-22T21:59:09Z

I don't like the weird RESULTS filename.

jtrmal · 2016-11-22T22:09:42Z

That's how these filenames have always been generated for the GALE setups -- see the Arabic GALE recipe as well...
I wanted to provide some numbers for @naxingyu as a reference.
I don't think it matters that much unless you want to merge it -- I'd suggest waiting to see which solution @naxingyu thinks is better for him.

danpovey · 2016-11-22T22:15:03Z

I guess there's no hurry to merge, but do you see a reason why we would not
merge this? Does it improve results?
I don't like the idea of having a separate convention for writing results
for particular setups. I must not have reviewed the original commits for
that setup very well. I'd rather just have it as a RESULTS file, and
remove any old results files. People can figure out from the git log who
committed it and when, if they need to.
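
As a rough illustration of the point about the git log, the history of a single results file can be listed as below. This is only a sketch: the function name `results_history` and the output format are illustrative choices, and the example path assumes the file is named RESULTS inside egs/gale_mandarin/s5.

```shell
#!/usr/bin/env bash
# Sketch: show who committed a results file and when, using only git history.
# The function name and the output format are illustrative, not part of the recipe.

# Prints one line per commit that touched the file:
# short hash, author name, date (YYYY-MM-DD), commit subject.
results_history() {
  local file="$1"
  git log --follow --date=short --format='%h %an %ad %s' -- "$file"
}

# Example, run from the repository root (path is an assumption):
#   results_history egs/gale_mandarin/s5/RESULTS
```

With this kind of lookup available, the file name itself does not need to encode the committer or the date.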

jtrmal · 2016-11-22T22:18:19Z

I can, of course, change it -- no issue with that. Maybe the original idea was that the RESULTS file should not be overwritten by people running the recipes and generating their own results, and that results meant to serve as a reference should be renamed to RESULTS manually. In that case it's my fault; I wasn't sure of the reasons.
y.

naxingyu · 2016-11-23T02:33:36Z

Thank you Yenda for the updated numbers. I was curious why there wasn't
a RESULTS file; now that is explained. There are several issues with the
original gale_mandarin recipe. It details WER for "report" and
"conversation", but the references don't exist. Maybe it was never
finished. It also confuses "test" with dev. But I saw that Yenda has
reshaped the data structure, so that should be addressed by now. Some of
the example scripts are outdated and use options that are not available
in the steps scripts. I'll see if there is anything I can do about this, but I
think we should agree on a suitable data structure. What Yenda proposes
seems reasonable.

Actually, just thinking out loud: there is no such thing as 'eval2000' or
'rt03' for Mandarin/Chinese. All the test sets are just dev sets. Is
there a possibility that we could find a suitable Chinese test set?

Xingyu

danpovey · 2016-11-23T02:39:32Z

Xingyu, I'd say go ahead and do what you think is best, taking into account
Yenda's changes. This recipe is not properly maintained by anyone else;
don't assume that whatever you see there makes sense. When and if you
think I should merge Yenda's PR, let me know. I don't have time to think
about this much.

naxingyu · 2016-11-23T02:44:56Z

OK.

naxingyu · 2016-11-23T13:11:35Z

@jtrmal I don't see where you use all data to train. It's still running on LDC2013S08 and LDC2013T20.

jtrmal · 2016-11-23T13:26:40Z

Let me check, maybe I didn't commit the changes to run.sh

jtrmal · 2016-11-23T13:44:27Z

I committed run.sh and path.sh, and also renamed the RESULTS file. Sorry about
that.
y.

naxingyu · 2016-11-24T09:25:59Z

egs/gale_mandarin/s5/local/gale_prep_dict.sh

-    exit 1
-  fi
-fi



Why is this removed? g2p.py is used later.

There should be a helpful error message if g2p is not found.

jtrmal · 2016-11-24T15:14:56Z

We have an installation script in tools/extras/. The recipe should still check that g2p.py is available and print a helpful error message if it is not found, though -- my bad.
y.

naxingyu

Besides the comments, the RESULT file should be renamed. And did you plan to commit the changes you made on the scoring script?

naxingyu · 2016-11-30T08:08:25Z

egs/gale_mandarin/s5/local/gale_prep_dict.sh

-    exit 1
-  fi
-fi



There should be a helpful error message if g2p is not found.
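
One way such a check could look is sketched below. It is not the code from the recipe or from tools/extras/; the function name `check_g2p`, the directory argument, and the wording of the messages are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the function name, its argument and the message text are
# assumptions, not the layout used by the actual recipe or by tools/extras/.

# Returns 0 if g2p.py exists in the given directory or is found on PATH.
# Otherwise prints a hint to stderr and returns 1.
check_g2p() {
  local g2p_site_dir="$1"
  if [ -f "$g2p_site_dir/g2p.py" ] || command -v g2p.py >/dev/null 2>&1; then
    return 0
  fi
  echo "$0: g2p.py not found (looked in '$g2p_site_dir' and on PATH)." >&2
  echo "$0: install Sequitur G2P first, e.g. with the installation script in tools/extras/." >&2
  return 1
}

# Possible use near the top of a dictionary-preparation script:
#   check_g2p "tools/g2p/lib/python${pyver}/site-packages" || exit 1
```

Returning a status instead of calling exit inside the function leaves the decision to stop with the calling script.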

naxingyu · 2016-11-30T08:09:49Z

egs/gale_mandarin/s5/run.sh


+wait
 local/nnet/run_dnn.sh



In local/nnet/run_dnn.sh, there is an invalid option passed to steps/nnet/train.sh (--use-gpu-id), and a "done" is missing near the end of the script.
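
A structural error such as a loop without its closing "done" can be caught before running anything with a parse-only check. The sketch below uses `bash -n`, which parses a script without executing it; the function name `syntax_check` and the example paths are illustrative. Note that this does not catch an invalid option passed to another script, such as --use-gpu-id.

```shell
#!/usr/bin/env bash
# Sketch: report shell scripts that fail a parse-only check (bash -n).
# This catches structural errors such as a for/while loop without "done";
# it does not validate options passed to other scripts.

# Prints "syntax error: <file>" for every script that does not parse.
# Returns 1 if any script failed, 0 otherwise.
syntax_check() {
  local status=0 script
  for script in "$@"; do
    if ! bash -n "$script" 2>/dev/null; then
      echo "syntax error: $script"
      status=1
    fi
  done
  return $status
}

# Example (paths are illustrative):
#   syntax_check local/*.sh local/nnet/*.sh
```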

naxingyu · 2016-11-30T08:12:35Z

egs/gale_mandarin/s5/local/split_wer_per_corpus.sh

@@ -0,0 +1,61 @@
+#!/bin/bash


This new script is not called in run.sh. We might consider deprecating the original split_wer.sh and renaming this one to split_wer.sh. The original was erroneous anyway.

danpovey · 2016-11-30T20:31:08Z

Xingyu, I don't think Yenda was planning to do much more work on this. I think what's best is if you merge changes from his PR as you like, fix it in the way that you think best, and then make your own pull request, which we can merge.

naxingyu · 2016-12-01T05:47:53Z

OK.

jtrmal · 2016-12-08T11:35:31Z

This is being handled by #1253, so closing...

changes for GALE mandarin setup

1c02d5a

jtrmal mentioned this pull request Nov 22, 2016

Update Gale Chinese recipe #1163

Closed

a couple of files forgotten the last time

53df780

naxingyu reviewed Nov 24, 2016

naxingyu reviewed Nov 30, 2016

This was referenced Dec 7, 2016

Gale mandarin fix #1252

Closed

Gale mandarin fix #1253

Merged

jtrmal closed this Dec 8, 2016

jtrmal deleted the gale_recipe_fix branch February 10, 2017 03:26

