Recipe for Tunisian_MSA corpus. #2722
…ything under data/local
…l number of jobs to 1.
…-chunk-per-minibatch=64,32,16
…l number of jobs to 1.
…nstead of data/lang in mkgraph command
It looks like I was not able to resolve the conflicts. I accept help :)

OK, @xiaohui-zhang and @huangruizhe will look at it.

The conflicts seem to be due to changes to files in the heroico recipe that don't seem to be related to the Tunisian MSA recipe.

Done in #2725.
…oded Arabic to utf8 encoded arabic.
…buckwalter to utf8.
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+cut -d " " -f 1 qcri.txt > qcri_words_buckwalter.txt
this should probably be
cat qcri.txt | tail -n +4 | cut -d " " -f 1 > qcri_words_buckwalter.txt
cat qcri.txt | tail -n +4 | cut -d " " -f 2 > qcri_prons.txt
Otherwise lines like "# Copyright" will be included.
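To illustrate the suggestion above, here is a minimal sketch run against a made-up stand-in for qcri.txt. The three "#" header lines and the two lexicon entries are assumptions for illustration only; the real file's header length should be checked before relying on `tail -n +4`.

```shell
#!/bin/bash
set -e

# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for qcri.txt: three header lines, then "word phones..." entries.
printf '%s\n' \
  '# Copyright (placeholder header line)' \
  '# placeholder header line 2' \
  '# placeholder header line 3' \
  'ktb k a t a b a' \
  'qlm q a l a m' > qcri.txt

# tail -n +4 starts output at line 4, so the three header lines are dropped
# before cut extracts the first space-separated field (the word).
tail -n +4 qcri.txt | cut -d " " -f 1 > qcri_words_buckwalter.txt

cat qcri_words_buckwalter.txt
```

Without the `tail` step, the first field of each header line ("#") would end up in the word list.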
|
I was already doing this in the download script:
egs/Tunisian_msa/s5/local/qcri_lexicon_download.sh
I wasn't overwriting the unzipped text file with the header removed,
so I fixed that.
J
|
#!/bin/bash

cut -d " " -f 1 qcri.txt > qcri_words_buckwalter.txt
cut -d " " -f 2 qcri.txt > qcri_prons.txt
Actually the option -f 2 should be -f 2-. I realized this after I saw totally wrong decoding results...
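A small sketch of the difference, using a made-up lexicon entry (the word and phones are placeholders, not taken from the QCRI lexicon): `-f 2` selects only the second field, while `-f 2-` selects the second field through the end of the line, which is the whole pronunciation.

```shell
#!/bin/bash

# Hypothetical lexicon line: a word followed by its space-separated phones.
entry='ktb k a t a b a'

# -f 2 keeps only the second field, i.e. just the first phone.
first_field_only=$(echo "$entry" | cut -d " " -f 2)

# -f 2- keeps field 2 through the end of the line, i.e. the full pronunciation.
full_pron=$(echo "$entry" | cut -d " " -f 2-)

echo "with -f 2:  $first_field_only"
echo "with -f 2-: $full_pron"
```

With `-f 2`, every pronunciation is truncated to a single phone, which would explain the badly wrong decoding results.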
|
I'm closing this PR because I believe we merged it indirectly via #2725. If there are other changes you want us to merge, please let us know.
|
I removed 2 scripts that are not used and 2 config files. And I added a copyright.
conf/pitch.conf
conf/plp.conf
local/buckwalter2utf8.pl
local/qcri_buckwalter2utf8.pl
John
|
@xiaohui-zhang would you mind making a PR with the latest changes, if appropriate? No hurry.
|
sure!
Xiaohui
|
A recipe to build an ASR system with the Tunisian_MSA corpus of Arabic.