[kor] Adds phonetic whitelist #158
Nice! I only have a few style notes then this is ready.
Not knowing a ton about the phonology of the language I defer to you; the comment about that lateral /ʎ/ is also something we should research going forward. In addition to cleaning up the data this also leaves notes to our future selves about fixes to make to the Wiktionary data itself.
If you choose, you can rerun the scrape for (just) Korean as per the instructions in data/wikipron/src
and commit the filtered lists. PR #154 should make that somewhat easier to do going forward. If you don't, I'll just do that as a follow-up.
t͈
ʝ #allophone of /h/.
s͈
t̚ #allophone of many alveolar, alveoli-palatal consonants on coda position.
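As a rough illustration of how whitelist lines like the ones above could be consumed, here is a minimal sketch. It assumes one phone per line with an optional trailing `#` comment, and space-separated segments in each pronunciation. The helper names `parse_whitelist` and `is_allowed` are hypothetical and are not taken from the WikiPron code.

```python
def parse_whitelist(lines):
    """Collect allowed phones, dropping '#' comments and blank lines."""
    phones = set()
    for line in lines:
        # Everything after the first '#' is treated as a comment (assumption).
        phone = line.split("#", 1)[0].strip()
        if phone:
            phones.add(phone)
    return phones


def is_allowed(pron, phones):
    """True if every space-separated segment of `pron` is whitelisted."""
    return all(seg in phones for seg in pron.split())


# Three of the lines from the snippet above.
whitelist = parse_whitelist([
    "t͈",
    "ʝ #allophone of /h/.",
    "s͈",
])
```

Under these assumptions, a pronunciation containing any segment outside the set would be dropped from the filtered list, which is why a misspelled or missing entry can silently remove words.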
"alveolo-palatal" is the spelling, I think? (I had to look it up: https://en.wikipedia.org/wiki/Alveolo-palatal_consonant).
If you choose, you can rerun the scrape for (just) Korean as per the instructions in
data/wikipron/src
and commit the filtered lists. PR #154 should make that somewhat easier to do going forward. If you don't, I'll just do that as a follow-up.
I will work on this :)
Looks good, one nit. Can you do the steps here to give us a sense of how many words are filtered by this? If you commit the summary TSV files it'll be apparent. It'll also print out all the words that get filtered, so if you missed anything important...
Yes, please do. That update will let us see how many words actually got
deleted!
So you know, that particular `README.md` renders as a nice, human-readable
table with the number of entries per language.
On Mon, May 4, 2020 at 2:40 PM yeonju123 wrote:
After I ran generate_summary.py, it updated README.md as well. Do I
commit this change as well? I do not see that in the guidelines, but I just
wanted to confirm :)
Looks good, so you filter about 250 words with this. (I am seeing this by comparing the README before and after your change.) Whitelists look good modulo those two comments on the "comment" style, then this is ready to go.
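The before-and-after comparison described here can be sketched in a few lines. This assumes a two-column TSV of file name and entry count, which is a simplification and not necessarily the layout of the real summary files. The helper names and the rows below are placeholders, not actual WikiPron counts.

```python
import csv
import io


def read_counts(tsv_text):
    """Map each file name to its entry count from two-column TSV text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {name: int(count) for name, count in reader}


def removed_entries(before, after):
    """Entries lost per file; files whose count is unchanged are omitted."""
    return {
        name: before[name] - after.get(name, 0)
        for name in before
        if before[name] != after.get(name, 0)
    }


# Placeholder rows for illustration only, not real counts.
before = read_counts("a.tsv\t1000\nb.tsv\t500\n")
after = read_counts("a.tsv\t1000\nb.tsv\t480\n")
print(removed_entries(before, after))
```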
The style is already fixed, so we can merge, I think :)
LGTM!
`Unreleased` in `CHANGELOG.md` to reflect the changes in code or data.
Korean phonetic whitelist is created, `kor_phonetic.whitelist`.