Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kor] Adds phonetic whitelist #158

Merged
merged 4 commits into from
May 5, 2020
Merged

[kor] Adds phonetic whitelist #158

merged 4 commits into from
May 5, 2020

Conversation

yeonju123
Copy link
Collaborator

  • Updated Unreleased in CHANGELOG.md to reflect the changes in code or data.

Korean phonetic whitelist is created,kor_phonetic.whitelist

Copy link
Collaborator

@kylebgorman kylebgorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I only have a few style notes then this is ready.

Not knowing a ton about the phonology of the language I defer to you; the comment about that lateral /ʎ/ is also something we should research going forward. In addition to cleaning up the data this also leaves notes to our future selves about fixes to make to the Wiktionary data itself.

If you choose, you can rerun the scrape for (just) Korean as per the instructions in data/wikipron/src and commit the filtered lists. PR #154 should make that somewhat easier to do going forward. If you don't, I'll just do that as a follow-up.

ʝ #allophone of /h/.
t̚ #allophone of many alveolar, alveoli-palatal consonants on coda position.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"alveolo-palatal" is the spelling, I think? (I had to look it up: https://en.wikipedia.org/wiki/Alveolo-palatal_consonant).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you choose, you can rerun the scrape for (just) Korean as per the instructions in data/wikipron/src and commit the filtered lists. PR #154 should make that somewhat easier to do going forward. If you don't, I'll just do that as a follow-up.

I will work on this :)

@kylebgorman
Copy link
Collaborator

Looks good, one nit.

Can you do the steps here to give us a sense of how many words are filtered by this? If you commit the summary TSV files it'll be apparent. It'll also print out all the words that get filtered, so if you missed anything important...

https://github.com/kylebgorman/wikipron/blob/readme/data/wikipron/whitelist/README.md

@yeonju123
Copy link
Collaborator Author

After I ran generate_summary.py, it updated README.md as well. Do I commit this change as well? I do not see that in the guidelines, but I just wanted to confirm :)

@kylebgorman
Copy link
Collaborator

kylebgorman commented May 4, 2020 via email

@kylebgorman kylebgorman changed the title Adds Korean phonetic whitelist [kor] Adds phonetic whitelist May 4, 2020
@kylebgorman
Copy link
Collaborator

Looks good, so you filter about 250 words with this. (I am seeing this by comparing the README before and after your change).

Whitelists look good modulo those two comments on the "comment" style, then this is ready to go.

@yeonju123
Copy link
Collaborator Author

Looks good, so you filter about 250 words with this. (I am seeing this by comparing the README before and after your change).

Whitelists look good modulo those two comments on the "comment" style, then this is ready to go.

The style is already fiex, so we can merge, I think :)

Copy link
Collaborator

@kylebgorman kylebgorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@yeonju123 yeonju123 merged commit 09a43ae into CUNY-CL:master May 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants