Suggested changes, fixes and updates to Hebrew transliteration #67

Open
eyaler opened this issue Jul 26, 2021 · 7 comments

@eyaler

eyaler commented Jul 26, 2021

I would like to ask for @alonbl's feedback/green light before preparing my PR. I am interested in addressing several issues I see in the current Hebrew transliteration:

  1. 05ef (triple yod) can now be transliterated as YYY.
  2. It seems inconsistent to me to have rafe as - and dagesh as '. If we are going by the graphics, then dagesh should be . (dot). But I think a more useful choice would be to ignore both of them (as is currently done for the shin dots).
  3. Better alignment with Hebrew Language Academy rules (https://hebrew-academy.org.il/wp-content/uploads/taatik-ivrit-latinit-1-1.pdf):
    a. 05d7 ח is never transliterated as KH. A more standard-compliant version would be H (to distinguish it from h) or h.
    b. It is inconsistent to transliterate א as A and ע as a backtick. ע could be A or 'A or A', but mind you, all these choices, including the one for א, are non-standard. Also, the backtick for ע comes from the "exact standard", whereas we otherwise follow the "simple standard" here, which uses '. I am really not sure what the right thing to do is. We could also follow other languages and use the letter names in these cases: ALEPH and AYIN.
    c. Using @ for schwa is consistent with the IPA symbol, but it is not useful and is not part of the Hebrew standard, which ignores schwa in transliteration (or in some cases uses e).
    d. ק should be k as in the simple standard (q is used in the exact standard).
  4. I am not sure what 05f5, 05f6, 05f7 are, as they are not part of Unicode afaict.
  5. Fixes in the Hebrew presentation forms (https://www.unicode.org/charts/PDF/UFB00.pdf):
    a. fb4f should be EL not l
    b. fb4e should be f not p
    c. fb4d should be KH not k
    d. fb4c should be v not b
    e. fb4b should be o not vo; similarly, fb1d should be i not yi
    f. Fix e.g. sh, ts to be SH, TS, as done for the regular letters
    g. fb47 should be k not ts (this is a mistake)
    h. fb41 should be s not n (this is a mistake)
    i. fb3e should be m not l (this is a mistake)
    j. fb30 is currently missing; it should be i
    k. fb27 should be r not m (this is a mistake)
    l. Add fb21, fb20, following whatever is decided for the regular א, ע
  6. Graphically, sof-pasuk looks like :, but for NLP tasks it would be more useful to use "." or even ". ", as this is the meaning of the punctuation.
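
Items 4 and 5 above can be cross-checked against the Unicode database that ships with Python. The following is only a sketch using the standard `unicodedata` module; the helper names `is_assigned` and `base_letters` are made up for illustration and are not part of Unidecode. The result for 05ef depends on the Unicode version bundled with the running Python.

```python
import unicodedata


def is_assigned(cp: int) -> bool:
    """True if the codepoint is assigned in this Python's Unicode database."""
    # Category "Cn" means "unassigned" in the Unicode general categories.
    return unicodedata.category(chr(cp)) != "Cn"


def base_letters(ch: str) -> str:
    """Compatibility-decompose a character and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", ch)
    # Hebrew points (dagesh, rafe, holam, ...) have a non-zero combining class.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Item 4: check whether the questioned codepoints are assigned at all.
for cp in (0x05EF, 0x05F5, 0x05F6, 0x05F7):
    print(f"U+{cp:04X} assigned={is_assigned(cp)}")

# Item 5: show which base letter(s) each presentation form decomposes to.
for cp in (0xFB47, 0xFB41, 0xFB3E, 0xFB27, 0xFB4F):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {base_letters(ch)}")
```

The decomposition only identifies the underlying letters; which Latin string each one should map to is still the judgment call discussed in this issue.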
@eyaler
Author

eyaler commented Jul 29, 2021

We would be happy to do the PR if @avian2 is interested.

@alonbl
Contributor

alonbl commented Jul 29, 2021 via email

@avian2
Owner

avian2 commented Aug 2, 2021

Hi

@alonbl, if you can review @eyaler's pull request I would be happy to accept it (since some of the proposed changes touch your changes in 81f938d). I don't know Hebrew and can't comment on the suggested changes.

> i am not sure what are 05f5, 05f6, 05f7 as they are not part of unicode afaict

If the codepoints are undefined in Unicode, please set them to None in the transliteration tables.

> graphically sof-pasuk looks like : but for nlp tasks would be more useful to use "." or even ". " as this is the meaning of the punctuation.

I trust your judgment in choosing the best compromise here.
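
As a rough illustration of the two points above (None for undefined codepoints, and a meaning-based replacement for sof-pasuk), a table could look like the sketch below. This is not Unidecode's actual table layout or API: `HEBREW_SKETCH` and `transliterate_sketch` are hypothetical names, the values are only the ones proposed in this issue, and dropping None entries is a choice made for this sketch.

```python
from typing import Dict, Optional

# Hypothetical excerpt of a per-codepoint table; values are illustrative only.
HEBREW_SKETCH: Dict[int, Optional[str]] = {
    0x05C3: ". ",   # sof pasuq, rendered by meaning rather than by shape (":")
    0x05EF: "YYY",  # yod triangle, as proposed in item 1 of the issue
    0x05F5: None,   # unassigned in Unicode
    0x05F6: None,   # unassigned in Unicode
    0x05F7: None,   # unassigned in Unicode
}


def transliterate_sketch(text: str) -> str:
    """Map characters through HEBREW_SKETCH; ASCII passes through unchanged."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
            continue
        replacement = HEBREW_SKETCH.get(cp)
        # None (or a missing entry) means "no mapping": this sketch drops it.
        if replacement is not None:
            out.append(replacement)
    return "".join(out)


print(repr(transliterate_sketch("a\u05c3b\u05f5c")))
```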

Thanks!

@alonbl
Contributor

alonbl commented Aug 2, 2021 via email

@eyaler
Author

eyaler commented Aug 2, 2021

@alonbl I haven't opened the PR yet. I hope to get to it soon and will tag you.
Some points are a matter of view/use case/agenda, and there really is no clear right choice. If you are interested, Alon, we can discuss.
Thanks guys!

@eyaler
Author

eyaler commented Aug 3, 2021

@alonbl
Contributor

alonbl commented Aug 4, 2021

Thanks!

I created a patch with everything I could understand. As you did not provide edit permission, we will sync on code. Let's narrow it down; see #68.
