Reduplicative plural in Indonesian #238
Comments
How about using
The docs say "This relation links two parts of a word that are separated in text that is not well edited." You could consider this just suboptimal tokenisation.
Still lots of details to work out in UD! I would like to suggest using
P.S. In Indonesian, it is somewhat debated whether reduplication should really be viewed as a straightforward plural marker, though that seems to be increasingly true in modern Indonesian (see, e.g., http://sealang.net/sala/archives/pdf8/rafferty2002reduplication.pdf).
See also #307, where the decision was also
Indonesian forms the plural of nouns by repeating the noun. Example: oleh negara-negara nordic “by Nordic countries”; negara = “country”. This is called reduplication and it is not limited to Indonesian.
It could be analyzed at the level of morphological features (`Number=Plur`), but tokenization would then have to keep negara-negara as one token; this is not what we have in our Indonesian treebank. Instead, there are three tokens: negara - negara.

Since the reduplicated part is treated as a separate word (and provided we want to keep it that way), we need a language-specific relation to attach the reduplicated part to the first part. The treebank currently uses `mwe`, but that is wrong because `mwe` is to be used only with function expressions, such as multi-word prepositions. (Side note: I discovered this by checking that no node has `mwe` and `name` children at the same time. Most treebanks comply with this rule, except for 1 occurrence in French, 9 occurrences in Italian and 72 occurrences in Indonesian. See http://hdl.handle.net/11346/PMLTQ-SUPM.)

What about `nmod:plur` (an analogy to `nmod:poss` for 's in English)?
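For anyone who wants to repeat the side-note check without the PML-TQ query, here is a minimal sketch in Python. It is not the query that was actually used. It assumes tab-separated CoNLL-U input with integer `HEAD` values, and the function names `read_sentences` and `nodes_with_both` as well as the toy sentence are made up for illustration (the toy annotation is not taken from any treebank).

```python
from collections import defaultdict


def read_sentences(conllu_text):
    """Yield each sentence as a list of (id, form, head, deprel) tuples."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        # Assumption: HEAD (column 7) is always an integer in the input.
        sentence.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def nodes_with_both(sentence, rel_a="mwe", rel_b="name"):
    """Return the ids of nodes that have children with both relations."""
    child_rels = defaultdict(set)
    for _, _, head, deprel in sentence:
        child_rels[head].add(deprel)
    return sorted(
        head for head, rels in child_rels.items()
        if rel_a in rels and rel_b in rels
    )


def make_line(tok_id, form, head, deprel):
    """Build one 10-column CoNLL-U line with unused columns set to '_'."""
    return "\t".join([tok_id, form, "_", "_", "_", "_", head, deprel, "_", "_"])


# Hypothetical toy input: node 1 has both an `mwe` and a `name` child.
SAMPLE = "\n".join([
    "# toy sentence, invented for this sketch",
    make_line("1", "negara", "0", "root"),
    make_line("2", "-", "1", "punct"),
    make_line("3", "negara", "1", "mwe"),
    make_line("4", "nordic", "1", "name"),
    "",
])

violations = [nodes_with_both(s) for s in read_sentences(SAMPLE)]
print(violations)
```

Running the same two functions over a whole treebank file and counting the non-empty results would give a per-language count comparable to the ones quoted above, provided the file follows the assumptions stated here.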