Lexical refinement on edeprels #287
@dan-zeman When I try to allow |
For "in order {to, for...to, that}", I see that |
I am wondering whether this issue should reside under docs, as it is not just about EWT. |
I actually answered there (#182 (comment)) and now I see that you moved the question here. So here is a copy:
The last set of functions are intended for use with conj: Paratactic relation Conjunction (“and”) Maybe we can say it's a disjunction? I thought it was a subordinator and |
I see that @nschneid has added where with the example "I know [where you live]" but I would argue that at least in this sentence, where is an adverb and it should be attached via IMHO the same problem is wherever (2 occurrences in EWT, one as ADV, one as SCONJ, without any actual difference; 2 occurrences in GUM, both in wherever possible, both treated as SCONJ; I believe all of them should be ADV). IMHO the same problem is whither (1 occurrence in GUM, should be ADV). |
In the pattern no choice but to do something, there is an |
@amir-zeldes : In GUM, M = 7.64 ± 1.12 is analyzed so that ± 1.12 is |
Is it grammatical in English to omit the second as from as well as? GUM has one example: ... this project study gives solution to the problem of the society concerning environment, health and safety as well energy conservation ... To me it sounds like there should be as well as but the author forgot to complete it. If that's true, then the enhanced relation should be |
Is it a good idea to augment enhanced deprels with foreign case markers in code-switched data? Example: GUM uses Alternatively, the validator could be modified to also observe MISC |
Any criteria for deciding whether versus should be preposition or coordinator in English? All examples in EWT and GUM result in |
Is til an acceptable alternative spelling of till, or is it a typo that should be normalized to till? There is one |
There are two instances of aka in EWT (http://hdl.handle.net/11346/PMLTQ-FRJ0). They are tagged as GUM has six instances (http://hdl.handle.net/11346/PMLTQ-FAAK). They are tagged as |
Thanks for finding all of these! My take on these is:
|
|
@nschneid has added it.
I will leave this one for the two of you to sort out (
I never heard of phasing it out in UD but maybe it was some English-internal discussion. FWIW, the equivalents in Czech are treated as adverbs (and it is the same in the Prague treebanks, i.e., without any connection to Stanford Dependencies). I am convinced that a wh-adverb stays an adverb and occupies an adverbial position regardless of whether it is a question, a complement clause, or an adverbial clause.
OK,
I would definitely not remove the sentence because that would break the integrity of the document, but you probably did not mean that. I would also not necessarily flatten the tree; I think using UPOS In contrast, I did register Latin et as an English conjunction because I thought et al. has been naturalized in English.
Yeah, also known as could be a fixed multi-word preposition or conjunction (but would you treat it as such if it occurred in the corpus?) I don't think it disqualifies the nominal from being an apposition (semantically it indeed sounds like one). Actually, I had the same feeling about such as, that's why I did not include it in the first round of porting English edeprels. So maybe the two should have the same solution. But if you guys believe it has to be |
This is subject to debate. @amir-zeldes thinks of et as a conjunction even in English, whereas I think of "et al." as a fixed phrase. Let's revisit after resolving "etc.". |
What about |
That was actually the case of the French sentence I showed (the English translation came two sentences later). |
Lexical refinement is not present everywhere, so I don't think this is necessary. (And deprels are all-lowercase, so it would have to be |
There should almost always be a lexical refinement if the dependent has a |
But if the scripts already check the presence of a |
I feel quite strongly against adding any more complexity to the internal logic of the deprels. In fact, I hope that in the distant future, we will be able to replace all these lexical labels with some semantic tags that will be portable across languages. I'm actually quite fine with |
I would opt for simplicity as well - edeps are a work in progress from my perspective, and messing with them too much right now may be premature optimization. I am happy with "from_above" for right now, and if infinitive "to" is in then it is in, meaning "but_to" (in the sense "except to") is also in. I also don't think ":foreign" is necessary since there is |
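For readers following along, here is a minimal sketch of how labels such as "from_above" and "but_to" could be put together. It assumes that the marker is simply the lowercased lemmas of the case/mark dependents joined by underscores, in surface order; the function names `lexical_marker` and `refine` are hypothetical and are not taken from any UD tool.

```python
def lexical_marker(lemmas):
    """Join the lemmas of the case/mark dependents into one marker string.

    Assumption: lemmas are given in surface order and are lowercased,
    since deprels are all-lowercase.
    """
    return "_".join(lemma.lower() for lemma in lemmas)


def refine(deprel, lemmas):
    """Attach the marker to a base relation; keep it bare if there is no marker."""
    marker = lexical_marker(lemmas)
    return f"{deprel}:{marker}" if marker else deprel


if __name__ == "__main__":
    print(refine("obl", ["from", "above"]))
    print(refine("advcl", ["but", "to"]))
    print(refine("obl", []))
```

Under this assumption nothing special is needed for "but_to": once infinitival "to" is allowed as a marker, the combined label follows from the same joining rule.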
The above changes (and recent additions to the validator list) result in EWT being VALID! A couple of items to note for future investigation:
The other changes were fairly straightforward. |
Just one thing about comparative correlatives ("the more the merrier") - I'm all for advcl:the here, but that does open the question of what deprel and POS it should have. Currently it has Options include:
Maybe option 2. is the best for preserving 'status quo' while expressing linguistic structure faithfully, though it does create an xpos/upos disparity (but an automatable one, since no other 'the' is deprel mark). |
Option 2 sort of makes sense to me. The only other viable option seems to be 5 (perhaps I'd even prefer that one). Because if we put |
Yes, I think it is - historically it's a separate case form, distinct from the regular article (also compare the German form "desto", which is not the same as the regular article). It's only coincidentally a homonym of the article at this point, but really it's a totally different word morphosyntactically - it's labeled |
The inventory of possible lexical markers on nmod, obl, acl, advcl, and conj enhanced relations is now specified for the validator at: https://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_edeprel.pl?lcode=en

There are a number of errors in EWT and GUM, some of which require tweaking of the inventory, and others of which should be changed in the data. Let's use this issue to track the discussion.
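As a rough illustration of the kind of check this inventory enables, the sketch below splits an enhanced relation label into its base relation and marker and compares the marker against an allowed set. The `ALLOWED_MARKERS` set is a hypothetical, hand-picked sample built from labels mentioned in this thread, not the real validator list, and the function names are made up for the example. The sketch also ignores ordinary subtypes, which a real check would have to treat separately.

```python
# Relations named in this issue as carrying lexical markers.
REFINABLE = {"nmod", "obl", "acl", "advcl", "conj"}

# Hypothetical sample inventory for illustration only; the real list is
# maintained in the validator's language-specific settings.
ALLOWED_MARKERS = {"and", "from_above", "but_to", "the"}


def split_edeprel(label):
    """Split 'obl:from_above' into ('obl', 'from_above'); marker is None if absent."""
    base, _, marker = label.partition(":")
    return base, (marker or None)


def check_edeprel(label):
    """Return a list of problems found in one enhanced relation label."""
    problems = []
    base, marker = split_edeprel(label)
    if label != label.lower():
        problems.append("not lowercase")
    # Only the five refinable relations are checked against the inventory.
    if marker is not None and base in REFINABLE and marker not in ALLOWED_MARKERS:
        problems.append("unknown marker: " + marker)
    return problems


if __name__ == "__main__":
    for label in ["obl:from_above", "advcl:but_to", "obl:Foreign", "nsubj"]:
        print(label, check_edeprel(label))
```

A label that fails this kind of check is either a gap in the inventory or an error in the data, which is the distinction this issue is meant to track.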