Standardizing ExtPos (at least for fixed expressions) #1037
Comments
N.B. Currently I have an EWT instance that is triggering a validator error: it is for the adpositional expression "due to" attaching as |
I sincerely do not see much utility in this, as for While I would be interested in discussing something similar when it is tied to an effective morphological strategy, e.g. in relation to |
Indeed, the ExtPos information is already implied if the deprel is correct and is a functional relation (
Could you elaborate? |
Exactly, for example "each other" has |
Annotation practices of course interfere with what would be the "expected" POS ( If I am not mistaken, this sequence would be labelled with From the data, there appear to be just these relations in English treebanks for each other. Now, imagine that it were annotated with relation UPDATE: I recognise the following interpretation is faulty, I am sorry for this. I am toning it down but I am leaving it here for the more general points. Now, still more specifically to each other is why it should be annotated as By the way, the English case is quite different from the more or less corresponding Latin one, where we have a reciprocal element invicem: while this has transparent etymology in + vicem 'in [smb's/the other's] turn', it really looks crystallised and it does not appear where you would expect an oblique nominal phrase: you have it used as an So really I cannot see what ExtPos would add.
Here allow me to refer to my article Formae reformandae (UDW5). Traditionally, we have labels like participle, infinitive, supine, masdar, etc. to refer to particular forms in verbal paradigms whereby a verb gets to be used as a different part of speech, as it were. So, the participle is a verbal adjective whereby I can say (examples in Latin):
The form scriptura behaves in all respects like an adjective: inflection for gender/number/case, possibility of degree (scripturior, scripturissima), possibility of adverbialisation (scripture); but it also behaves as a verb, in that it can have the same argument structure: scriptura librum 'going to write a book', with accusative, instead of a nominal strategy like genitive *scriptura libri. So, in the end using |
Discussions of English pronouns are at #517, and docs at https://universaldependencies.org/en/pos/PRON.html. While it might be nice to show the historical origin of the expression with a relation other than
Without ExtPos, how would one search a treebank for all expressions acting as pronouns? The rule would need to specify individual lexical items like "one another". But with ExtPos, it is easy to find the ones that are not PRON at the individual word level. You mentioned There also may be languages with fixed expressions functioning as PART, for example. PART is idiosyncratic and not necessarily predictable from the deprel.
The line between VERB and ADJ can be tricky and I don't know enough about Latin to weigh in here (VerbForm=Part as used in English is NOT equivalent to occurring in ADJ-like environments), but yes, there may be many good uses of ExtPos beyond fixed expressions. |
UPDATE: I know that, in the haste of writing, I put forth a faulty interpretation of English each other; I am sorry (but I am leaving it there). This, however, does not invalidate the other points. Anyway, this is yet another case where, if each other is indeed a single word like Latin invicem, written separately just because of the vagaries of orthography, I think a token with spaces could be welcome. |
One would look for all elements with nominal relations ( A similar thing already has to be performed to retrieve predicates: a word receiving I am somewhat worried that a feature like
This is a general problem which goes beyond the appropriateness of annotating In this specific case, the issue has to be solved by addressing how to mark the presence of an ellipsis and/or the nature of vice versa: the annotation as
We would need some example to discuss this. Anyhow,
It really is the same in any Indo-European language (and beyond). What are non-ADJ-like environments of English But the point is, transposition exists and a unified way to mark it could be useful. |
Some treebanks are already using I suspect requiring It seems to me that most of your objections above are actually objections to the |
I don't even understand why there is a discussion about the relevance of ExtPos. ExtPos is just as relevant as upos, not more, not less. @Stormur if you say that ExtPos can be inferred from the syntactic relation, the same could be said about upos. (I don't think it is true that the POS can be inferred from the syntactic relation, but that's not the point.) And even if it could be inferred, what is the problem with adding ExtPos? I really don't understand the point. One of the reasons we introduced ExtPos (apart from the fact that in SUD our syntactic relations are less redundant with upos) is that it was difficult to track down annotation errors or to find strange constructions, because we had many unexpected upos-relation pairs. With Grew-match it is possible to search for elements that have ExtPos=ADV or, if there is no ExtPos, upos=ADV, and so to get all the ADVs consisting of one or several tokens (if you use ExtPos on all fixed expressions, as in the French treebanks). |
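To make the "ExtPos if present, otherwise upos" search concrete, here is a minimal Python sketch of the same idea outside Grew-match. It is not Grew-match syntax and not part of any UD tool: the function names and the sample sentence are made up for illustration, and it assumes ExtPos is stored in the FEATS column of a CoNLL-U file.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'ExtPos=PRON|Number=Sing' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def effective_pos(upos, feats):
    """Return ExtPos when the token has one, otherwise fall back to UPOS."""
    return parse_feats(feats).get("ExtPos", upos)


def find_by_effective_pos(conllu_text, target):
    """List (form, upos) for every word whose effective POS equals target."""
    hits = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        if effective_pos(cols[3], cols[5]) == target:
            hits.append((cols[1], cols[3]))
    return hits


# Hand-written illustrative sentence (hypothetical annotation, not copied from a treebank).
SAMPLE = "\n".join([
    "# text = They help each other",
    "1\tThey\tthey\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\thelp\thelp\tVERB\t_\t_\t0\troot\t_\t_",
    "3\teach\teach\tDET\t_\tExtPos=PRON\t2\tobj\t_\t_",
    "4\tother\tother\tADJ\t_\t_\t3\tfixed\t_\t_",
])

print(find_by_effective_pos(SAMPLE, "PRON"))
```

On this sample, the query for PRON returns both the single-word pronoun and the head of the fixed expression, without listing lexical items such as "each other" by hand.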
I have no problem with using it if one sees fit to do so; my problem is only with making it more or less mandatory through warnings from the validator. I am against that. Beyond that, my personal reservations about its utility still stand.
I think it is slightly different in that I do not envision While in general it is true we are interested to see whether, say, an
This is for sure a very big problem. |
I think it does not have to be that way. If I have to add |
I understand, but this is independent from |
Yes, there is definitely extra work required. But if the validator is modified to take |
But this would be an extra test created from nothing, from the addition of this feature which itself can only be added on contextual grounds as by definition it cannot depend on the characteristics of the single components. Because if it would, then why There is circularity here. I also fear that making Now I will sit silent because I think I have already insisted too much on these points (sorry) and I am becoming grumpy and repetitive. But do not get me wrong, I can understand the implementation of tests like the ones you describe. However, all in all, I believe that these possible benefits are extremely marginal at best and that drawbacks on the contrary are too many. I would like to see a different "angle of attack" to the issues that we are confronting here. |
Today the Core Group discussed FEATS vs. MISC and voted that FEATS would be a better home for ExtPos. Most MISC attributes are optional and unregulated at the universal level; putting ExtPos in FEATS gives it greater visibility and is in keeping with existing practice by the SUD group. Another practical advantage is a clear home in the docs for universal + language-specific pages (e.g. https://universaldependencies.org/en/feat/ExtPos.html). The encouragement to document the different values of ExtPos with examples in each language may have the effect of promoting discussion of the appropriate scope of |
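For treebanks that currently keep ExtPos in MISC, the move to FEATS could be scripted roughly as below. This is only a sketch under stated assumptions: the function name is hypothetical, the sample line is invented, and the case-insensitive sort reflects my understanding of how CoNLL-U orders FEATS, which should be checked against the format documentation before use.

```python
def move_extpos_to_feats(line):
    """Move an ExtPos=... attribute from MISC (column 10) to FEATS (column 6).

    Lines that are not 10-column token lines, or that have no ExtPos in MISC,
    are returned unchanged.
    """
    cols = line.split("\t")
    if len(cols) != 10:
        return line
    misc = [] if cols[9] == "_" else cols[9].split("|")
    extpos = [m for m in misc if m.startswith("ExtPos=")]
    if not extpos:
        return line
    feats = [] if cols[5] == "_" else cols[5].split("|")
    # Keep an ExtPos that is already in FEATS; otherwise take the one from MISC.
    if not any(f.startswith("ExtPos=") for f in feats):
        feats.append(extpos[0])
    feats.sort(key=str.lower)  # assumption: FEATS are sorted case-insensitively
    rest = [m for m in misc if not m.startswith("ExtPos=")]
    cols[5] = "|".join(feats)
    cols[9] = "|".join(rest) or "_"
    return "\t".join(cols)


# Invented token line for illustration only.
before = "3\tdue\tdue\tADJ\tJJ\tDegree=Pos\t5\tcase\t_\tExtPos=ADP|SpaceAfter=No"
print(move_extpos_to_feats(before))
```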
@dan-zeman has drafted a universal guidelines page: https://universaldependencies.org/u/feat/ExtPos.html A couple of questions about French examples:
|
I took the French examples from the French documentation but I did not verify them in the French treebanks. |
I switched the "plutôt que" example to a "bien que" example from one of the treebanks. @sylvainkahane or @bguil, maybe you could confirm the "de la" example of |
here are all the values of ExtPos in the French GSD treebank: https://universal.grew.fr/?custom=66841dc7423dd. |
The de la example occurs 9 times in Sequoia. |
Ah I was querying for "la" as the lemma when it should be "le". OK I guess this is the partitive article construction. (Curious: Can "de la" ever be used on a subject? I mainly see it following a verb or preposition, where historically "de" might have acted as a preposition.) |
Two answers to @nschneid.
|
In terms of implementing this in English treebanks such as PUD, are we at the point of labeling |
Sounds good, will implement for GUM as well |
This ExtPos policy applies to all |
Actually I just mean - are we now ready to label |
We're ready to implement! The validator is not updated yet (once it is there will be an official announcement of the new policy), but I've already implemented in EWT. |
GUM is implemented too, just moved it to FEATS, should update the next push |
Found some cases of Train section:
Test:
|
Where is the line to draw for
but then many others are not marked, such as
|
There are several not marked in EWT:
Marked in PUD but not existing in EWT:
|
What about
Incidentally, I am happy that, as humans, we have surgical options other than "shotgun" for dealing with atrophied cheek muscles |
I don't see much difference with those and the following:
although certainly there might be some subtle differences dev, not marked:
test, not marked:
|
@AngledLuffa these are great questions/observations about |
Done |
|
The first post says
I don't see |
Good point about MWEPOS in MISC. In fact, |
In the French treebanks, ExtPos has been used for SYMs used as NOUNs (%, €, etc.), CCONJs (&), etc.; for foreign words or combinations of letters and numbers (upos=X) used as PROPNs; for ADVs used as PRONs (a special construction of French, ADV de NOUN, where the ADV is the syntactic head and can also be used alone as a pronoun); and some other rarer examples. |
The Core Group has agreed to encourage the use of ExtPos to specify how a word (or the expression/phrase it heads) functions with respect to its external deprel, where this may differ from the UPOS reflecting the word's morphology and dependents. In particular, the first word of every fixed expression should specify ExtPos to reflect the UPOS that the whole expression would bear were it a single word.
Some treebanks have already been using ExtPos in MISC, others in FEATS. @dan-zeman has said that MISC is the appropriate place as it reflects properties broader than a single word's morphology.
I assume we need to do the following:
fixed, though treebanks may opt to use it for art titles, SYM, and so on). Where will this live? The MISC attributes page is a bit cluttered with treebank-specific/experimental attributes. Cf. https://universaldependencies.org/en/feat/ExtPos.html
fixed guidelines.
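The rule that the first word of every fixed expression should specify ExtPos can be checked mechanically. The sketch below is not the official UD validator: the function name and sample sentence are hypothetical, and it assumes ExtPos lives in FEATS, as the Core Group later voted in this thread.

```python
def fixed_heads_missing_extpos(sentence_lines):
    """Return (id, form) for each head of a 'fixed' dependent that lacks ExtPos in FEATS."""
    tokens = {}
    fixed_heads = set()
    for line in sentence_lines:
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip comments, multiword-token ranges, empty nodes
        tokens[cols[0]] = cols
        if cols[7].split(":")[0] == "fixed":  # also matches subtypes such as fixed:xyz
            fixed_heads.add(cols[6])
    missing = []
    for head_id in sorted(fixed_heads, key=int):
        cols = tokens.get(head_id)
        if cols is None:
            continue  # dangling head reference; out of scope for this check
        feats = [] if cols[5] == "_" else cols[5].split("|")
        if not any(f.startswith("ExtPos=") for f in feats):
            missing.append((head_id, cols[1]))
    return missing


# Hand-written illustrative sentence (hypothetical annotation, not copied from EWT).
SAMPLE = [
    "# text = It failed due to rain",
    "1\tIt\tit\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tfailed\tfail\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tdue\tdue\tADP\t_\t_\t5\tcase\t_\t_",
    "4\tto\tto\tADP\t_\t_\t3\tfixed\t_\t_",
    "5\train\train\tNOUN\t_\t_\t2\tobl\t_\t_",
]

print(fixed_heads_missing_extpos(SAMPLE))
```

Here "due" is reported because it heads a fixed dependent but carries no ExtPos; adding ExtPos=ADP to its FEATS makes the report empty.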