Standardizing ExtPos (at least for fixed expressions) #1037

Closed
nschneid opened this issue Jun 3, 2024 · 40 comments
Labels
enhancement features universal UPOS Universal part-of-speech tags: definitions and examples
Milestone

Comments

@nschneid
Contributor

nschneid commented Jun 3, 2024

The Core Group has agreed to encourage the use of ExtPos to specify how a word (or the expression/phrase it heads) functions with respect to its external deprel, where this may differ from the UPOS reflecting the word's morphology and dependents.

In particular, the first word of every fixed expression should specify ExtPos to reflect the UPOS that the whole expression would bear were it a single word.

Some treebanks have already been using ExtPos in MISC, others in FEATS. @dan-zeman has said that MISC is the appropriate place as it reflects properties broader than a single word's morphology.

I assume we need to do the following:

  1. A description of how ExtPos should be used (at least for fixed, though treebanks may opt to use it for art titles, SYM, and so on). Where will this live? The MISC attributes page is a bit cluttered with treebank-specific/experimental attributes. Cf. https://universaldependencies.org/en/feat/ExtPos.html
  2. This should be mentioned in the fixed guidelines.
  3. An update to the validator that checks ExtPos if present and uses that rather than UPOS for checks on the incoming deprel.
  4. An update to the validator that warns if ExtPos is missing for a fixed expression. (not an error)
  5. An announcement blurb for the changes page.
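
Items 3 and 4 can be sketched roughly as follows. This is only an illustration, not the actual validator code: the `ALLOWED` table and the names `check_token`, `extpos`, and `has_fixed_child` are hypothetical, and the table covers just three functional relations.

```python
# Illustrative table only: UPOS tags accepted for a few functional deprels.
# The real validator's rules are more detailed; these values are assumptions.
ALLOWED = {
    "case": {"ADP"},
    "cc": {"CCONJ"},
    "mark": {"SCONJ"},
}


def check_token(upos, extpos, deprel, has_fixed_child):
    """Return a list of (level, message) findings for a single token.

    extpos is the token's ExtPos value, or None if it has none.
    """
    findings = []
    # Item 4: a missing ExtPos on the head of a fixed expression is a warning.
    if has_fixed_child and extpos is None:
        findings.append(("warning", "head of fixed expression lacks ExtPos"))
    # Item 3: ExtPos, when present, replaces UPOS in the incoming-deprel check.
    effective = extpos or upos
    allowed = ALLOWED.get(deprel.split(":")[0])  # ignore any deprel subtype
    if allowed is not None and effective not in allowed:
        findings.append(("error", f"{effective} should not attach as {deprel}"))
    return findings
```

Under these assumptions, an ADJ carrying ExtPos=ADP passes the `case` check, while the same ADJ without ExtPos is reported as an error.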
@nschneid
Contributor Author

nschneid commented Jun 3, 2024

N.B. Currently I have an EWT instance that is triggering a validator error: it is for the adpositional expression "due to" attaching as case, only the "to" is omitted so there is no fixed dependency. It would make sense to tag "due" as ADJ and ExtPos=ADP, but the validator needs to be updated to recognize the latter because it is not allowing an ADJ to attach as case.

@Stormur
Contributor

Stormur commented Jun 3, 2024

I sincerely do not see much utility in this, as for fixed and most other cases this is already determined by the deprel: that is, ExtPos = expected POS of the deprel.

While I would be interested in discussing something similar when it is tied to an effective morphological strategy, e.g. in relation to VerbForm.

@nschneid
Contributor Author

nschneid commented Jun 3, 2024

for fixed and most other cases this is already determined by the deprel: that is, ExtPos = expected POS of the deprel.

Indeed, the ExtPos information is already implied if the deprel is correct and is a functional relation (cc, case, mark, or advmod). But there are cases where fixed is used and the deprel is something less specific (e.g. pronouns can attach in a variety of ways), and in general, making ExtPos explicit highlights on the same line as the first word the fact that its UPOS does not control its deprel.

While I would be interested in discussing something similar when it is tied to an effective morphological strategy, e.g. in relation to VerbForm.

Could you elaborate?

@amir-zeldes
Contributor

But there are cases where fixed is used and the deprel is something less specific (e.g. pronouns can attach in a variety of ways)

Exactly, for example "each other" has ExtPos=PRON but a variety of deprels.

@Stormur
Contributor

Stormur commented Jun 4, 2024

for fixed and most other cases this is already determined by the deprel: that is, ExtPos = expected POS of the deprel.

Indeed, the ExtPos information is already implied if the deprel is correct and is a functional relation (cc, case, mark, or advmod). But there are cases where fixed is used and the deprel is something less specific (e.g. pronouns can attach in a variety of ways), and in general, making ExtPos explicit highlights on the same line as the first word the fact that its UPOS does not control its deprel.

But there are cases where fixed is used and the deprel is something less specific (e.g. pronouns can attach in a variety of ways)

Exactly, for example "each other" has ExtPos=PRON but a variety of deprels.

Annotation practices of course interfere with what would be the "expected" POS (ExpPos 😬 ) for a dependency relation. But let's take each other as a specific example.

If I am not mistaken, this sequence would be labelled with ExtPos=PRON because it is considered a MWE behaving as a whole as a reciprocal pronoun. This means that we expect it to get relations obj, nsubj, obl, iobj: all of these entail a nominal part of speech, so either NOUN (+ PROPN) or PRON. The fact that this MWE is ascribable to PRON rather than NOUN derives from the fact that its "head" (and actually both elements) are of a synsemantic nature. But anyway, this would be an internal distinction to the fact of "behaving nominally". There are also other possible relations like conj, orphan, parataxis... which are neutral with respect to parts of speech, so they are not relevant here.

From the data, there appear to be just these relations in English treebanks for each other. Now, imagine that it were annotated with relation advmod. I am quite confident that in this case ExtPos would be set to ADV; if not, the correctness of advmod would be very doubtful (and in fact I think it would not be correct). This goes to show that ExtPos is a case of contextual annotation, as it is mechanically determined by the dependency relation: it is redundant and not useful. (Incidentally, I am very much against enforcing warnings from the validator if this feature is to be annotated under MISC.)


UPDATE: I recognise the following interpretation is faulty, I am sorry for this. I am toning it down but I am leaving it here for the more general points.

Now, still more specifically to each other is why it should be annotated as fixed. It seems transparent: you have a contrastive element other modified by a distributive each, and this is a determinantal (or it might be argued, pronominal) phrase which behaves as any other nominal argument. I see that in some treebanks the "head" each gets the feature PronType=Rcp, which is problematic: if annotated at all, this should also go into MISC, exactly as it has been proposed for ExtPos. I think however that here we need to refer to a MWE annotation level and not let it percolate onto the morphosyntactic one.

By the way, the English case is quite different from the more or less corresponding Latin one, where we have a reciprocal element invicem: while this has transparent etymology in + vicem 'in [smb's/the other's] turn', it really looks crystallised and it does not appear where you would expect an oblique nominal phrase: you have it used as an obj, or you have things like ab invicem 'from each+other', ad invicem 'to each+other', etc. (i.e., here you would have two adpositions). No reason to split it to have it again annotated as fixed: this might appear on a derivational annotation layer, but it does not seem appropriate to the morphosyntactic one anymore.


So really I cannot see what ExtPos would add.


While I would be interested in discussing something similar when it is tied to an effective morphological strategy, e.g. in relation to VerbForm.

Could you elaborate?

Here allow me to refer to my article Formae reformandae (UDW5). Traditionally, we have labels like participle, infinitive, supine, masdar, etc. to refer to particular forms in verbal paradigms whereby a verb gets to be used as a different part of speech, as it were. So, the participle is a verbal adjective whereby I can say (examples in Latin):

  • scriptura poetissa 'going-to-write poetess', which might be also expressed as
  • poetissa quae scriptura est/erat/erit 'poetess who is/was/will be going to write', or similar, with a "finite" verb form (i.e., a "verby verb")

The form scriptura behaves in all respects like an adjective: inflection for gender/number/case, possibility of degree (scripturior, scripturissima), possibility of adverbialisation (scripture); but it also behaves like a verb, in that it can have the same argument structure: scriptura librum 'going to write a book', with accusative, instead of a nominal strategy like genitive *scriptura libri.

So, in the end using VerbForm=Part would be equivalent to ExtPos=ADJ (in fact, I have proposed a notation like Transposed=ADJ), but in this case this is tied effectively to morphology, and not to an invisible "global property" of a MWE.

@nschneid
Contributor Author

nschneid commented Jun 4, 2024

Now, still more specifically to each other is why it [each other] should be annotated as fixed.

Discussions of English pronouns are at #517, and docs at https://universaldependencies.org/en/pos/PRON.html. While it might be nice to show the historical origin of the expression with a relation other than fixed, it seemed our best option to express the reciprocal slot of the pronoun paradigm was to use fixed and treat the whole thing as PRON.

So really I cannot see what ExtPos would add.

Without ExtPos, how would one search a treebank for all expressions acting as pronouns? The rule would need to specify individual lexical items like "one another". But with ExtPos, it is easy to find the ones that are not PRON at the individual word level.
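
A minimal sketch of such a search over CoNLL-U text, in plain Python. The function names and the toy sentence (including its tags) are made up for illustration, and the sketch looks for ExtPos in both FEATS and MISC since treebanks have used either column.

```python
def attr(column, name):
    """Look up NAME in a FEATS/MISC-style column such as 'A=1|B=2'."""
    if column == "_":
        return None
    for pair in column.split("|"):
        key, _, value = pair.partition("=")
        if key == name:
            return value
    return None


def expressions_with_extpos(conllu_text, target):
    """List expressions whose ExtPos is TARGET although their head's UPOS is not."""
    results = []
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep basic word lines only (skip multiword-token ranges and empty nodes).
        rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
        for row in rows:
            extpos = attr(row[5], "ExtPos") or attr(row[9], "ExtPos")
            if extpos == target and row[3] != target:
                # Append the forms of the fixed dependents to the head's form.
                fixed = [r[1] for r in rows if r[6] == row[0] and r[7] == "fixed"]
                results.append(" ".join([row[1]] + fixed))
    return results


# Toy sentence with illustrative tags (not copied from any treebank).
SAMPLE = "\n".join("\t".join(cols) for cols in [
    ["1", "They", "they", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "like", "like", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "each", "each", "DET", "_", "ExtPos=PRON", "2", "obj", "_", "_"],
    ["4", "other", "other", "ADJ", "_", "_", "3", "fixed", "_", "_"],
])

print(expressions_with_extpos(SAMPLE, "PRON"))
```

No list of lexical items is needed: the query relies only on the ExtPos value and the fixed relation.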

You mentioned conj etc.: these are cases where it is not always trivial to detect the UPOS from the deprel. From English-GUM: "husbands are likely to laugh at jokes about wives and vice versa"—ExtPos is necessary to express that "vice versa" functions as an ADV (coordinated with an ADJ).

There also may be languages with fixed expressions functioning as PART, for example. PART is idiosyncratic and not necessarily predictable from the deprel.

So, in the end using VerbForm=Part would be equivalent to ExtPos=ADJ (in fact, I have proposed a notation like Transposed=ADJ), but in this case this is tied effectively to morphology, and not to an invisible "global property" of a MWE.

The line between VERB and ADJ can be tricky and I don't know enough about Latin to weigh in here (VerbForm=Part as used in English is NOT equivalent to occurring in ADJ-like environments), but yes, there may be many good uses of ExtPos beyond fixed expressions.

@Stormur
Contributor

Stormur commented Jun 4, 2024

UPDATE: I know that in the haste of writing I put forth a faulty interpretation of English each other , I am sorry (but I am leaving it there). This however does not invalidate the other points.

Anyway, this is yet another case where, if each other is indeed a unique word like Latin invicem, written separately just for the vagaries of orthography, I think a token with spaces could be welcome.

@Stormur
Contributor

Stormur commented Jun 4, 2024

Without ExtPos, how would one search a treebank for all expressions acting as pronouns? The rule would need to specify individual lexical items like "one another". But with ExtPos, it is easy to find the ones that are not PRON at the individual word level.

One would look for all elements with nominal relations (nsubj, obj, nmod, ...) and select those whose head falls into a synsemantic word class. If the head is not synsemantic, I would doubt the pronominality of the expression. Conversely, I don't think that we want to assign ExtPos=ADV to phrases like German pro Kopf ~ 'each', lit. 'per head', or to any other oblique.

Something similar already has to be done to retrieve predicates: a word receiving advcl, csubj, etc. may well be a non-verb with an auxiliary. But I do not think that we want to assign ExtPos=VERB to those occurrences. The relation already tells us that. On the other hand, it is interesting to know if a csubj is headed by a verb form "mimicking" a NOUN or an ADJ.

I am somewhat worried that a feature like ExtPos could get out of hand and be very much misinterpreted by new annotators, as already happens for fixed.


You mentioned conj etc.: these are cases where it is not always trivial to detect the UPOS from the deprel. From English-GUM: "husbands are likely to laugh at jokes about wives and vice versa"—ExtPos is necessary to express that "vice versa" functions as an ADV (coordinated with an ADJ).

This is a general problem which goes beyond the appropriateness of annotating ExtPos.

In this specific case, the issue has to be solved by addressing how to mark the presence of an ellipsis and/or the nature of vice versa: the annotation as ADV is a confusing factor here (in the sense that it does not look like the right solution, at least not to me). Annotating ExtPos here does not add anything, if possible it makes it even more confusing (I would immediately go look into the data to understand what justifies this asymmetry).

There also may be languages with fixed expressions functioning as PART, for example. PART is idiosyncratic and not necessarily predictable from the deprel.

We would need some examples to discuss this. Anyhow, PART is rather restricted in what it can be associated with. Another point is that this idiosyncrasy of PART annotation is itself the problem we have to address.

So, in the end using VerbForm=Part would be equivalent to ExtPos=ADJ (in fact, I have proposed a notation like Transposed=ADJ), but in this case this is tied effectively to morphology, and not to an invisible "global property" of a MWE.

The line between VERB and ADJ can be tricky and I don't know enough about Latin to weigh in here (VerbForm=Part as used in English is NOT equivalent to occurring in ADJ-like environments), but yes, there may be many good uses of ExtPos beyond fixed expressions.

It really is the same in any Indo-European language (and beyond). What are non-ADJ-like environments of English VerbForm=Part (which should at the same time be non-VERB-like)? If it were so, could I dare to suggest that this annotation might need some revision from a typological point of view?

But the point is, transposition exists and a unified way to mark it could be useful.

@nschneid
Contributor Author

nschneid commented Jun 4, 2024

I am somewhat worried that a feature like ExtPos could get out of hand and be very much misinterpreted by new annotators, as already happens for fixed.

Some treebanks are already using ExtPos. Treebanks are free to innovate with MISC attributes. As far as the validator is concerned, the only change will be for fixed expressions (and it will be a warning not an error). If there is enthusiasm for a broader definition of ExtPos down the road, that might lead to new guidelines, but I think that would be premature at this point.

I suspect requiring ExtPos on fixed expressions might actually encourage treebanks to reduce their use of fixed, because they will realize that most semantic multiword expressions can be accommodated by syntactically regular deprels (but we'll see).

It seems to me that most of your objections above are actually objections to the fixed analysis in the first place. I don't want to bog down this thread with debates about particular expressions, but given that the relation exists to capture grammatical words-with-spaces, it doesn't seem like there is much harm in assigning those a holistic tag (even if it is sometimes inferable from the deprel, just as ADP, ADV, CCONJ, SCONJ are usually inferable from the deprel for single words). Explicitly flagging, e.g. for "rather" in "rather than", that it is an ADV internally and part of a CCONJ expression externally (rather than some other anomaly leading to ADV/cc) seems like it would help treebank users see what is going on.

@sylvainkahane
Contributor

I don't even understand why there is a discussion about the relevance of ExtPos. ExtPos is just as relevant as upos, no more, no less. @Stormur if you say that ExtPos can be inferred from the syntactic relation, the same could be said about upos. (I don't think it is true that the POS can be inferred from the syntactic relation but that's not the point.) And even if it could be inferred, what is the problem with adding ExtPos? I really don't understand the point.

One of the reasons we introduced ExtPos (apart from the fact that in SUD our syntactic relations are less redundant with upos) is that it was difficult to track down annotation errors or to find strange constructions because we had many unexpected upos-relation pairs. With Grew-match it is possible to search for elements that have ExtPos=ADV or, if there is no ExtPos, upos=ADV, and thus to get all the ADVs, whether they consist of one or several tokens (provided ExtPos is present on all fixed expressions, as in the French treebanks).

@Stormur
Contributor

Stormur commented Jun 5, 2024

I have no problem with anyone using it if they see fit; my only problem is with making it more or less mandatory through warnings from the validator. I am against that.


Then, my personal considerations about its utility still stand.

if you say that ExtPos can be inferred from the syntactic relation, the same could be said about upos

I think it is slightly different in that I do not envision ExtPos for fixed being other than contextual, more or less by definition given its "externality".

In general it is true that we are interested in seeing whether, say, an nmod is realised by a NOUN/PROPN, PRON, ADJ, DET, NUM, or VERB with a VerbForm; but in those cases we have a syntactic word which does show characteristics of that word class.

It seems to me that most of your objections above are actually objections to the fixed analysis in the first place.

This is for sure a very big problem.

@dan-zeman dan-zeman added enhancement UPOS Universal part-of-speech tags: definitions and examples features universal labels Jun 5, 2024
@dan-zeman dan-zeman added this to the v2.15 milestone Jun 5, 2024
@dan-zeman
Member

if you say that ExtPos can be inferred from the syntactic relation, the same could be said about upos

I think it is slightly different in that I do not envision ExtPos for fixed being other than contextual, more or less by definition given its "externality".

I think it does not have to be that way. If I have to add ExtPos to all fixed expressions in a treebank using a script, the script will not look at the context and make inferences like "the incoming deprel is advmod, hence ExtPos=ADV". Instead, the script will have a list of the fixed expressions in the language and a "dictionary" UPOS for each of them. I may discover expressions that are currently fixed but I do not want them on the list, so I will change their annotation. And after I apply the script, I may ask the validator whether some of them occurred in a context that is not compatible with its new ExtPos, and fix the annotation if it does.
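
A rough sketch of what such a script could look like, under stated assumptions: the token representation (dicts with `id`, `form`, `head`, `deprel`, `feats`), the function name `add_extpos`, and the lexicon entries are hypothetical; the three entries are simply expressions mentioned earlier in this thread.

```python
# Hypothetical "dictionary" of fixed expressions and the UPOS each would
# bear as a single word. A real list would be compiled per language.
FIXED_LEXICON = {
    ("rather", "than"): "CCONJ",
    ("each", "other"): "PRON",
    ("vice", "versa"): "ADV",
}


def add_extpos(tokens, lexicon):
    """Set ExtPos on heads of listed fixed expressions, ignoring context.

    tokens is a list of dicts ordered by id. Returns the fixed expressions
    that are not in the lexicon, so their annotation can be reviewed by hand.
    """
    unlisted = []
    for tok in tokens:
        fixed_children = [t for t in tokens
                          if t["head"] == tok["id"] and t["deprel"] == "fixed"]
        if not fixed_children:
            continue
        key = tuple(t["form"].lower() for t in [tok] + fixed_children)
        if key in lexicon:
            # The value comes from the lexicon, not from the incoming deprel.
            tok["feats"]["ExtPos"] = lexicon[key]
        else:
            unlisted.append(key)
    return unlisted
```

The returned list corresponds to the expressions one "may discover" and decide to reannotate; checking whether the new ExtPos is compatible with the incoming deprel is left to a separate validation pass.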

@Stormur
Contributor

Stormur commented Jun 5, 2024

I may discover expressions that are currently fixed but I do not want them on the list, so I will change their annotation. And after I apply the script, I may ask the validator whether some of them occurred in a context that is not compatible with its new ExtPos, and fix the annotation if it does.

I understand, but this is independent from ExtPos and based just on a query for fixed...

@dan-zeman
Member

I may discover expressions that are currently fixed but I do not want them on the list, so I will change their annotation. And after I apply the script, I may ask the validator whether some of them occurred in a context that is not compatible with its new ExtPos, and fix the annotation if it does.

I understand, but this is independent from ExtPos and based just on a query for fixed...

Yes, there is definitely extra work required. But if the validator is modified to take ExtPos into account, some of its current tests can be applied. The current state is that if the validator sees a fixed child, it will turn off many of its UPOS-DEPREL compatibility tests.

@Stormur
Contributor

Stormur commented Jun 6, 2024

I may discover expressions that are currently fixed but I do not want them on the list, so I will change their annotation. And after I apply the script, I may ask the validator whether some of them occurred in a context that is not compatible with its new ExtPos, and fix the annotation if it does.

I understand, but this is independent from ExtPos and based just on a query for fixed...

Yes, there is definitely extra work required. But if the validator is modified to take ExtPos into account, some of its current tests can be applied. The current state is that if the validator sees a fixed child, it will turn off many of its UPOS-DEPREL compatibility tests.

But this would be an extra test created from nothing, from the addition of this feature which itself can only be added on contextual grounds as by definition it cannot depend on the characteristics of the single components. Because if it would, then why fixed? And so it all boils down again to just checking all fixed combinations, whatever their dependency relations.

There is circularity here. I also fear that making ExtPos de facto mandatory would lead to an increase of fixed expressions in new annotation endeavours, as in a sense this would justify the use of fixed more than it is warranted (while we actually need the opposite, I think).

Now I will sit silent because I think I have already insisted too much on these points (sorry) and I am becoming grumpy and repetitive. But do not get me wrong, I can understand the implementation of tests like the ones you describe. However, all in all, I believe that these possible benefits are extremely marginal at best and that drawbacks on the contrary are too many. I would like to see a different "angle of attack" to the issues that we are confronting here.

@nschneid
Contributor Author

nschneid commented Jul 2, 2024

Today the Core Group discussed FEATS vs. MISC and voted that FEATS would be a better home for ExtPos. Most MISC attributes are optional and unregulated at the universal level; putting ExtPos in FEATS gives it greater visibility and is in keeping with existing practice by the SUD group. Another practical advantage is a clear home in the docs for universal + language-specific pages (e.g. https://universaldependencies.org/en/feat/ExtPos.html). The encouragement to document the different values of ExtPos with examples in each language may have the effect of promoting discussion of the appropriate scope of fixed.
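
For treebanks that currently store ExtPos in MISC, the move to FEATS could be sketched as below. The function name is hypothetical, and the sketch assumes the usual CoNLL-U conventions: `_` for an empty column, `|` as separator, and FEATS sorted alphabetically by feature name, ignoring case.

```python
def move_extpos_to_feats(feats, misc):
    """Move an ExtPos=... item from the MISC column to the FEATS column.

    Both arguments are raw CoNLL-U column strings; returns (feats, misc).
    If FEATS already has an ExtPos, the MISC value replaces it.
    """
    misc_items = [] if misc == "_" else misc.split("|")
    extpos = [item for item in misc_items if item.startswith("ExtPos=")]
    if not extpos:
        return feats, misc  # nothing to move
    remaining = [item for item in misc_items if not item.startswith("ExtPos=")]
    feat_items = [] if feats == "_" else feats.split("|")
    feat_items = [f for f in feat_items if not f.startswith("ExtPos=")]
    feat_items.append(extpos[0])
    # Keep FEATS sorted by feature name, case-insensitively.
    feat_items.sort(key=lambda f: f.split("=", 1)[0].lower())
    return "|".join(feat_items), "|".join(remaining) or "_"
```

Other MISC attributes, such as SpaceAfter, are left where they are.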

@nschneid
Contributor Author

nschneid commented Jul 2, 2024

@dan-zeman has drafted a universal guidelines page: https://universaldependencies.org/u/feat/ExtPos.html

A couple of questions about French examples:

[two images, not preserved]

@dan-zeman
Member

I took the French examples from the French documentation but I did not verify them in the French treebanks.

@nschneid
Contributor Author

nschneid commented Jul 2, 2024

I switched the "plutôt que" example to a "bien que" example from one of the treebanks.

@sylvainkahane or @bguil, maybe you could confirm the "de la" example of ExtPos=DET? Why would that not just be an ordinary ADP + DET combination?

@sylvainkahane
Contributor

here are all the values of ExtPos in the French GSD treebank: https://universal.grew.fr/?custom=66841dc7423dd.
If you look at the DET value you find "de la" (and its variant "de l'"). Note that "de la" is not always an indefinite determiner, it can also be the combination of ADP "de" and the definite determiner "la".

@dan-zeman
Member

The de la example occurs 9 times in Sequoia.

@nschneid
Contributor Author

nschneid commented Jul 2, 2024

Ah I was querying for "la" as the lemma when it should be "le". OK I guess this is the partitive article construction. (Curious: Can "de la" ever be used on a subject? I mainly see it following a verb or preposition, where historically "de" might have acted as a preposition.)

@sylvainkahane
Contributor

Two answers to @nschneid.

  1. Yes "de la" is the partitive article. I don't like this notion; in fact it is just the indefinite article for mass nouns. Note that the plural indefinite article "des" is also a portmanteau "de+les".

  2. Using an indefinite article in the subject position is not very felicitous in French: https://universal.grew.fr/?custom=668424e695391. When the subject is indefinite, we have a special construction. Rather than saying S V, we prefer "il y a S qui V" 'there is S that V', especially in spoken French: https://universal.grew.fr/?custom=668429b9e5348.

nschneid added a commit to UniversalDependencies/UD_English-EWT that referenced this issue Jul 5, 2024
nschneid added a commit to UniversalDependencies/UD_English-EWT that referenced this issue Jul 5, 2024
@AngledLuffa

In terms of implementing this in English treebanks such as PUD, are we at the point of labeling sort of etc, or not there yet?

@amir-zeldes
Contributor

FEATS would be a better home for ExtPos

Sounds good, will implement for GUM as well

@nschneid
Contributor Author

nschneid commented Jul 9, 2024

In terms of implementing this in English treebanks such as PUD, are we at the point of labeling sort of etc, or not there yet?

This ExtPos policy applies to all fixed expressions, if that's what you're asking. If there are questions about what counts as fixed that should go in other issues.

@AngledLuffa

Actually I just mean - are we now ready to label fixed expressions in PUD, or is there a reason to wait for the standard to be finalized and/or the validator to be updated?

@nschneid
Contributor Author

nschneid commented Jul 9, 2024

We're ready to implement! The validator is not updated yet (once it is there will be an official announcement of the new policy), but I've already implemented in EWT.

@amir-zeldes
Contributor

GUM is implemented too, just moved it to FEATS, should update the next push

@AngledLuffa

Found some cases of up to which may need a fixed relation in EWT

Train section:

bundling together cheques of up to $1,000 from friends and family

but not up to the standards that I was told I should expect

the food was not up to par with the price tag

Test:

# text = I'll pay up to 200-250 for it if I have to.

@AngledLuffa

Where is the line to draw for as X as expressions? There are some marked in EWT, such as

**as well as** the fun filled social dance evening held every Saturday evening
I will often have **as many as** one per kitten

but then many others are not marked, such as

We should know **as much as** we can

@AngledLuffa

There are several fixed expressions marked in PUD which are not marked in EWT. Here are a few:

not marked in EWT:

after all
After all, the internet is not a luxury

as if
photographs that looked **as if** they were from the 1970s

at best
**At best** it is naive and at worst it would yet again...

close to  ... similar to "approximately"
Cairo had a population of **close to** half a million

in addition   ... "furthermore"
**In addition**, statute determines the election of assembly of regions

Marked in PUD but not existing in EWT:

more or less:
The working time undertaken in this first hour is more or less equal to 45 minutes.

@AngledLuffa

What about down to in a phrase such as

# text = The horse I had posted about a couple weeks ago with the atrophied cheek muscles is down to his last resort for life.

Incidentally, I am happy that, as humans, we have surgical options other than "shotgun" for dealing with atrophied cheek muscles.

@AngledLuffa

AngledLuffa commented Jul 11, 2024

next to in EWT which possibly matches other next to ExtPos

If sites next to you don't have what you want
the sea next to you
I throw a treat across the floor or even right next to her paw
place it next to the couch
the fish look better next to them
right next to the ice machine

I don't see much difference with those and the following:

First room had used tissues next to the bed
It is next to Gare du Nord

although certainly there might be some subtle differences

dev, not marked:

# text = We are staying next to the airport which is located next to BARTrail.

test, not marked:

# text = Place is next to carval and walmart.

@nschneid
Contributor Author

@AngledLuffa these are great questions/observations about fixed consistency. Could you please move them to separate issues as I'm sure some will require discussion?

@AngledLuffa

Done

@dan-zeman
Member

ExtPos should be now also observed by the validator, so I think this issue can be closed.

@martinpopel
Member

The first post says

A description of how ExtPos should be used (at least for fixed, though treebanks may opt to use it for art titles, SYM, and so on). Where will this live? The MISC attributes page is a bit cluttered with treebank-specific/experimental attributes. Cf. https://universaldependencies.org/en/feat/ExtPos.html

I don't see ExtPos at https://universaldependencies.org/misc (or anywhere else), so I suggest keeping this issue open until ExtPos is properly documented.
Interestingly, MWEPOS is documented there and it is the only place where ExtPos is mentioned, but it says "Ideally, these two attribute names should be merged into one!"

@dan-zeman
Member

I don't see ExtPos at https://universaldependencies.org/misc (or anywhere else), so I suggest keeping this issue open until ExtPos is properly documented.
Interestingly, MWEPOS is documented there and it is the only place where ExtPos is mentioned, but it says "Ideally, these two attribute names should be merged into one!"

Good point about MWEPOS in MISC. In fact, ExtPos has been documented in the meantime, as the core group decided it will be in FEATS.

@sylvainkahane
Contributor

In the French treebanks, ExtPos has been used for SYMs used as NOUNs (%, €, etc.), CCONJs (&), etc.; for foreign words or combinations of letters and numbers (upos=X) used as PROPNs; for ADVs used as PRONs (a special construction of French, ADV de NOUN, where the ADV is the syntactic head and can also be used alone as a pronoun); and some other rarer examples.

7 participants