This guideline outlines the tagsets and the annotation scheme that are used to
manually annotate the lexicon. All lexicon annotations that are submitted to
`//src/analyzer/lexicon` should follow this guideline.
Lexicon files are grouped into two sets:
- Base lexicon files, which are provided by the Project Steward(s) in the base lexicon directory (`//src/analyzer/lexicon/base`).
- Contrib lexicon files, which are submitted by the contributors to this library and reviewed by the Project Steward(s) upon submission in the contrib lexicon directory (`//src/analyzer/lexicon/contrib`).
The base lexicon contains 66 files with annotations on 47,202 words. Lexicon entries
are grouped into different files according to their annotated coarse and fine
part-of-speech tag (e.g. `adjective_jj.tsv` contains the annotations for
base adjectival root forms; see Annotating part-of-speech for the
definition of coarse and fine part-of-speech).
Lexicon files are TSVs (tab separated values). They should be structured according to the following conventions:
- Lexicon files should use the `.tsv` file extension.
- Lexicon files should be named as `[coarse_pos_tag]_[fine_pos_tag].tsv`, where `[coarse_pos_tag]` and `[fine_pos_tag]` stand for the annotated part-of-speech of the lexicon entries that can be found in that file.
- A lexicon file that is named as `[coarse_pos_tag]_[fine_pos_tag].tsv` should only contain annotations for lexicon entries that are annotated for `[fine_pos_tag]`.
- Submitted lexicon files should end with an empty line (`\n`).
- Each lexicon file should contain a header in its first row.
- Headers should have 5 columns (`tag`, `root`, `morphophonemics`, `features` and `is_compound`, exactly in this order).
- Rows beneath the header should include annotated lexicon entries.
- Every lexicon entry annotation should have 5 columns (each of which should contain the annotated value for the respective field in the header).
Definitions of the annotated fields are as follows:
- `tag` (string): part-of-speech tag of the root form (see Annotating part-of-speech).
- `root` (string): stem of the word (see Annotating root form).
- `morphophonemics` (string): morphophonemic irregularities of the root form, which determine the surface form of the root after affixation of suffix morphemes (see Annotating morphophonemic irregularities).
- `features` (string): morphological feature tags, which are used for marking optional morphosemantic or morphosyntactic tags to be used in building a morphological analyzer for Turkish (see Annotating features).
- `is_compound` (bool): marks nominal root forms that end with a Turkish compounding marker `+SH` (see Annotating compound roots).
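The file conventions above can be checked mechanically. The following is a minimal sketch, not the project's own validator: the function name `validate_lexicon` and the error messages are hypothetical, and only the header, the column count, the trailing newline and the `TRUE`/`FALSE` values of `is_compound` are checked. Quoting is disabled because `"` is itself an annotation markup in the `morphophonemics` field.

```python
import csv
import io

# Column names and their order, as listed in the conventions above.
EXPECTED_HEADER = ["tag", "root", "morphophonemics", "features", "is_compound"]


def validate_lexicon(tsv_text):
    """Returns a list of format errors found in tsv_text (empty if none)."""
    errors = []
    if not tsv_text.endswith("\n"):
        errors.append("file does not end with a newline")
    # QUOTE_NONE keeps '"' as an ordinary character instead of a quote.
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    if not rows or rows[0] != EXPECTED_HEADER:
        errors.append("first row is not the expected header")
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            errors.append(
                f"row {number}: expected 5 columns, found {len(row)}")
        elif row[4] not in ("TRUE", "FALSE"):
            errors.append(f"row {number}: is_compound must be TRUE or FALSE")
    return errors
```

For example, a file whose only entry is missing the `is_compound` column and the final newline yields two errors, while a well-formed file yields an empty list.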
Every lexicon entry should have the part-of-speech annotated in the `tag` field.
To annotate part-of-speech;
- decide on the root form by referring to Annotating root form.
- refer to the table of annotation tags and their descriptions and use the annotation tag that applies to the root form as its part-of-speech annotation.
Coarse part-of-speech tagset is a broad classification of the lexical categories. It is not used in lexicon annotations but provided here as a reference syntactic categorization to help develop a Turkish morphotactic model (since agglutination patterns for each coarse part-of-speech vary drastically).
The 15 tags listed below compose our coarse part-of-speech tagset.
Coarse Tag | Description |
---|---|
ADJ | Adjective |
ADP | Adposition |
ADV | Adverb |
AFFIX | Affix |
CONJ | Conjunct |
DET | Determiner |
EXS | Existential |
NOUN | Noun |
NUM | Number |
ONOM | Onomatopoeia |
PRON | Pronoun |
PRT | Particle |
PUNCT | Punctuation |
VERB | Verb |
X | Other |
Fine part-of-speech tagset defines the lexical categories that can occur in morphological analyses that are output by a morphological analyzer implementation for Turkish. It is not used in the lexicon annotations.
This tagset provides a finer distinction of syntactic subcategorization in comparison to the coarse part-of-speech tagset. Each fine part-of-speech category can be mapped to one of the coarse part-of-speech categories. Therefore, there is a one-to-many mapping from coarse part-of-speech tags to fine part-of-speech tags.
The 47 tags listed below compose our fine part-of-speech tagset.
Note that some tags are specified as derived categories (marked as Yes under Is Derived Category? column). Morphologically derived categories can only occur in a morphological analysis as a result of affixation of a derivational morpheme, therefore no cross classified lexicon will contain entries with those fine part-of-speech tags. Such morphological derivations and inventory of derivational morphemes are defined as part of the morphotactic model.
Coarse Tag | Fine Tag | Is Derived Category? | Description |
---|---|---|---|
ADJ | JJ | No | Adjective |
ADJ | VJ | Yes | Verb in participle form |
ADP | IN | No | Postposition |
ADV | CRB | Yes | Converb |
ADV | RB | No | Adverb |
ADV | WRB | No | Interrogative adverb |
AFFIX | PFX | No | Prefix |
CONJ | CC | No | Coordinating conjunction |
DET | DT | No | Determiner |
DET | PDT | No | Predeterminer |
DET | WDT | No | Wh-determiner |
EXS | EX | No | Existential |
NOUN | ADD | No | Electronic address (e-mail or URL) |
NOUN | NN | No | Common noun |
NOUN | NNP | No | Proper noun |
NOUN | VN | Yes | Verbal noun (head of a noun clause) |
NUM | CD | No | Cardinal number |
ONOM | DUP | No | Onomatopoeic |
PRON | PRD | No | Demonstrative pronoun |
PRON | PRF | Yes | Morphologically derived pronoun |
PRON | PRI | No | Indefinite pronoun |
PRON | PRP | No | Personal pronoun |
PRON | PRP$ | No | Possessive pronoun |
PRON | PRR | No | Reflexive pronoun |
PRON | WP | No | Wh-pronoun |
PRT | EP | No | Final particle |
PRT | OP | No | Coordinative article |
PRT | RPC | No | Clitic particle |
PRT | RPNEG | No | Negation particle |
PRT | RPQ | No | Question particle |
PUNCT | . | No | Terminal punctuation such as .!? |
PUNCT | , | No | Comma and comma-like punctuation |
PUNCT | : | No | Colon and semi-colon |
PUNCT | ( | No | Left bracket punctuation |
PUNCT | ) | No | Right bracket punctuation |
PUNCT | `` | No | Open quotation mark and similar punctuation |
PUNCT | ' | No | Close quotation mark and other similar punctuation |
PUNCT | - | No | Hyphen, dashes, and similar punctuation |
VERB | NOMP | No | Nominal predicate |
VERB | VB | No | Verb |
X | FW | No | Foreign word whose meaning is not known and cannot be inferred |
X | GW | No | Word parts separated due to bad tokenization |
X | LS | No | List symbols |
X | NFP | No | Non-final punctuation, including emoticons and multi-symbol tokens |
X | SYM | No | Symbol |
X | UH | No | Interjection or hesitation |
X | XX | No | Total garbage |
The 75 tags listed below are used in lexicon annotations. Every lexicon entry should
have one of the part-of-speech tags listed under the Annotation Tag column in
its `tag` field.
In order to keep the lexicon size minimal we introduce a concept called cross classification of lexicon entries.
It is assumed that for every lexicon entry that is annotated with certain
annotation tags, the final lexicon should contain entries that are identical to
that annotated entry and differ only in part-of-speech (the table below presents
the parts-of-speech that each annotation tag is cross classified as under the
Cross Classifies As column). Therefore, our lexicon is compressed. To obtain
the fully expanded lexicon, a pre-processing stage is required in which new
lexicon entries are added for the cross classification parts-of-speech (e.g. a
lexicon entry annotation which has the annotation tag `PDT` triggers creation of
two lexicon entries where one has the tag `NOMP` and the other has the tag
`PDT`; see `//src/analyzer/lexicon/parser.py`).
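The expansion step can be pictured with a short sketch. It is an illustration only and does not reproduce `parser.py`: the names `CROSS_CLASSIFICATION` and `expand` are hypothetical, the mapping copies just four rows of the Cross Classifies As column, the placeholder entry values are made up, and falling back to the entry's own tag for unlisted tags is an assumption.

```python
# A small subset of the "Cross Classifies As" column of the annotation
# tagset table; a full implementation would list all annotation tags.
CROSS_CLASSIFICATION = {
    "JJN": ["JJ", "NN", "NOMP"],
    "NN": ["NN", "NOMP"],
    "PDT": ["NOMP", "PDT"],
    "RB": ["RB"],
}


def expand(entry):
    """Returns one copy of entry per cross classified part-of-speech tag."""
    # Assumption: a tag missing from the mapping expands only to itself.
    tags = CROSS_CLASSIFICATION.get(entry["tag"], [entry["tag"]])
    return [{**entry, "tag": tag} for tag in tags]


# Placeholder entry: only the tag matters for the expansion.
entry = {"tag": "PDT", "root": "ROOT", "morphophonemics": "~",
         "features": "~", "is_compound": "FALSE"}
expanded = expand(entry)  # two entries, tagged NOMP and PDT
```

Every field other than `tag` is copied unchanged, which matches the description of the expanded entries as identical except for part-of-speech.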
Coarse Tag | Annotation Tag | Cross Classifies As | Description |
---|---|---|---|
ADJ | JJ | JJ, NN, NOMP, PRI, RB | Adjective |
ADJ | JJN | JJ, NN, NOMP | Adjective which can also be used as noun and nominal predicate |
ADP | IN | IN, NN, NOMP | Postposition |
ADV | RB | RB | Adverb |
ADV | RB-TEMP | NN-TEMP, NOMP, RB | Temporal adverb |
ADV | WRB | NOMP, WRB | Interrogative adverb |
AFFIX | PFX | PFX | Prefixes |
CONJ | CC | CC | Coordinating conjunction |
DET | DT | DT, NOMP, PRI | Determiner |
DET | PDT | NOMP, PDT | Predeterminer |
DET | WDT | NOMP, PRI, WDT | Wh-determiner |
EXS | EX | EX, NOMP-CASE-BARE | Existential |
NOUN | ADD | ADD, NOMP-WITH-APOS | Electronic address (e-mail or URL) |
NOUN | NN | NN, NOMP | Common noun |
NOUN | NN-ABBR | NN, NOMP-WITH-APOS | Common noun abbreviation whose root might be separated from suffixes with an apostrophe |
NOUN | NN-ABBR-APOS | NN, NOMP-APOS | Common noun abbreviation whose root might be separated from suffixes with an apostrophe only when used as a nominal predicate |
NOUN | NN-TEMP | NN | Common noun that denotes temporality |
NOUN | NNP | NNP, NOMP-WITH-APOS | Proper noun |
NOUN | NNP-ABBR | NNP, NOMP-WITH-APOS | Proper noun abbreviation |
NUM | CD | CD, NN, NOMP-WITH-APOS | Cardinal number |
NUM | CD-DIST | NN, NOMP-WITH-APOS | Distributive number |
NUM | CD-ORD | NN, NOMP-WITH-APOS | Ordinal number |
ONOM | DUP | DUP | Onomatopoeic |
PRON | PRD | NOMP, PRD | Demonstrative pronoun |
PRON | PRD-PNON | NOMP-PNON, PRD | Demonstrative pronoun whose root is marked for person-number and none possessiveness |
PRON | PRD-PNPOSS | NOMP-PNPOSS, PRD | Demonstrative pronoun whose root is marked for person-number and whose case markers are always realized as if it is marked for possessiveness |
PRON | PRI | NOMP, PRI | Indefinite pronoun |
PRON | PRP | NOMP-PN, PRP | Personal pronoun |
PRON | PRP-CASE | NOMP-CASE-MARKED, PRP | Personal pronoun whose root is marked for person-number, none possessiveness and case |
PRON | PRP-IRR | NOMP-PNON, PRP | Personal pronoun whose root is marked for person-number and none possessiveness, which cannot be inflected for dative case |
PRON | PRP$ | NOMP-PNON, PRP$ | Possessive pronoun |
PRON | PRR | NOMP, PRR | Reflexive pronoun |
PRON | WP | NOMP, WP | Wh-pronoun |
PRT | EP | EP | Final particle |
PRT | OP | OP | Coordinative article |
PRT | RPC | RPC | Clitic particle |
PRT | RPNEG | NOMP-CASE-BARE, RPNEG | Negation particle |
PRT | RPQ | NOMP-CASE-BARE, RPQ | Question particle |
PUNCT | PUNCT-6 | . | Terminal punctuation such as .!? |
PUNCT | PUNCT-4 | , | Comma and comma-like punctuation |
PUNCT | PUNCT-7 | : | Colon and semi-colon |
PUNCT | PUNCT-2 | ( | Left bracket punctuation |
PUNCT | PUNCT-3 | ) | Right bracket punctuation |
PUNCT | PUNCT-8 | `` | Open quotation mark and similar punctuation |
PUNCT | PUNCT-1 | ' | Close quotation mark and other similar punctuation |
PUNCT | PUNCT-5 | - | Hyphen, dashes, and similar punctuation |
VERB | NOMP | NOMP | Nominal predicate |
VERB | NOMP-APOS | NOMP | Nominal predicate whose root is always separated from suffixes with an apostrophe |
VERB | NOMP-CASE-BARE | NOMP | Nominal predicate whose root is caseless but marked for person-number and none possessiveness |
VERB | NOMP-CASE-MARKED | NOMP | Nominal predicate whose root is marked for person-number, none possessiveness and case |
VERB | NOMP-PN | NOMP | Nominal predicate whose root is marked for person-number |
VERB | NOMP-PNON | NOMP | Nominal predicate whose root is marked for person-number and none possessiveness |
VERB | NOMP-PNPOSS | NOMP | Nominal predicate whose root is marked for person-number and whose case markers are always realized as if it is marked for possessiveness |
VERB | NOMP-WITH-APOS | NOMP | Nominal predicate whose root might be separated from suffixes with an apostrophe |
VERB | VB-HL-AR-DHR | VB | Verb that takes +Hl as passive, +Ar as aorist tense and +DHr as causative marker |
VERB | VB-HL-AR-HR | VB | Verb that takes +Hl as passive, +Ar as aorist tense and +Hr as causative marker |
VERB | VB-HL-AR-HT | VB | Verb that takes +Hl as passive, +Ar as aorist tense and +Ht as causative marker |
VERB | VB-HL-AR-NO | VB | Verb that takes +Hl as passive and +Ar as aorist tense marker, but does not take a causative marker |
VERB | VB-HL-AR-T | VB | Verb that takes +Hl as passive, +Ar as aorist tense and +t as causative marker |
VERB | VB-HL-HR-DHR | VB | Verb that takes +Hl as passive, +Hr as aorist tense and +DHr as causative marker |
VERB | VB-HL-HR-NO | VB | Verb that takes +Hl as passive and +Hr as aorist tense marker, but does not take a causative marker |
VERB | VB-HL-HR-T | VB | Verb that takes +Hl as passive, +Hr as aorist tense and +t as causative marker |
VERB | VB-HN-AR-DHR | VB | Verb that takes +Hn as passive, +Ar as aorist tense and +DHr as causative marker |
VERB | VB-HN-HR-DHR | VB | Verb that takes +Hn as passive, +Hr as aorist tense and +DHr as causative marker |
VERB | VB-HN-HR-NO | VB | Verb that takes +Hn as passive and +Hr as aorist tense marker, but does not take a causative marker |
VERB | VB-HN-HR-T | VB | Verb that takes +Hn as passive, +Hr as aorist tense and +t as causative marker |
VERB | VB-ON-OR-DHR | VB | Verb that takes +Hn as passive, +r as aorist tense and +DHr as causative marker |
VERB | VB-ON-OR-T | VB | Verb that takes +Hn as passive, +r as aorist tense and +t as causative marker |
X | FW | FW | Foreign word whose meaning is not known and cannot be inferred |
X | GW | GW | Word parts separated due to bad tokenization |
X | LS | LS | List symbols |
X | NFP | NFP | Non-final punctuation, including emoticons and multi-symbol tokens |
X | SYM | SYM | Symbol |
X | UH | UH | Interjection or hesitation |
X | XX | XX | Total garbage |
Every lexicon entry should have the root form annotated in the `root` field.
To annotate root forms;
- decide on the part-of-speech of the root by referring to Annotating part-of-speech.
- strip off all the suffixes (inflectional and derivational morphemes) from the word form, given a morphotactic model that defines the inflections and derivations for part-of-speech and the corresponding morpheme inventory.
- the remaining string is the root form.
- if the root form ends with a compounding marker `+SH`, leave it as a part of the root form annotation (e.g. the word ahçıbaşınınki should have ahçıbaşı as its root form annotation, not ahçıbaş).
Before you read this section it is strongly advised to refer to Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and linguistic computing, 9(2), 137-148 for an outline of the Turkish morphophonemic processes.
Every lexicon entry should have the morphophonemic irregularities annotated in
the `morphophonemics` field.
To annotate morphophonemic irregularities;
- go through the list of rules presented in the sections below and decide on the morphophonemic irregularity annotation string by referring to those that apply to the root form.
- if more than one of these rules applies to the root form, the final morphophonemic annotation string should incorporate the markup dictated by every applicable rule.
- if none of these rules apply to the root form, use `~` as the morphophonemic irregularity annotation string.
The voicing exception applies to roots whose final voiceless consonant fails to
get voiced in spite of the affixation of a suffix that starts with a vowel. This
exception applies only to sounds that are `[-voiced][+plosive]`
(e.g. { p, t, k, t͡ʃ }). Voiceless continuants
{ f, s, ʃ } never undergo voicing anyway.
Root final voiceless plosive k is always assumed to be realized as ğ when a suffix starting with a vowel is affixed. In some roots, root final k is realized as g instead of ğ; this happens only when it is preceded by n.
If the root ends with voiceless plosive k which is left unchanged (and not
realized as ğ or g) after a suffix starting with a vowel is affixed,
then the root final velar stop k is annotated with the markup `K`.
For the sake of consistency, this rule still applies even if the affix inventory of the morphotactic model contains no vowel-initial suffix that can be affixed to the root for its annotated part-of-speech (e.g. as seen in the adjectival examples below).
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
ADJ | ak | aK | ak |
ADJ | tok | toK | tok |
NOUN | mercanköşk | mercanköşK | mercanköşk-e |
NOUN | meşk | meşK | meşk-i |
NOUN | şark | şarK | şark-ı |
NOUN | türk | türK | türk-e |
VERB | bük | büK | bük-er |
VERB | kalk | kalK | kalk-ar |
VERB | sok | soK | sok-ar |
Root final voiceless plosive p (or t) is always assumed to be left unchanged after a suffix starting with a vowel is affixed.
If the root ends with the voiceless plosive p (or t) which is
realized as voiced b (or resp. d) when a suffix starting with a
vowel is affixed, then the root final voiceless plosive p (or t)
should be annotated with a succeeding `~` annotation markup.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
NOUN | dolap | dolap~ | dolab-ın |
NOUN | mikrop | mikrop~ | mikrob-u |
NOUN | kanat | kanat~ | kanad-ının |
NOUN | tehdit | tehdit~ | tehdid-i |
VERB | bahset | bahset~ | bahsed-iyor |
VERB | git | git~ | gid-ecek |
Root final voiceless plosive ç is always assumed to realize as voiced c when a suffix starting with a vowel is affixed.
If the root ends with the voiceless plosive ç which is left unchanged (and
not realized as c) after a suffix starting with a vowel is affixed, then
the root final voiceless plosive ç is annotated with the markup `Ç`.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
NOUN | göç | göÇ | göç-e |
NOUN | sandviç | sandviÇ | sandviç-in |
NOUN | suç | suÇ | suç-u |
NUM | üç | üÇ | üç-er |
VERB | aç | aÇ | aç-ıl |
VERB | seç | seÇ | seç-er |
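The three voicing rules above share one pattern: each root final plosive has a default behaviour, and the markup flips it. The sketch below makes that explicit; the function name `stem_before_vowel` is hypothetical, only the voicing markups `K`, `Ç` and `~` are handled, and the case where k is realized as g after n is deliberately left out because the text does not say how it is marked.

```python
def stem_before_vowel(root, morphophonemics):
    """Surface form of a root before a vowel-initial suffix (voicing only)."""
    # "~" alone means the root has no annotated irregularity.
    form = root if morphophonemics == "~" else morphophonemics
    if form.endswith("K"):   # marked k: left unchanged
        return form[:-1] + "k"
    if form.endswith("Ç"):   # marked ç: left unchanged
        return form[:-1] + "ç"
    if form.endswith("p~"):  # marked p: voiced to b
        return form[:-2] + "b"
    if form.endswith("t~"):  # marked t: voiced to d
        return form[:-2] + "d"
    if form.endswith("k"):   # unmarked k: assumed to become ğ
        return form[:-1] + "ğ"
    if form.endswith("ç"):   # unmarked ç: assumed to become c
        return form[:-1] + "c"
    return form              # unmarked p, t and all other endings


stem_before_vowel("dolap", "dolap~")  # "dolab", as in dolab-ın
stem_before_vowel("göç", "göÇ")       # "göç", as in göç-e
```

Other markups described in the following sections (for example `?` or `"`) are not interpreted here and would need their own handling.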
The lateral l has allophones when it occurs in the root final position after back vowels.
Velar l (back) | Palatal l (front) |
---|---|
sol | sol (anahtarı) |
okul | usul |
araba | kalp |
bor | golf |
When roots that have a palatalized root final l are followed by a suffix that starts with a vowel, the surface forms of the suffixes are resolved as if the last syllable of the root has a front vowel.
Certain roots that have the back rounded vowel o as their last vowel followed by an l cause the first back vowel of the affixed suffix to be realized as a front vowel.
The last vowel of those roots should be annotated with the markup `%`.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
NOUN | mentol | ment%l | mentol-lü |
NOUN | metropol | metrop%l | metropol-de |
NOUN | monokl | monok%l | monokol-e |
Certain roots that have the back unrounded vowel a as their last vowel followed by an l cause the first back vowel of the affixed suffix to be realized as a front vowel.
The last vowel of those roots should be annotated with the markup `{`.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
ADJ | biyomedikal | biyomedik{l | biyomedikal-de |
NOUN | ideal | ide{l | ideal-i |
NOUN | ihtimal | ihtim{l | ihtimal-iyle |
In addition, if the root ends with a cluster that contains a back vowel and
one of { r, t, b, d }, the same phenomenon can be observed. The
last vowel of such roots should also be annotated with the markup `{`.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
NOUN | dikkat | dikk{t | dikkat-i |
NOUN | harf | h{rf | harf-e |
NOUN | hat | h{t~" | had-dinin |
NOUN | saat | sa{t | saat-e |
Certain roots that have the back rounded vowel u as their last vowel followed by an l cause the first back vowel of the affixed suffix to be realized as a front vowel.
The last vowel of those roots should be annotated with the markup `}`.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
ADJ | mesul | mes}l | mesul-ü |
NOUN | ampul | amp}l | ampul-e |
NOUN | duhul | duh}l | duhul-de |
Certain roots that have the long â as their last vowel followed by an l cause the first back vowel of the affixed suffix to be realized as a front vowel.
The last vowel of those roots should be annotated with the markup `[`.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
NOUN | eşkâl | eşk[l | eşkâl-i |
NOUN | hâl | h[l | hâl-den |
NOUN | işkâl | işk[l | işkâl-inin |
For certain verbal roots, the root final front unrounded open vowel e is realized as close i when a suffix starting with a vowel is affixed.
The final front unrounded vowel e of those roots should be annotated with
the `E` annotation markup.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
VERB | de | dE | d-i-yecek |
VERB | ye | yE | y-i-yor |
The last vowel of certain root forms is dropped when a suffix that starts with a vowel, or with a consonant that drops, gets affixed.
The final vowel of those roots should be annotated with a succeeding `?`
annotation markup.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
NOUN | böğür | böğü?r | böğr-üme |
NOUN | burun | buru?n | burn-um |
NOUN | isim | isi?m | ism-e |
VERB | çağır | çağı?r | çağr-ıldı |
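As a rough illustration of how the `?` markup can be interpreted, the sketch below resolves an annotation string to a surface stem. The function name `resolve_vowel_drop` and its `drop` flag are hypothetical; deciding when the vowel actually drops (which depends on the affixed suffix) is left to the caller, and no other markup is interpreted.

```python
def resolve_vowel_drop(annotation, drop):
    """Resolves the '?' markup: drops the vowel preceding it if drop is True."""
    marker = annotation.find("?")
    if marker < 1:
        # No markup, or nothing precedes it: just strip any marker.
        return annotation.replace("?", "")
    # Keep or remove the single character right before the marker.
    head = annotation[:marker - 1] if drop else annotation[:marker]
    return head + annotation[marker + 1:]


resolve_vowel_drop("buru?n", drop=True)   # "burn", as in burn-um
resolve_vowel_drop("buru?n", drop=False)  # "burun"
```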
For some words, the root final consonant { b, c, d, f, k, l, m, n, r, t, z } is duplicated when a suffix starting with a vowel is affixed.
The final consonant of those roots should be annotated with a succeeding `"`
annotation markup.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
ADJ | muhil | muhil" | muhil-l-i |
ADJ | muhip | muhip~" | muhib-b-in |
NOUN | af | af" | af-f-ın |
NOUN | hak | hak" | hak-k-a |
Certain nominal root forms that end with `su` alter the initial consonant of
some suffixes (e.g. 3rd person possessive and genitive case inflection
morphemes) when they are affixed.
The root final `su` of those root forms should be annotated with a succeeding `^`
annotation markup.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
NOUN | akarsu | akarsu^ | akarsu-yun |
NOUN | yuzsu | yuzsu^ | yuzsu-yu |
NOUN | su | su^ | su-yu |
In case of code switching, foreign words can be used in Turkish sentences, preserving their surface form while being inflected according to the part-of-speech that they hold within the context.
Foreign word stems should be annotated by appending the last syllable of their
Turkish pronunciation, enclosed in `*` annotation markup.
Coarse Tag | Root | Morphophonemic Annotation | Example |
---|---|---|---|
NOUN | ABD | ABD*e* | ABD-de |
NOUN | single | single*ıl* | single-ının |
Certain lexicon entry annotations must be annotated with morphosyntactic and morphosemantic feature tags to represent the features that are intrinsically carried by the root itself without affixation of any inflectional or derivational morpheme.
Every lexicon entry should have features annotated in the `features` field.
Every feature should be annotated in the form of `+[feature_category=feature_tag]`.
To annotate features;
- check whether the annotated part-of-speech of the lexicon entry matches one of the parts-of-speech listed in the tables for required and optional features (note that the sets of part-of-speech tags in the two tables are disjoint).
- if so, annotate the lexicon entry with corresponding feature tags that apply for the root.
- if the annotated part-of-speech of the lexicon entry does not match any of the parts-of-speech listed in the tables for required and optional features, use `~` as the features annotation string.
Note that if a lexicon entry is required to be annotated with more than one
feature, all applicable features should be annotated in sequence, in the form of
`+[feature_category_1=feature_tag_1]+[feature_category_2=feature_tag_2]` (e.g.
lexicon entries that are annotated with the `PRP-CASE` part-of-speech tag should
take three feature tags in sequence, where the first feature is chosen from the
set of feature tags { `A1sg`, `A2sg`, `A3sg`, `A1pl`, `A2pl`, `A3pl` }, the
second is chosen from { `Pnon` }, and so on).
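The feature annotation string is regular enough to build and parse with a few lines of code. The sketch below is illustrative only: `format_features` and `parse_features` are hypothetical helper names, and only the `+[feature_category=feature_tag]` form and the `~` placeholder are handled, with no check against the features tagset.

```python
import re

# One feature: "+[" category "=" tag "]".
_FEATURE = re.compile(r"\+\[([^=\]]+)=([^\]]+)\]")


def format_features(pairs):
    """Builds a features string from (category, tag) pairs; '~' if empty."""
    if not pairs:
        return "~"
    return "".join(f"+[{category}={tag}]" for category, tag in pairs)


def parse_features(text):
    """Splits a features string into a list of (category, tag) pairs."""
    if text == "~":
        return []
    pairs = _FEATURE.findall(text)
    # Re-serializing must reproduce the input, otherwise part of the
    # string did not follow the expected form.
    if format_features(pairs) != text:
        raise ValueError(f"malformed features string: {text!r}")
    return pairs


prp_case = format_features(
    [("PersonNumber", "A1sg"), ("Possessive", "Pnon"), ("Case", "Dat")])
# prp_case == "+[PersonNumber=A1sg]+[Possessive=Pnon]+[Case=Dat]"
```

Keeping the pairs in a list rather than a dictionary preserves the order in which the features are annotated.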
The 31 feature tags listed below compose our features tagset.
Feature Category | Feature Tag | Description |
---|---|---|
PersonNumber | A1sg | 1st person singular |
PersonNumber | A2sg | 2nd person singular |
PersonNumber | A3sg | 3rd person singular |
PersonNumber | A1pl | 1st person plural |
PersonNumber | A2pl | 2nd person plural |
PersonNumber | A3pl | 3rd person plural |
Case | Abl | Ablative case marked |
Case | Acc | Accusative case marked |
Case | Dat | Dative case marked |
Case | Gen | Genitive case marked |
Case | Ins | Instrumental case marked |
Case | Loc | Locative case marked |
ComplementType | CAbl | (Postposition has) ablative case marked complement |
ComplementType | CAcc | (Postposition has) accusative case marked complement |
ComplementType | CBare | (Postposition has) caseless complement |
ComplementType | CDat | (Postposition has) dative case marked complement |
ComplementType | CFin | (Postposition has) finite complement |
ComplementType | CGen | (Postposition has) genitive case marked complement |
ComplementType | CIns | (Postposition has) instrumental case marked complement |
ComplementType | CNum | (Postposition has) numeric complement |
ConjunctionType | Adv | Adverbial conjunction |
ConjunctionType | Coor | Coordinating conjunction |
ConjunctionType | Par | Parallel conjunction |
ConjunctionType | Sub | Subordinating conjunction |
DeterminerType | Def | Definitive (determiner) |
DeterminerType | Dem | Demonstrative (determiner) |
DeterminerType | Dir | Directional (determiner) |
DeterminerType | Ind | Indefinite (determiner) |
Emphasis | True | Emphasis |
Possessive | Pnon | None possessive |
Temporal | True | Temporal |
All lexicon entries that are annotated with one of the below listed part-of-speech tags are required to be annotated with the corresponding feature tag(s).
Coarse Tag | Annotation Tag | Required Features |
---|---|---|
ADP | IN | +[ComplementType=CAbl|CAcc|CBare|CDat|CFin|CGen|CIns|CNum] |
ADV | RB-TEMP | +[Temporal=True] |
CONJ | CC | +[ConjunctionType=Adv|Coor|Par|Sub] |
DET | DT | +[DeterminerType=Def|Dem|Dir|Ind] |
NOUN | NN-TEMP | +[Temporal=True] |
PRON | PRD-PNON | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl]+[Possessive=Pnon] |
PRON | PRD-PNPOSS | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl] |
PRON | PRP | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl] |
PRON | PRP-CASE | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl]+[Possessive=Pnon]+[Case=Acc|Abl|Dat|Gen|Ins|Loc] |
PRON | PRP-IRR | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl]+[Possessive=Pnon] |
PRON | PRP$ | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl]+[Possessive=Pnon] |
VERB | NOMP-CASE-BARE | +[PersonNumber=A3sg]+[Possessive=Pnon]+[Bare] |
VERB | NOMP-CASE-MARKED | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl]+[Possessive=Pnon]+[Case=Acc|Abl|Dat|Gen|Ins|Loc] |
VERB | NOMP-PN | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl] |
VERB | NOMP-PNON | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl]+[Possessive=Pnon] |
VERB | NOMP-PNPOSS | +[PersonNumber=A1sg|A2sg|A3sg|A1pl|A2pl|A3pl] |
All lexicon entries that are annotated with one of the below listed part-of-speech tags might be annotated with the corresponding feature tag(s), if and only if the feature tag applies to the root.
Coarse Tag | Annotation Tag | Optional Features |
---|---|---|
ADJ | JJ | +[Emphasis=True] |
ADJ | JJN | +[Emphasis=True] |
ADV | RB | +[Emphasis=True|Temporal=True] |
Certain Turkish root forms end with a compounding marker `+SH`, which is
ambiguous with a 3rd person possessive marker (e.g. common noun: atbalığı,
proper noun: Kırklareli).
Defining the morphotactics for those cases is particularly challenging. For example, if the word is a common noun which is inflected for person-number, the person-number morpheme occurs before the compounding marker, which is essentially part of the root form. While implementing a morphological analyzer, this phenomenon violates the assumption that all suffixes are sequentially affixed to the root.
Every lexicon entry should have this phenomenon annotated in the `is_compound` field.
To annotate roots that end with a compounding marker;
- the `is_compound` field should be annotated as `TRUE` if the root form ends with a compounding marker (otherwise, `FALSE`).
- the morphophonemic irregularity annotation string in the `morphophonemics` field should be the root form without the final compounding marker (e.g. the word atbalığı should have atbalık as its morphophonemics annotation).
- if any of the morphophonemic processes that are described in Annotating morphophonemic irregularities applies to the morphophonemic annotation string yielded by the previous step, they should also be represented in the morphophonemics annotation (e.g. adamkökü should have adamköK as its morphophonemics annotation, not adamkök).
- lexicon entries for root forms that do not end with a compounding marker should follow the rules defined in Annotating morphophonemic irregularities.
as its morphophonemics annotation, not adamkök). - lexicon entries for root forms that do not end with a compounding marker should follow the rules defined in Annotating morphophonemic irregularities.