Added Ukrainian NER and UD datasets #3069

Merged: 2 commits merged into flairNLP:master on Jan 27, 2023

@lukasgarbas (Collaborator)

Added the Ukrainian NER dataset from the lang-uk project. The fixed train and test splits are taken from lang-uk/flair-ner:

```python
from flair.datasets import NER_UKRAINIAN

corpus = NER_UKRAINIAN()

print(corpus)
# Corpus: 7886 train + 876 dev + 4045 test sentences

print(corpus.train[161])  # sentence example
# "І СхідSide втратив Дудка ..." → ["СхідSide"/ORG, "Дудка"/PERS]
```
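The source provides only train and test splits, yet the printed corpus also has a dev split. A likely explanation (an assumption, not stated in this PR) is that Flair holds out roughly 10% of the training data as dev when no dev file is given. The numbers are consistent with that: 7886 + 876 = 8762, and 10% of 8762 rounds to 876. The sketch below illustrates that kind of hold-out with plain Python; `sample_dev_split` is a hypothetical helper, not Flair's API, and the sentences are placeholders.

```python
import random


def sample_dev_split(train, fraction=0.1, seed=0):
    """Hypothetical helper: hold out a random fraction of `train` as dev."""
    dev_size = round(len(train) * fraction)
    indices = list(range(len(train)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    dev_indices = set(indices[:dev_size])
    new_train = [s for i, s in enumerate(train) if i not in dev_indices]
    dev = [s for i, s in enumerate(train) if i in dev_indices]
    return new_train, dev


# Placeholder data with the same size as the original training file (assumed 8762).
sentences = [f"sentence-{i}" for i in range(8762)]
train, dev = sample_dev_split(sentences)
print(len(train), len(dev))  # 7886 876
```

Because the hold-out is random, the exact dev sentences can differ between runs unless a seed is fixed, which matters when comparing dev scores across experiments.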

Also added the Ukrainian Universal Dependencies treebank from the UniversalDependencies project:

```python
from flair.datasets import UD_UKRAINIAN

corpus = UD_UKRAINIAN()

print(corpus)
# Corpus: 5521 train + 673 dev + 898 test sentences

print(corpus.train[9])  # sentence example
# "Бо самою авторкою всі акценти розставлено зовсім очевидно." → ["Бо"/бо/SCONJ/Css/mark, ...
```

@lukasgarbas (Collaborator, Author)

I also trained a few models on Ukrainian NER:

| embeddings | method | parameters | dev F1 (micro) | test F1 (micro) |
| --- | --- | --- | --- | --- |
| electra-base-ukrainian | fine_tune() | lr: 5e-5, batch: 16 | 95.02 | 88.39 |
| Flair uk-forward, Flair uk-backward | train() | default | 86.20 | 81.42 |
| electra-base-ukrainian | train() | default, fine_tune: False, layers: 'all', layer_mean: True | 92.87 | 87.38 |
| electra-base-ukrainian, Flair uk-forward, Flair uk-backward | train() | default, fine_tune: False, layers: 'all', layer_mean: True | 94.22 | 88.61 |
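For readers unfamiliar with the metric, here is a minimal sketch of span-level micro F1, assuming each entity is a `(start, end, label)` tuple and counts only as correct on an exact match. This is an illustration of the general metric, not Flair's evaluation code, and `micro_f1` is a hypothetical helper. The first toy sentence mirrors the token positions of the NER example above; the second one and all predictions are made up.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over exact-match entity spans (hypothetical helper).

    `gold` and `pred` are parallel lists, one set of (start, end, label)
    tuples per sentence. Counts are pooled over all sentences first.
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        gold_spans, pred_spans = set(gold_spans), set(pred_spans)
        tp += len(gold_spans & pred_spans)  # predicted and correct
        fp += len(pred_spans - gold_spans)  # predicted but wrong
        fn += len(gold_spans - pred_spans)  # missed gold spans
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: token 1 is an ORG and token 3 a PERS in the first sentence.
gold = [{(1, 2, "ORG"), (3, 4, "PERS")}, {(0, 1, "LOC")}]
pred = [{(1, 2, "ORG"), (3, 4, "LOC")}, {(0, 1, "LOC")}]  # one label is wrong

score = micro_f1(gold, pred)
print(round(score, 4))  # 0.6667
```

Pooling the counts before computing precision and recall means frequent entity types weigh more than rare ones, which is what distinguishes micro from macro averaging.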

@alanakbik (Collaborator)

@lukasgarbas thanks for adding these datasets, and for posting these numbers!

@alanakbik merged commit ff74a9f into flairNLP:master on Jan 27, 2023