UniversalDependencies/UD_Tatar-NMCTT

Summary

UD Tatar-NMCTT is a manually annotated corpus of the Tatar language based on text from Tatar-Inform (tatar-inform.tatar), an online news website.

Introduction

UD Tatar-NMCTT is a corpus of the Tatar language, manually annotated by Chihiro Taguchi under the "NAIST Multilingual Corpus" project at Nara Institute of Science and Technology, Japan. The text is taken from the online news website Tatar-Inform. The articles cover a wide variety of genres, including politics, health, and incidents. When citing the text, it is recommended to include the source link of the article, because Russian federal law stipulates that any mass media outlet citing an article must link to the corresponding source article. The link is available as metadata in the corpus.
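
As a rough illustration of how the per-sentence metadata might be read, the sketch below collects the `# key = value` comment lines that the CoNLL-U format places before each sentence and picks out any value that looks like a URL. The name of the comment key that holds the source link is not stated here, so the key `source`, the URL, and the placeholder token in `SAMPLE` are all hypothetical; the helper names `sentence_metadata` and `source_links` are likewise made up for this example.

```python
def sentence_metadata(conllu_text):
    """Return one dict of `# key = value` comments per sentence."""
    sentences = []
    current = {}
    has_content = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence block.
            if has_content:
                sentences.append(current)
            current, has_content = {}, False
            continue
        has_content = True
        if line.startswith("#") and "=" in line:
            # Split at the first "=" only, so URLs containing "=" stay intact.
            key, _, value = line[1:].partition("=")
            current[key.strip()] = value.strip()
    if has_content:
        sentences.append(current)
    return sentences


def source_links(metadata):
    """Return metadata values that look like URLs (the key name is not assumed)."""
    return [v for v in metadata.values() if v.startswith(("http://", "https://"))]


# Hypothetical sample: the key "source", the URL, and the token are placeholders.
SAMPLE = (
    "# sent_id = example-1\n"
    "# source = https://example.org/article?id=1\n"
    "# text = placeholder\n"
    "1\tplaceholder\tplaceholder\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for meta in sentence_metadata(SAMPLE):
        print(meta.get("sent_id"), source_links(meta))
```

Filtering by URL shape rather than by key name keeps the sketch independent of how the treebank actually labels the link; once the real key is known, a direct lookup would be simpler.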

Acknowledgments

This contribution to UD would not have been possible without the generous cooperation of Zilya Mubarakshina (Tatar-Inform). The building of this corpus is funded by CICP of Nara Institute of Science and Technology.

References

Changelog

  • 2022-11-15 v2.11
    • Added a new feature: VerbForm=NomAg for -учы verb forms.
    • Added a new deprel: advmod:emph.
    • Added VerbForm=Conv for the conditional adverbial form (along with Mood=Cnd).
  • 2022-05-15 v2.10
  • 2021-11-15 v2.9
    • Initial release in Universal Dependencies.
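
The annotations introduced in v2.11 can be located with a few lines of code. The following is a minimal sketch, assuming standard 10-column CoNLL-U input, that counts tokens carrying VerbForm=NomAg, the advmod:emph relation, or the combination VerbForm=Conv with Mood=Cnd. The function names and the rows in `SAMPLE` are hypothetical placeholders (`w1`, `w2`, `w3` are not real Tatar tokens or real annotations from the corpus).

```python
def parse_feats(feats):
    """Turn a FEATS string such as 'Mood=Cnd|VerbForm=Conv' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def iter_tokens(conllu_text):
    """Yield the 10 tab-separated columns of every token line."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) == 10:
            yield cols


def count_v211_annotations(conllu_text):
    """Count tokens that use the annotations listed for v2.11."""
    counts = {"VerbForm=NomAg": 0, "advmod:emph": 0, "VerbForm=Conv+Mood=Cnd": 0}
    for cols in iter_tokens(conllu_text):
        feats = parse_feats(cols[5])  # column 6 is FEATS
        deprel = cols[7]              # column 8 is DEPREL
        if feats.get("VerbForm") == "NomAg":
            counts["VerbForm=NomAg"] += 1
        if deprel == "advmod:emph":
            counts["advmod:emph"] += 1
        if feats.get("VerbForm") == "Conv" and feats.get("Mood") == "Cnd":
            counts["VerbForm=Conv+Mood=Cnd"] += 1
    return counts


# Hypothetical rows for illustration only; not taken from the corpus.
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "1\tw1\tw1\tVERB\t_\tVerbForm=NomAg\t2\tnsubj\t_\t_",
    "2\tw2\tw2\tVERB\t_\tMood=Cnd|VerbForm=Conv\t0\troot\t_\t_",
    "3\tw3\tw3\tPART\t_\t_\t2\tadvmod:emph\t_\t_",
    "",
])

if __name__ == "__main__":
    print(count_v211_annotations(SAMPLE))
```

To run this on the treebank itself, read a `.conllu` file with `open(path, encoding="utf-8").read()` and pass the text to `count_v211_annotations`.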
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.9
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction news
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Taguchi, Chihiro
Contributing: here
Contact: c.taguchi@sms.ed.ac.uk
===============================================================================