UD Tatar-NMCTT is a manually annotated corpus of the Tatar language based on the text from Tatar-Inform (tatar-inform.tatar), an online news website.
UD Tatar-NMCTT is a corpus of the Tatar language, manually annotated by Chihiro Taguchi under the project "NAIST Multilingual Corpus" at Nara Institute of Science and Technology, Japan. The text is taken from the online news website Tatar-Inform. The articles cover a wide variety of topics, including politics, health, and incidents. When citing the text, it is recommended to show the source link of the article, because Russian federal law stipulates that any mass media outlet citing an article must show the link to the corresponding source article. The link is available as metadata in the corpus.
This contribution to UD would never have been possible without the generous cooperation of Zilya Mubarakshina (Tatar-Inform). The building of this corpus is funded by CICP of Nara Institute of Science and Technology.
- Tatar-Inform. URL: https://www.tatar-inform.tatar/
- 2022-11-15 v2.11
  - Added the new feature `VerbForm=NomAg` for -учы verb forms.
  - Added the new deprel `advmod:emph`.
  - Added `VerbForm=Conv` for the conditional adverbial form (along with `Mood=Cnd`).
- 2022-05-15 v2.10
- 2021-11-15 v2.9
- Initial release in Universal Dependencies.
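
The annotations introduced in v2.11 can be checked mechanically in any CoNLL-U file. The following is a minimal sketch, not part of the treebank's own tooling: the function name `count_annotations` is hypothetical, and the sample rows are made-up placeholders (not sentences from the corpus) that only show the standard ten-column CoNLL-U layout.

```python
from collections import Counter

# Zero-based column indices in a CoNLL-U token line (10 tab-separated fields).
FEATS, DEPREL = 5, 7


def count_annotations(lines,
                      feats=("VerbForm=NomAg", "VerbForm=Conv"),
                      deprels=("advmod:emph",)):
    """Count tokens that carry the given features or dependency relations."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank sentence separators and '#' comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_feats = set() if cols[FEATS] == "_" else set(cols[FEATS].split("|"))
        for feat in feats:
            if feat in token_feats:
                counts[feat] += 1
        if cols[DEPREL] in deprels:
            counts[cols[DEPREL]] += 1
    return counts


# Placeholder rows for illustration only; w1/l1 etc. are not real Tatar tokens.
SAMPLE = [
    "# text = (illustrative placeholder, not taken from the corpus)",
    "1\tw1\tl1\tVERB\t_\tVerbForm=NomAg\t2\tnsubj\t_\t_",
    "2\tw2\tl2\tVERB\t_\tMood=Cnd|VerbForm=Conv\t0\troot\t_\t_",
    "3\tw3\tl3\tPART\t_\t_\t2\tadvmod:emph\t_\t_",
]

counts = count_annotations(SAMPLE)
```

To run the same count on the treebank itself, pass an open file handle (opened with `encoding="utf-8"`) for one of the released `.conllu` files instead of `SAMPLE`.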
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.9
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction news
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Taguchi, Chihiro
Contributing: here
Contact: c.taguchi@sms.ed.ac.uk
===============================================================================