Morphological Analyzer of the Tatar Language

A morphological analyzer is a fundamental component of any fully functional linguistic processor. The Tatar morphological analyzer was built with finite-state transducer technology, using the Helsinki Finite-State Technology (HFST) toolkit, and is based on a two-level morphological model of the language (Koskenniemi, 1983).
 
The morphological model distinguishes 12 types of roots (parts of speech, POS), 81 derivational and inflectional affixes, and 11 additional markers. The module also performs morphological disambiguation based on contextual rules and statistical (probabilistic) models. Processing speed is approximately 10,000 tokens per second.
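
The idea of pairing a lexical level (root plus grammatical tags) with a surface level (the written word form) can be illustrated with a small sketch. The code below is not the analyzer's implementation and does not use HFST; it is a toy in plain Python that handles only the Tatar plural suffix (-лар/-ләр, or -нар/-нәр after a nasal consonant). The four-word lexicon, the tag names `N` and `PL`, and the function names are assumptions made for illustration only, and the vowel-harmony rule is deliberately simplified.

```python
# Toy illustration of a lexical level ("root+tags") and a surface level
# (the written form). This is NOT the HFST-based analyzer; the lexicon,
# tag names, and rules below are simplified assumptions.

# Hypothetical mini-lexicon: root -> part-of-speech tag.
LEXICON = {"китап": "N", "урман": "N", "көн": "N", "өй": "N"}

BACK_VOWELS = set("аоуы")
FRONT_VOWELS = set("әөүеи")
NASALS = set("мнң")


def plural_suffix(root):
    """Realize the plural morpheme on the surface for a given root."""
    # Simplified vowel harmony: the last vowel of the root decides
    # whether the suffix vowel is front (ә) or back (а).
    last_vowel = next(
        (ch for ch in reversed(root) if ch in BACK_VOWELS | FRONT_VOWELS),
        None,
    )
    vowel = "ә" if last_vowel in FRONT_VOWELS else "а"
    # Nasal assimilation: the suffix starts with н after м, н, ң.
    consonant = "н" if root[-1] in NASALS else "л"
    return consonant + vowel + "р"


def generate(root, plural=False):
    """Lexical -> surface: build the word form for a root."""
    return root + plural_suffix(root) if plural else root


def analyze(surface):
    """Surface -> lexical: return every analysis the toy lexicon allows."""
    analyses = []
    for root, pos in LEXICON.items():
        if surface == root:
            analyses.append(f"{root}+{pos}")
        elif surface == generate(root, plural=True):
            analyses.append(f"{root}+{pos}+PL")
    return analyses


if __name__ == "__main__":
    for word in ["китаплар", "урманнар", "көннәр", "өй"]:
        print(word, analyze(word))
```

A real finite-state analyzer compiles the lexicon and the alternation rules into a single transducer that runs in both directions, instead of enumerating roots as this sketch does.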
 
The morphological analyzer has been integrated into the Tatar National Corpus "Tugan Tel" system, where it provides grammatical annotation of word forms. It is also used in the University Information System RUSSIA (UIS RUSSIA) to support search within the Russian-Tatar text collection, in the Yandex.Translate online service to support machine translation for the Russian-Tatar language pair, and in teaching for the program 45.03.01 "Philology: Applied Philology" at Kazan Federal University.

Last updated: 8 December 2025, 16:47

All content on this site is licensed under Creative Commons Attribution 4.0 International.