Website: tugantel.tatar
The National Corpus of the Tatar Language "Tugan Tel" is a linguistic resource for the modern literary Tatar language. It is intended for a wide range of users: linguists; specialists in Tatar, Turkic, and general linguistics; typologists; teachers of the Tatar language; cultural figures; and anyone who studies or is interested in the Tatar language. The electronic corpus is a core component of a suite of software tools for research and development in Turkic languages.
The electronic corpus development project includes:
A specialized linguistic data management system, the "corpus manager", has been developed to manage the corpus data. It is designed primarily for Turkic languages but can also be used with electronic corpora of other languages. The corpus search system allows for searches by:
The search system also supports negative words (words to be excluded from the results), partial-word search, search using logical formulas, and phrase search, so users can formulate complex queries specific to their research needs.
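The query types listed above can be illustrated with a minimal Python sketch over a toy in-memory list of sentences. This is not the corpus manager's implementation or its query syntax: the `Query` fields, the helper functions, and the three sample sentences are all hypothetical and serve only to show how negative words, partial-word matching, logical conditions, and phrase matching combine in one query.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PUNCT = '.,!?;:"()'


@dataclass
class Query:
    """Hypothetical query container; not the corpus manager's real syntax."""
    include: List[str] = field(default_factory=list)  # all must occur (logical AND)
    any_of: List[str] = field(default_factory=list)   # at least one must occur (logical OR)
    exclude: List[str] = field(default_factory=list)  # negative words
    prefix: Optional[str] = None                      # partial-word search (word start)
    phrase: List[str] = field(default_factory=list)   # consecutive tokens


def tokenize(sentence: str) -> List[str]:
    """Split on whitespace, strip surrounding punctuation, lowercase."""
    tokens = (raw.strip(PUNCT).lower() for raw in sentence.split())
    return [t for t in tokens if t]


def contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs in `tokens` as a contiguous run."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches(sentence: str, q: Query) -> bool:
    """Check one sentence against every condition set in the query."""
    tokens = tokenize(sentence)
    vocab = set(tokens)
    if not all(w in vocab for w in q.include):
        return False
    if q.any_of and not any(w in vocab for w in q.any_of):
        return False
    if any(w in vocab for w in q.exclude):
        return False
    if q.prefix and not any(t.startswith(q.prefix) for t in tokens):
        return False
    if q.phrase and not contains_phrase(tokens, q.phrase):
        return False
    return True


def search(sentences: List[str], q: Query) -> List[str]:
    """Return the sentences that satisfy the query, in original order."""
    return [s for s in sentences if matches(s, q)]


# Toy data for illustration only; not drawn from the corpus.
SENTENCES = [
    "И туган тел, и матур тел, әткәм-әнкәмнең теле!",
    "Татар теле бик матур.",
    "Мин китап укыйм.",
]

if __name__ == "__main__":
    # Sentences containing "матур" but not the negative word "туган".
    for hit in search(SENTENCES, Query(include=["матур"], exclude=["туган"])):
        print(hit)
```

A real corpus engine would match against morphologically analysed word forms and an index rather than scanning raw strings, so the sketch shows only how the query conditions combine.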
To enable quick and convenient extraction of search results for further processing in application software, the Corpus API tools have been developed: a set of APIs for extracting corpus samples by specified criteria and presenting them in various formats.
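As an illustration of presenting the same sample in several formats, the following Python sketch renders a list of search hits as JSON, CSV, or plain keyword-in-context lines. It does not call the Corpus API and does not reflect its actual endpoints or output schema: the field names (`left`, `match`, `right`, `source`), the `render` function, and the sample hit are assumptions made for the example.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical hit structure; the real Corpus API schema may differ.
FIELDS = ["left", "match", "right", "source"]


def render(hits: List[Dict[str, str]], fmt: str) -> str:
    """Serialize hits as 'json', 'csv', or 'kwic' (keyword in context)."""
    if fmt == "json":
        # ensure_ascii=False keeps Cyrillic text readable in the output.
        return json.dumps(hits, ensure_ascii=False, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(hits)
        return buf.getvalue()
    if fmt == "kwic":
        return "\n".join(f"{h['left']} [{h['match']}] {h['right']}" for h in hits)
    raise ValueError(f"unsupported format: {fmt}")


# Toy hit for illustration only; not actual API output.
SAMPLE_HITS = [
    {"left": "и", "match": "туган", "right": "тел", "source": "toy example"},
]

if __name__ == "__main__":
    for fmt in ("json", "csv", "kwic"):
        print(render(SAMPLE_HITS, fmt))
```

Keeping the hit structure independent of the output format means a downstream application can request whichever representation it can process most easily.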
The project is carried out within the framework of the State Program "Preservation, Study, and Development of the State Languages of the Republic of Tatarstan and Other Languages in the Republic of Tatarstan for 2014-2020".
The corpus includes Tatar texts of various genres with a total volume of over 180 million word tokens (as of December 2019).
Last updated: 8 December 2025, 16:40