Social Sciences and Humanities

STAGES OF IDENTIFYING STABLE LINGUISTIC UNITS USING N-GRAMS

Keywords

phraseological units, idioms, corpus linguistics, collocation, Uzbek language corpus, automatic identification

Authors

  • Umidjon YODGOROV, lecturer at Tashkent State University of Uzbek Language and Literature, Uzbekistan

The article examines the scientific and methodological foundations for automatically identifying stable linguistic units in Uzbek using the national text corpus. N-grams of two to five words were selected using statistical association measures and then classified as either phraseological units or free word combinations according to linguistic criteria and contextual models. The proposed approach achieved an accuracy of 90%, confirming that a substantial portion of highly associated combinations exhibits phraseological characteristics. The findings contribute to the development of automatic phraseological dictionaries and improve the processing of multiword expressions in corpus linguistics and NLP systems.
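The first stage described above, selecting N-grams by a statistical measure, can be illustrated with a minimal sketch. The abstract does not name the measure used, so pointwise mutual information (PMI) is assumed here purely for illustration. The function names, the `min_count` threshold, and the toy corpus are all hypothetical and are not taken from the study. The second stage (classification by linguistic criteria and contextual models) is not covered.

```python
import math
from collections import Counter

# Invented toy data for illustration only; not drawn from the national corpus.
TOY_CORPUS = [
    ["u", "bu", "masalada", "bosh", "qotirdi"],
    ["ali", "uzoq", "bosh", "qotirdi"],
    ["u", "kitob", "oldi"],
    ["ali", "kitob", "berdi"],
]


def ngrams(tokens, n):
    """Return all contiguous n-word sequences of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pmi_scores(sentences, n=2, min_count=2):
    """Score each n-gram seen at least `min_count` times with PMI (assumed measure).

    PMI here is log2 of the observed n-gram probability divided by the
    product of the unigram probabilities of its words.
    """
    unigram_counts = Counter()
    ngram_counts = Counter()
    for tokens in sentences:
        unigram_counts.update(tokens)
        ngram_counts.update(ngrams(tokens, n))

    total_tokens = sum(unigram_counts.values())
    total_ngrams = sum(ngram_counts.values())

    scores = {}
    for gram, count in ngram_counts.items():
        if count < min_count:
            continue  # rare n-grams give unreliable association scores
        p_gram = count / total_ngrams
        p_independent = 1.0
        for word in gram:
            p_independent *= unigram_counts[word] / total_tokens
        scores[gram] = math.log2(p_gram / p_independent)
    return scores


def candidate_units(sentences, n_values=range(2, 6), min_count=2):
    """Collect scored candidates for every n from 2 to 5, highest PMI first."""
    candidates = []
    for n in n_values:
        candidates.extend(pmi_scores(sentences, n, min_count).items())
    return sorted(candidates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for gram, score in candidate_units(TOY_CORPUS):
        print(" ".join(gram), round(score, 2))
```

On the toy data only the pair "bosh qotirdi" passes the frequency threshold, so it is the sole candidate returned. In a real pipeline these candidates would still need the linguistic and contextual filtering the abstract mentions before being treated as phraseological units.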