Exact Sciences

A NEURAL MODEL OF DEEP BIAFFINE DEPENDENCY PARSING BASED ON A UNIVERSAL DEPENDENCIES TREEBANK FOR THE UZBEK LANGUAGE

Keywords: Universal Dependencies, Uzbek language, dependency parsing, deep biaffine neural attention, treebank, NLP.

Authors

  • San’atbek Matlatipov, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan

This article introduces a new Universal Dependencies (UD) treebank for the Uzbek language and a
dependency parser based on a deep biaffine neural attention mechanism. The corpus contains 686
sentences (7,800 tokens) from literary and popular-science texts, manually annotated with lemmas,
POS tags, morphological features and dependency relations, achieving inter-annotator agreement
above 95% for lemmatization and UPOS. On top of this gold-standard resource, we train and evaluate
a BiLSTM-based deep biaffine dependency parser implemented in the Stanza pipeline, obtaining
86.10% UPOS accuracy, 70.06% UFeats accuracy and, under gold morphology, 69.21% UAS and
53.21% LAS on the test set. Together, the treebank and model establish the first strong neural
baseline for dependency parsing in Uzbek and provide a mathematically grounded foundation for
further NLP research on the language.
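
For readers unfamiliar with the reported metrics, the sketch below shows how UAS (unlabeled attachment score) and LAS (labeled attachment score) are commonly computed when gold tokenization is assumed. It is illustrative only and is not the evaluation code used in the article: the function name `attachment_scores`, the `(head, relation)` tuple layout and the toy sentence are assumptions made for this example.

```python
from typing import List, Tuple

# Each token is represented as (head_index, relation_label).
# By UD convention, head index 0 marks the root of the sentence.
Token = Tuple[int, str]


def attachment_scores(
    gold: List[List[Token]], pred: List[List[Token]]
) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens.

    Assumes gold and predicted sentences share the same tokenization,
    so tokens can be compared position by position.
    """
    total = uas_hits = las_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1          # correct head: counts toward UAS
                if g_rel == p_rel:
                    las_hits += 1      # correct head and label: counts toward LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


# Hypothetical three-token sentence: every head is right, one label is wrong.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "obl")]]
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

Because a token counts toward LAS only if its head is also correct, LAS can never exceed UAS, which matches the ordering of the figures reported above.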