Social Sciences and Humanities

THEORETICAL AND PRACTICAL FOUNDATIONS OF CREATING A LEGAL QUESTION-AND-ANSWER SYSTEM IN THE UZBEK LANGUAGE BASED ON THE EXPERIENCE OF TURKIC LANGUAGES

Keywords: natural language processing, legal question-and-answer systems, legal corpus, hybrid approach, large language model, artificial intelligence, semantic role, lemmatization, morphological tagging, tokenization.

Authors

The article analyzes NLP resources developed for Turkish and Kazakh (BERTurk, KazNERD, UD treebanks) and, drawing on them,
proposes a theoretical and practical model for creating a legal question-and-answer (Legal QA, LQA) system in the Uzbek language.
The architecture follows a hybrid principle that combines three components: information retrieval (BM25 or dense retrieval),
LLM reasoning (transformer models), and rule-based constraint filtering (normative checking). The article covers the agglutinative
morphology of Uzbek, its SOV word order, the formulaic constructions of official legal discourse, and the integration of UD-based
annotation requirements into LQA. As a result, an integrated architecture and a roadmap of corpus and annotation requirements
for Uzbek LQA were developed.
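
The three-stage hybrid principle described above (retrieval, reasoning, constraint filtering) can be illustrated with a minimal Python sketch. It is not the article's implementation: the corpus entries, article identifiers, and function names (`bm25_rank`, `draft_answer`, `passes_constraints`, `answer`) are hypothetical, the example texts are in English for readability, the LLM reasoning step is replaced by a placeholder that simply returns the top passage, and the tokenizer ignores the morphological analysis that an agglutinative language such as Uzbek would require.

```python
import math
import re
from collections import Counter

# Toy corpus: hypothetical provision texts, not real legal norms.
CORPUS = {
    "art-1": "an employment contract is concluded in writing",
    "art-2": "the employer must pay wages at least twice a month",
    "art-3": "a lease contract for real estate requires state registration",
}


def tokenize(text):
    """Lowercase word tokenizer; a real system would lemmatize here."""
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, corpus, k1=1.5, b=0.75):
    """Stage 1 (retrieval): score every document with BM25, best first."""
    docs = {doc_id: tokenize(text) for doc_id, text in corpus.items()}
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in docs.values() for term in set(toks))
    scored = []
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def draft_answer(question, doc_id, corpus):
    """Stage 2 (reasoning): placeholder for an LLM call (assumption)."""
    return {"answer": corpus[doc_id], "cited": doc_id}


def passes_constraints(draft, retrieved_ids):
    """Stage 3 (rule-based filter): the answer must cite a retrieved provision."""
    return draft["cited"] in retrieved_ids


def answer(question, corpus=CORPUS, top_k=2):
    """Run the three stages; return None when no grounded answer exists."""
    hits = [(d, s) for d, s in bm25_rank(question, corpus)[:top_k] if s > 0]
    if not hits:
        return None  # nothing relevant was retrieved
    retrieved_ids = [doc_id for doc_id, _ in hits]
    draft = draft_answer(question, retrieved_ids[0], corpus)
    return draft if passes_constraints(draft, retrieved_ids) else None
```

Calling `answer("how often must the employer pay wages")` returns the passage stored under the hypothetical identifier `art-2`, while a query that shares no terms with the corpus returns `None`. Refusing to answer when retrieval finds nothing is one simple way to keep outputs tied to a cited provision; a dense retriever could replace `bm25_rank` without changing the other two stages.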