RESEARCH OF REPRESENTATIVENESS OF KAZAKH LANGUAGE CORPORA BY WORD STEMS FOR THE SUMMARIZATION

T.R. Zhabaev; U.A. Tukeyev

doi:10.58805/kazutb.v.2.23-366

Information and communication and chemical technologies

No. 2 (23) - 2024 / 2024-06-30

Authors

T.R. Zhabaev
U.A. Tukeyev

Al-Farabi Kazakh National University

Keywords

neural language modeling, NLP, text summarization, Kazakh language, representativity, synthetic datasets

Link to DOI:

https://doi.org/10.58805/kazutb.v.2.23-366

How to cite

Zhabaev, T., and U. Tukeyev. “RESEARCH OF REPRESENTATIVENESS OF KAZAKH LANGUAGE CORPORA BY WORD STEMS FOR THE SUMMARIZATION”. Vestnik KazUTB, vol. 2, no. 23, June 2024, doi:10.58805/kazutb.v.2.23-366.

Abstract

In this work, we investigate how the quality of a summarization model depends on the number of word stems in its training dataset. The experiments use a synthetic summarization dataset for the Kazakh language. Taking the number of word stems as a metric of representativeness, we analyze the quality of three summarization models as a function of the number of word stems in their training data. To obtain the three training datasets, we divided the original training dataset into three parts. BLEU scores were computed for each model on the test files. The experiments show that the model trained on the dataset with the largest number of stems achieves the highest BLEU score. However, the score does not depend directly on the number of word stems: two models trained on datasets of different sizes achieve approximately the same scores.
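
As an illustration of the representativeness metric described in the abstract, the following minimal Python sketch counts the unique word stems in each part of a split dataset. It is not the authors' implementation. The function names, the equal-sized split, and the `toy_stem` helper (which strips only a few Kazakh plural suffixes) are assumptions made for this example. A real experiment would substitute a proper Kazakh stemmer.

```python
from typing import Callable, Iterable, List, Sequence

# Assumption: a toy stand-in for a real Kazakh stemmer.
# It strips only a handful of plural suffixes and nothing else.
PLURAL_SUFFIXES = ("лар", "лер", "дар", "дер", "тар", "тер")


def toy_stem(word: str) -> str:
    """Return the word with one plural suffix removed, if present."""
    for suffix in PLURAL_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def count_unique_stems(
    texts: Iterable[str], stem: Callable[[str], str] = toy_stem
) -> int:
    """Count distinct stems over all whitespace-separated tokens."""
    stems = set()
    for text in texts:
        for token in text.lower().split():
            token = token.strip(".,!?;:\"'()«»")
            if token:
                stems.add(stem(token))
    return len(stems)


def split_into_parts(items: Sequence[str], n_parts: int = 3) -> List[List[str]]:
    """Split a dataset into n_parts contiguous, near-equal parts.

    Assumption: the abstract does not say how the split was made.
    """
    size, remainder = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < remainder else 0)
        parts.append(list(items[start:end]))
        start = end
    return parts


if __name__ == "__main__":
    corpus = ["кітаптар мен кітап", "балалар бала", "үй"]
    for index, part in enumerate(split_into_parts(corpus, 3), start=1):
        print(f"part {index}: {count_unique_stems(part)} unique stems")
```

The stem count of each part can then be set against the BLEU score of the model trained on that part.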
