No. 2 (23) - 2024 / 2024-06-30
In this work, we investigate how the quality of a summarization model depends on the number of word stems in its training dataset. The experiments use a synthetic summarization dataset for the Kazakh language. Taking the number of word stems as a measure of representativeness, we divided the training dataset into three parts, trained one summarization model on each part, and analyzed how model quality varies with the number of word stems in the corresponding training data. BLEU scores were computed for each model on the test files. The experiments show that the model trained on the data with the largest number of stems achieves the highest BLEU score. However, the score does not depend directly on the number of word stems: two models trained on datasets of different sizes achieve approximately the same scores.
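To make the representativeness measure concrete, the sketch below counts distinct word stems in each of three parts of a training set. It is only an illustration under stated assumptions: `toy_stem` and its suffix list are hypothetical stand-ins for a real Kazakh stemmer, the whitespace tokenization is simplified, and the near-equal contiguous split is an assumption, since the abstract does not say how the three parts were formed.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical, deliberately tiny list of plural suffixes.
# This is NOT a real Kazakh stemmer; it only illustrates the counting step.
TOY_SUFFIXES = ("лар", "лер", "дар", "дер", "тар", "тер")


def toy_stem(word: str) -> str:
    """Strip one listed suffix if at least two characters remain."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def distinct_stems(texts: Sequence[str],
                   stem: Callable[[str], str] = toy_stem) -> Set[str]:
    """Collect the set of distinct stems over whitespace-separated tokens."""
    stems: Set[str] = set()
    for text in texts:
        for token in text.lower().split():
            token = token.strip(".,!?;:\"'()")
            if token:
                stems.add(stem(token))
    return stems


def split_into_parts(items: Sequence[str], n_parts: int) -> List[Sequence[str]]:
    """Split items into n_parts contiguous parts of near-equal size (assumed scheme)."""
    size, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def stem_counts_per_part(texts: Sequence[str], n_parts: int = 3) -> List[int]:
    """Number of distinct stems in each part of the training set."""
    return [len(distinct_stems(part)) for part in split_into_parts(texts, n_parts)]


if __name__ == "__main__":
    sample = ["кітаптар мен қаламдар", "кітап оқыдым", "балалар ойнады"]
    print(stem_counts_per_part(sample))
```

In a real experiment, `toy_stem` would be replaced by a proper morphological analyzer or stemmer for Kazakh, and the resulting per-part stem counts would be compared against the BLEU score of the model trained on each part.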