IDENTIFYING EFFECTIVE MACHINE LEARNING ALGORITHMS FOR SENTIMENTAL ANALYSIS OF COMMENTS IN THE KAZAKH LANGUAGE

N.K. Mukazhanov; L.Sh. Cherikbayeva; A.M. Kassenkhan; Zh.M. Alibieva; M. Turdalyuly

doi:10.58805/kazutb.v.4.25-426

Information and communication and chemical technologies

No. 4 (25) - 2024 / 2024-12-31

Keywords

sentiment analysis, machine learning, deep learning, NLP, comments, dataset

Link to DOI:

https://doi.org/10.58805/kazutb.v.4.25-426

How to quote

Mukazhanov, N.K., L.Sh. Cherikbayeva, A.M. Kassenkhan, Zh.M. Alibieva, and M. Turdalyuly. “IDENTIFYING EFFECTIVE MACHINE LEARNING ALGORITHMS FOR SENTIMENTAL ANALYSIS OF COMMENTS IN THE KAZAKH LANGUAGE”. Vestnik KazUTB, vol. 4, no. 25, Dec. 2024, doi:10.58805/kazutb.v.4.25-426.

Abstract

This article analyzes machine learning algorithms for sentiment analysis of Kazakh-language text and identifies the most effective among them. As the volume of Kazakh-language content on social networks, news sites, and online stores grows, so does the need for tools and methods that process Kazakh-language data and extract valuable information about people's opinions and views. The dataset used in the study was therefore collected from real online stores and news sites. It contains 1,500 records, of which 80% were used to train the algorithms and 20% to test them. The algorithms considered are logistic regression, multinomial naive Bayes, support vector machine (SVM), XGBoost, and a long short-term memory (LSTM) deep learning network. The algorithms were tested as the dataset grew from 500 to 1,500 records, and individual, ensemble, and augmented variants were implemented and tested. The test results are reported in terms of algorithm accuracy.
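The evaluation protocol described in the abstract (an 80/20 train/test split scored by accuracy) can be illustrated with a minimal sketch of one of the listed algorithms, multinomial naive Bayes. This is not the authors' implementation: the six toy comments, the labels `pos` and `neg`, the whitespace tokenizer, the Laplace smoothing, and all function and class names below are assumptions made for illustration only.

```python
import math
import random
from collections import Counter, defaultdict


def split_80_20(records, seed=0):
    """Shuffle a copy of the records and split it 80% train / 20% test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


class MultinomialNB:
    """Multinomial naive Bayes over whitespace tokens, Laplace-smoothed."""

    def fit(self, records):
        self.class_counts = Counter(label for _, label in records)
        self.word_counts = defaultdict(Counter)
        for text, label in records:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total = len(records)
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, records):
    """Fraction of records whose predicted label matches the true label."""
    return sum(model.predict(t) == y for t, y in records) / len(records)


# Hypothetical toy comments, NOT the 1,500-record dataset from the study.
POSITIVE = ["өте жақсы тауар", "сапасы жақсы ұнады", "тамаша қызмет рахмет"]
NEGATIVE = ["өте жаман тауар", "сапасы нашар ұнамады", "қызмет нашар жаман"]
records = [(t, "pos") for t in POSITIVE] * 5 + [(t, "neg") for t in NEGATIVE] * 5

train_set, test_set = split_80_20(records)
model = MultinomialNB().fit(train_set)
print(f"train={len(train_set)} test={len(test_set)} "
      f"accuracy={accuracy(model, test_set):.2f}")
```

The same split and accuracy functions could be reused to compare other classifiers, or to repeat the measurement on growing subsets of a dataset, which is how the abstract describes testing from 500 to 1,500 records.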