Information and communication and chemical technologies

No. 3 (20) - 2023 / 2023-09-30

MATHEMATICAL APPARATUS FOR THE ANALYSIS OF SCIENTIFIC TEXTS: BAYESIAN PROBABILITY THEORY AND ITS IMPLEMENTATION

Authors

Kazakh University of Technology and Business
Esil University
Esil University
Kazakh University of Technology and Business

Keywords

parallel analysis, probability theory, Bayesian probability theory, scientific text, big data, unstructured data, Apache Spark, distributed computing, mathematical apparatus

Link to DOI:

https://doi.org/10.58805/kazutb.v.3.20-153

How to quote

Altynbek С., Shuitenov Г., Turusbekova У., and Kubekova В. “MATHEMATICAL APPARATUS FOR THE ANALYSIS OF SCIENTIFIC TEXTS: BAYESIAN PROBABILITY THEORY AND ITS IMPLEMENTATION”. Vestnik KazUTB, vol. 3, no. 20, Sept. 2023, doi:10.58805/kazutb.v.3.20-153.

Abstract

This article discusses a mathematical apparatus, namely Bayesian probability theory, and its application to the analysis of scientific texts. The main purpose of the study is to select optimal algorithms for a future intelligent system for parallel analysis of unstructured data. To this end, the authors examine the Apache Spark distributed framework: they analyze its capabilities and functionality and propose algorithms for analyzing unstructured data based on Bayesian probability theory. This approach makes it possible to analyze large amounts of textual information effectively and to extract and classify it according to various parameters. The article also describes the advantages of using Apache Spark for parallel data analysis: the framework provides high processing speed and efficient use of resources, which makes it a suitable choice for analyzing large volumes of unstructured information. The authors conclude that combining the mathematical apparatus of Bayesian probability theory with the Apache Spark distributed framework makes it possible to develop an intelligent system for parallel analysis of unstructured data that ensures efficient and accurate analysis of text information.
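
As an illustration of the kind of Bayesian text classification the abstract refers to, the following is a minimal single-machine sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the authors' algorithm: the function names (`train`, `classify`), the whitespace tokenizer, and the toy documents in `SAMPLE_DOCS` are all assumptions made for this example. In a distributed setting the same counting step could in principle be parallelized, and Spark MLlib ships its own `NaiveBayes` classifier.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Count documents per class and words per class.

    docs is a list of (label, text) pairs; tokenization by
    whitespace is a simplifying assumption for this sketch.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(text, class_counts, word_counts, vocab):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    tokens = text.lower().split()
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior from class frequencies.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus, invented purely for illustration.
SAMPLE_DOCS = [
    ("physics", "quantum field particle energy"),
    ("physics", "particle collider energy measurement"),
    ("biology", "cell protein gene expression"),
    ("biology", "gene sequencing protein structure"),
]

model = train(SAMPLE_DOCS)
```

Calling `classify("protein gene analysis", *model)` scores each class by Bayes' rule under the word-independence assumption and returns the label with the highest posterior score. Working in log space keeps the product of many small probabilities numerically stable.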