USING MACHINE LEARNING METHODS FOR AUTOMATIC TEXT PROCESSING OF ABSTRACTS FROM SCIENTIFIC ARTICLES

This paper examines the application of machine learning methods to the automatic text processing of abstracts from scientific articles. As the volume of scientific information grows, researchers face information overload, which makes it difficult to find and analyze relevant materials. To address this problem, we apply machine learning methods, namely the support vector machine (SVM) classifier and Word2Vec word representations, to classify abstracts and extract key information. The data are collected from open databases. Abstracts pass through preprocessing stages that include tokenization, lemmatization, and stop-word removal. We then use Word2Vec to convert the abstract texts into vector representations, which serve as input to the SVM model. Model performance is evaluated using accuracy, recall, and F1-score. The results show that combining SVM with Word2Vec significantly improves the quality of abstract classification, which can speed up the search for scientific information. The work highlights the potential of machine learning methods for automating the processing of scientific texts and suggests directions for further research, including the use of more complex models such as transformers. This methodology can serve as a basis for developing effective tools that facilitate faster knowledge sharing in the scientific community.
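The pipeline described here can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The stop-word list, the two-dimensional toy embeddings, and all function names are hypothetical and chosen only for illustration. In practice the word vectors would come from a trained Word2Vec model and the averaged document vectors would be passed to an SVM classifier. Neither is reproduced here, and lemmatization is omitted because it requires an external library. Averaging word vectors is one common way to obtain a document vector and is an assumption, as is the choice of precision, recall, and F1 for a binary label as the evaluation metrics.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "in", "to", "is", "are", "we"}


def preprocess(text: str) -> List[str]:
    """Lowercase the text, tokenize on runs of letters, and drop stop words.

    Lemmatization is intentionally left out of this sketch.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_vector(
    tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of tokens found in `embeddings`.

    Returns a zero vector when no token has an embedding.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical 2-dimensional embeddings standing in for Word2Vec output.
TOY_EMBEDDINGS = {"svm": [1.0, 0.0], "classification": [0.0, 1.0]}

tokens = preprocess("We train the SVM for classification of abstracts")
doc_vec = document_vector(tokens, TOY_EMBEDDINGS, dim=2)

# Made-up labels and predictions, used only to exercise the metric function.
metrics = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In this sketch `doc_vec` plays the role of the SVM input for one abstract, and `metrics` shows how predicted labels would be scored against reference labels.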

Versions

2025-03-31 (2)
2025-03-31 (1)