Abstract: The volume of data collected in the agricultural industry is growing steadily, and with it the need for high-quality processing and accurate calculations to support decision-making. Tasks related to developing algorithms, methods, and software for analyzing and processing agricultural data with modern technologies are therefore of particular relevance. This paper presents the results of the design and implementation of software for solving the problem of classifying agricultural indicators through the combined application of data mining and machine learning methods. The design part covers the functional and non-functional software requirements, the architecture and structure of the software, the implementation technologies, and the development tools. The proposed large-scale software architecture consists of two parts: a user application written in the Java programming language and a kernel that executes R scripts. The software is designed as five modules: data interaction tools, primary data processing, data analysis, automated selection of algorithm parameters, and an "intelligent" module. The proposed technology stack uses the statistical computing language R to implement the data analysis methods and Java to develop a graphical user interface that gives access to the R data analysis functions. The paper then describes two developed software modules: the primary data processing module and the data classification module. The primary data processing module calculates the main numerical characteristics, examines distribution laws using the Shapiro-Wilk, Anderson-Darling, Cramér-von Mises, and Lilliefors goodness-of-fit tests, and analyzes relationships in the data using correlation analysis and analysis of variance.
The classification module implements sampling methods to address the problem of unbalanced data, as well as the following classifier models: logistic regression, naive Bayes, discriminant analysis, a neural network (perceptron), and decision trees. The accuracy of the resulting models can be assessed with a set of metrics. A case study of classifying the level of crop infestation with a neural network (perceptron) is presented; the classification accuracy on the test sample was 0.73.
Index terms: software, requirements, architecture, structure, module, data analysis, classification, machine learning, R, Java.
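
As an illustration of the accuracy assessment mentioned in the abstract, the following minimal Java sketch computes a confusion matrix, overall accuracy, and per-class recall from actual and predicted class labels. The class name ClassificationMetrics, its method names, and the integer label encoding are assumptions made for this example; the abstract does not specify which metrics or data structures the software uses.

```java
// Hypothetical helper for illustration only; not part of the software described in the paper.
final class ClassificationMetrics {

    // Builds a confusion matrix: rows are actual classes, columns are predicted classes.
    // Assumption: labels are integers in the range [0, numClasses).
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        int[][] matrix = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            matrix[actual[i]][predicted[i]]++;
        }
        return matrix;
    }

    // Accuracy: share of correctly classified observations (diagonal sum over total count).
    static double accuracy(int[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                total += matrix[r][c];
                if (r == c) {
                    correct += matrix[r][c];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Recall for one class: correctly predicted members of the class over all its actual members.
    static double recall(int[][] matrix, int cls) {
        long rowTotal = 0;
        for (int v : matrix[cls]) {
            rowTotal += v;
        }
        return rowTotal == 0 ? 0.0 : (double) matrix[cls][cls] / rowTotal;
    }
}
```

With made-up labels such as actual = {0, 0, 1, 1, 2, 2, 2, 1} and predicted = {0, 1, 1, 1, 2, 0, 2, 1}, six of the eight observations fall on the diagonal, so accuracy returns 0.75. Per-class recall is included because accuracy alone can hide poor performance on a rare class when the data are unbalanced.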

