Clusters partition algorithm for a self-organizing map for detecting resource-intensive database inquiries in a geo-ecological monitoring system

This paper presents research aimed at improving the efficiency of an automated software system for geo-ecological monitoring of agro-industrial sector resources. An algorithm for partitioning clusters in a self-organizing map was developed in order to detect resource-intensive inquiries to databases of agricultural resources and objects. The algorithm is based on fuzzy inference, and software implementing it was created. Experiments demonstrated that the algorithm considerably increases the correctness of detecting resource-intensive database inquiries in comparison with other similar software applications. The algorithm can be recommended for practical application in an automated software system for geo-ecological monitoring of agricultural resources and objects.


Introduction
At present, agriculture is a production sector that generates large amounts of data from various sources, such as farming machinery, satellites and drones, weather stations, and agrochemical soil monitoring. The data obtained from these sources are used for forecasting and increasing crop yield, saving time and resources, monitoring soil fertility, and supporting decisions by agro-industrial sector management authorities. Geo-ecological monitoring is an important tool for optimizing various spheres of business activity, including the agro-industrial sector [1][2][3]. The large amounts of information about the state of agricultural resources and objects that must be transmitted, processed and analyzed are the main reason for implementing automated software systems for geo-ecological monitoring [4]. The databases of such systems are intensively accessed by a large number of users. Among these transactional accesses, it is necessary to detect resource-intensive inquiries, whose processing by the server takes a prohibitive amount of time and of processor, disk and memory resources [5]. The search for and conversion of resource-intensive inquiries is performed by database administrators (DBAs). Various methods are used to search for and identify resource-intensive database inquiries; the most efficient of them are based on neural networks, among them the self-organizing map [6][7][8][9][10][11][12][13][14][15][16]. During training of a self-organizing map, the weight values of its neurons are tuned in accordance with the query parameter values contained in the training set vectors. As a result, clusters of neurons with close weight values are formed. To use the neuron clusters of a self-organizing map for classifying inquiries by their resource intensiveness, it is necessary to know the borders of these clusters.
The analysis has demonstrated that the known methods of cluster partitioning (the k-means method and singling out clusters on the basis of the unified distance matrix) have considerable limitations: the clusters must be of a certain shape; only the nearest local minimum is found; the result depends on which initial cluster center or initial coordinate was selected; and small distances within an actual cluster can be perceived as distances between different clusters. Therefore, developing an algorithm for cluster partitioning in a self-organizing map for detecting resource-intensive database inquiries in a geo-ecological monitoring system is a relevant scientific and technological problem.

Main part
A considerable cause of drawbacks in cluster border detection by the k-means method is its imperfection when the objects to be clustered have a large number of parameters [17]. In this regard, it is proposed to reduce the set of weight values of each neuron in a self-organizing map to generalized quantities. Visual analysis of trained self-organizing maps gives indistinct, fuzzy boundaries of the obtained clusters, so it is very difficult to accurately determine a range of numeric values by which a neuron's weight vector uniquely determines its membership in a particular cluster. To determine the borders of neuron clusters, the fuzzy inference method can therefore be applied, which is successfully used for solving various scientific and applied problems [18][19][20][21][22]. To take into account all values of the neurons' weights in the process of cluster partitioning, let us introduce generalized neuron weights. The value $S_k$, the generalized weight of neuron number $k$, is calculated using fuzzy rules in which the values $Y_y$ are calculated by a formula involving $v_i$, the value of explained variance of the $i$-th parameter of the query vectors.
The next stage is aggregation, and the final stage of calculating the value $S_k$ is defuzzification. As a result of the fuzzy inference, each neuron in the self-organizing map has only one weight, $S_k$. The k-means algorithm for one-dimensional space can then be used to divide the neurons into $C$ clusters and to determine the cluster borders. Figures 3 and 4 present fragments of the block diagram of the algorithm for clustering neurons in a self-organizing map for database inquiry classification. The algorithm consists of the following steps:
Step 1. Input of source data: the values of the neurons' weights and the values of explained variance of the query vector parameters.
Step 2. In the values set of a self-organizing map's neurons weights the values
Step 3. Neuron number $k = 1$ is selected.
Step 6. According to formulas (5) and (6), the values $Y_y$ and $b_{iy}$ are calculated.
Step 7. For neuron number $k$, the generalized weight $S_k$ is calculated according to formula (13).
Step 8. The number of the selected neuron, $k$, is incremented by 1.
Step 9. It is checked whether the neuron number $k$ is still within the number of neurons in the map. If this condition is fulfilled, jump to step 4; if not, jump to step 10.
Step 10. From the set of neurons, $C$ neurons are selected whose weights serve as the initial values of the cluster centers. At the first stage, each cluster contains only one neuron, whose weight is taken as the center of that cluster.
Step 12. For the selected neuron number $k$, the proximity measures of its weight $S_k$ to each cluster's center value are calculated by the formula
$$\rho_{kc} = |S_k - S_c|,$$
where $S_c$ is the center value of cluster number $c$, and $c$ takes values from 1 to $C$.
Step 13. The minimum value of $\rho_{kc}$ is determined. The selected neuron is included in the cluster for which the value of $\rho_{kc}$ is minimal.
Step 14. The center value of the cluster into which the new neuron was included is recalculated:
$$S_c = \frac{1}{L_c} \sum_{l=1}^{L_c} S_l,$$
where $S_l$ is the weight value of the $l$-th neuron of cluster number $c$, and $L_c$ is the number of neurons contained in cluster number $c$.
Step 15. The number of the selected neuron, $k$, is incremented by 1.
Step 16. It is checked whether the neuron number $k$ is still within the number of neurons in the map. If this condition is fulfilled, jump to step 12; if not, jump to step 17.
Step 17. The contents of the clusters obtained in the current iteration are saved and compared with the contents obtained in the previous iteration. If the contents do not coincide, a jump to step 11 is performed and the next iteration of neuron clustering is carried out. If the contents coincide, the current clustering of neurons is taken as final. End of the algorithm.
The algorithm is implemented as software for clustering SQL inquiries with the use of a self-organizing map (SOM clustering). The software was developed in Python on the basis of tensor calculations by means of the TensorFlow library. The analysis demonstrated that the inquiries incoming to databases can be grouped into four clusters: the 1st cluster contains "heavy" resource-intensive inquiries, characterized by intensive usage of system memory, processors, storage space and output channels during their execution; the 2nd cluster contains "slow" resource-intensive inquiries, characterized by frequent execution and recompilation procedures; the 3rd cluster contains spontaneous problematic inquiries, which occur due to occasional temporary resource shortages caused by peak loads in the network, servers or processors, or by resource shortages arising while other inquiries are executed; the 4th cluster contains other inquiries that are not resource-intensive (not problematic). To classify various types of inquiries, over 500 experiments were carried out using the developed method of detecting resource-intensive inquiries. The findings were used to calculate the correctness indices of detecting resource-intensive inquiries. Correctness was evaluated by two indicators: 1) the probability of detecting resource-intensive database inquiries; 2) the probability of erroneous detection of resource-intensive database inquiries.
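Steps 10 to 17 of the algorithm amount to k-means clustering of scalar generalized weights. The following Python fragment is a minimal sketch of that stage, not the authors' implementation. The function names `kmeans_1d` and `cluster_borders` and the sample weights are hypothetical. The sketch assumes the absolute difference as the proximity measure, assumes that initial centers are spread evenly over the sorted weights (the text only says that C neurons are selected), and uses a batch update of the centers after each full pass rather than recalculating a center after every single inclusion.

```python
from typing import List, Tuple


def kmeans_1d(weights: List[float], n_clusters: int,
              max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Cluster scalar generalized weights; returns (labels, centers)."""
    if not 1 <= n_clusters <= len(weights):
        raise ValueError("n_clusters must be between 1 and len(weights)")
    ordered = sorted(weights)
    # Assumption: initial centers are weights spread evenly over the sorted
    # range; the source does not say how the C initial neurons are chosen.
    if n_clusters == 1:
        centers = [ordered[0]]
    else:
        step = (len(ordered) - 1) / (n_clusters - 1)
        centers = [ordered[round(i * step)] for i in range(n_clusters)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Steps 12-13: put each neuron into the cluster with the nearest
        # center (assumed proximity measure: absolute difference).
        new_labels = [
            min(range(n_clusters), key=lambda c: abs(s - centers[c]))
            for s in weights
        ]
        # Step 17: stop when the cluster contents no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 14, batch variant: each center becomes the mean of its members.
        for c in range(n_clusters):
            members = [s for s, lab in zip(weights, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return labels, centers


def cluster_borders(centers: List[float]) -> List[float]:
    """Borders between adjacent clusters: midpoints of neighbouring centers."""
    ordered = sorted(centers)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


# Hypothetical generalized weights of eight neurons, split into four clusters.
sample_weights = [0.05, 0.08, 0.30, 0.33, 0.60, 0.62, 0.90, 0.95]
sample_labels, sample_centers = kmeans_1d(sample_weights, n_clusters=4)
sample_borders = cluster_borders(sample_centers)
print(sample_labels, sample_centers, sample_borders)
```

With nearest-center assignment in one dimension, the border between two adjacent clusters lies halfway between their centers, which is why `cluster_borders` returns midpoints. A new weight can then be classified by comparing it with these borders.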
The probability of detecting resource-intensive database inquiries was calculated by the formula
$$P_{det} = \frac{Q_{det}}{Q},$$
where $Q_{det}$ is the number of detected resource-intensive inquiries incoming to databases, and $Q$ is the total number of resource-intensive inquiries incoming to databases.
The probability of erroneous detection of resource-intensive database inquiries was calculated using an expression in $Q_{err}$, the number of incoming database inquiries that were erroneously classified as resource-intensive.
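The two correctness indicators can be computed as simple ratios of counts. The sketch below is illustrative only: the function names and the counts are hypothetical and are not taken from the experiments. The detection probability is the share of resource-intensive inquiries that were detected. The text here does not preserve the denominator of the erroneous-detection indicator, so the sketch takes it as an explicit argument, `reference_count`, rather than assuming one.

```python
def detection_probability(detected: int, total_intensive: int) -> float:
    """Share of resource-intensive inquiries that were detected."""
    if total_intensive <= 0:
        raise ValueError("total_intensive must be positive")
    if not 0 <= detected <= total_intensive:
        raise ValueError("detected must lie between 0 and total_intensive")
    return detected / total_intensive


def erroneous_detection_probability(erroneous: int,
                                    reference_count: int) -> float:
    """Share of erroneous classifications relative to a reference count.

    Assumption: the denominator is supplied by the caller, because the
    source text does not state which count the indicator is divided by.
    """
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    if erroneous < 0:
        raise ValueError("erroneous must be non-negative")
    return erroneous / reference_count


# Hypothetical counts, for illustration only.
p_det = detection_probability(detected=45, total_intensive=50)
p_err = erroneous_detection_probability(erroneous=5, reference_count=100)
print(f"detection: {p_det:.2f}, erroneous detection: {p_err:.2f}")
```

Keeping the denominator explicit makes it harder to compare values that were computed against different reference counts.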

Results and discussion
In addition, a number of experiments on detecting resource-intensive inquiries were carried out using current SQL query tuning and monitoring applications: Tuning Advisor [23,24] and the Oracle cost-based optimizer [25,26]. The results are presented in Table 1. Analysis of the data in Table 1 demonstrates that the proposed algorithm increases the probability of detecting resource-intensive database inquiries by 12.68% to 15.27% and reduces the probability of erroneous detection of resource-intensive database inquiries by 12.20% to 24.39% in comparison with the currently used SQL tuning and optimization applications.

Conclusions
The presented research is aimed at solving the problem of clustering a self-organizing map for detecting resource-intensive inquiries to databases of an automated software system for geo-ecological monitoring of agro-industrial resources. An algorithm for cluster partitioning in a self-organizing map, designed for database inquiry classification, was developed. The novelty of the algorithm consists in using fuzzy inference to calculate generalized weights of neurons. The algorithm makes it possible to determine cluster borders for the subsequent detection of resource-intensive database inquiries by means of a neural network. Software implementing the algorithm was developed.
The experimental research has demonstrated that the proposed algorithm considerably increases the probability of detecting resource-intensive database inquiries and reduces the probability of their erroneous detection in comparison with other similar applications. The algorithm can therefore be recommended for practical use in improving the efficiency of automated geo-ecological monitoring software systems in the agro-industrial sector.