Imbalanced data classification using support vector machine based on simulated annealing for enhancing penalty parameter

For pattern classification and regression problems, the support vector machine (SVM) is a prominent and computationally powerful machine learning method that has been applied successfully to many concrete problems across a wide range of domains. SVM has a key parameter, the penalty factor C. The choice of this parameter has a substantial impact on classification accuracy, as unsuitable settings can lead to substandard results. The penalty factor C must strike an adequate trade-off between classification error and generalisation performance, so building an SVM model with good performance requires parameter optimisation. In this work, the simulated annealing (SA) algorithm is employed in a hybrid method for tuning the SVM penalty parameter, with the aim of improving efficiency while obtaining the optimal penalty parameter and balanced classification performance. Experiments on several UCI datasets indicate that the proposed technique attains improved classification accuracy.


Introduction
A new classification approach based on the structural risk minimisation (SRM) principle was presented by Cortes and Vapnik [1], generally known as the support vector machine (SVM). The algorithm was quickly adopted for many classification tasks because of its efficiency in recognising handwritten characters, where it outperformed carefully trained neural networks. SVMs have also performed effective classification in other applications such as time-series prediction, bioinformatics and pattern classification. Burges (1998) published a comprehensive tutorial on the SVM classifier algorithm. SVM can handle large feature spaces because training is carried out so that the dimension of the classified vectors does not influence SVM performance as strongly as it does for a typical classifier; it is therefore considered particularly effective for larger classification problems. Furthermore, SVM-based classifiers are reported to have sounder generalisation properties than traditional classifiers, because SVM training systematically reduces the risk of misclassification, whereas conventional classifiers are typically trained to minimise empirical risk [2]. Numerous techniques have been proposed to address the parameter-selection issue in SVM. Huang and Wang [3] recommended a genetic algorithm (GA) method for parameter tuning [4]. Ren and Bai [5] offered two methodologies for parameter tuning in SVM: particle swarm optimisation (PSO-SVM) and GA-SVM. Huang [6] formulated a classifier using a hybrid ant colony optimisation (ACO) technique that simultaneously identifies the best feature subset and optimises the SVM parameters. Lin et al. [7] used simulated annealing for parameter computation and SVM feature selection. The simulated annealing algorithm optimises an SVM while mitigating the risk of the search becoming stuck at local optima, by allowing non-improving steps to be accepted with some probability.
The technique was outlined independently by Kirkpatrick et al. [8]. In each iteration, simulated annealing examines whether a neighbouring solution is better than the current one. If so, the new solution is accepted unconditionally. If the neighbouring solution is not better, its acceptance depends on a probability determined by the difference between the neighbouring and current solution values and the current temperature. This paper assesses the combination of SVM and simulated annealing to determine optimal parameters and thereby improve SVM accuracy [9]. Our experimental results show that the proposed technique achieves greater accuracy than standard SVM. The remainder of the paper is organised as follows. Section 2 contains a brief overview of the literature. Section 3 presents the proposed SA-SVM algorithm. The experimental results are presented in Section 4, and the conclusions in Section 5.

Related work
This field has witnessed over two decades of continuous research; however, learning from imbalanced data is still studied extensively. Initially, binary problems with skewed distributions were the focus of this domain, but over time the concept has grown considerably. Advances in data mining, machine learning, and big-data capabilities have provided insightful information about imbalanced learning, and have also revealed numerous new challenges. Hybrid techniques are now receiving increased attention, and algorithms are continuously being improved. Representative parameter-optimisation studies are summarised below:

- Multidimensional time series; CPSO-g-SVRM: the e-insensitive loss function is replaced by the Gaussian loss function for SVR in order to reduce the influence of noise on the regression values; the resulting g-SVRM technique has its parameters optimised with a chaotic PSO technique.
- Xueying Zhang [14]; speech recognition system; PSO and SVM: particle swarm optimisation is used to determine the optimal parameters of an SVM-based speech recognition system.
- Yaxiong Zhang [15, 16]; classification; GA and SVM: SVM was used to augment a predictive modelling method that identifies the right combination of training and test data with the sphere-exclusion technique, with the SVM parameters optimised by a GA.
- M.R. Gauthama Raman [17]; classification; HG-GA and SVM: a robust intrusion detection mechanism based on a hypergraph-based genetic algorithm (HG-GA) for SVM feature selection and parameter tuning.

Methodology
This section outlines the proposed technique for classifying imbalanced datasets using simulated annealing and SVM. The datasets and the performance metrics used to validate the proposed technique are also discussed.

Data set
The datasets originate from the UCI Machine Learning Repository, accessible at http://www.ics.uci.edu/~mlearn/MLRepository.html. Detailed information, including the number of attributes, data size, attribute types and class distribution, is given in Table 2. Ten datasets containing binary imbalanced data (Australian, heart-c, heart-statlog, ionosphere, liver, Pima, hepatitis, breast cancer, kidney and German credit) are used. The experimental results indicate that the proposed approach handles imbalanced data well.

Simulated annealing
Simulated annealing (SA) is a general-purpose search algorithm first proposed by Metropolis et al. [18] and later popularised by Kirkpatrick et al. [8]. SA borrows its basic idea from metallurgy: as temperature decreases, metal molecules slowly reach a low-energy state and gradually crystallise. If the metal is first heated to a sufficiently high temperature and then cooled slowly, every grain is expected to reach its lowest-energy state. Metropolis proposed a technique that improves search results while avoiding the local-optima problem; the cooling schedule, analogous to metal cooling, allows SA to converge towards the global optimum. Parameter optimisation is typically performed using techniques such as analytical gradients, genetic algorithms, Monte Carlo methods, and numerical gradients. SA identifies globally optimal parameter values; although it requires relatively more time, it finds more accurate solutions than many alternatives. The present study uses SA to identify the optimal SVM parameter value. Specifically, we propose an SA-augmented SVM that evaluates the fitness associated with each candidate C value over iterations governed by the temperature parameter T. To attain gradual convergence towards the global optimum, T is decreased at each iteration by multiplying it by a cooling factor V in the range (0, 1). The SA algorithm moves between a current state and a candidate state: the critical decision is whether to retain the current state or move to the candidate state (Cj) based on a comparison of fitness values. If the new state (Cj) has a fitness value greater than the current state, it is accepted; otherwise, it is accepted with a probability that depends on the fitness difference and the current temperature.
Such probabilistic transitions gradually move the system towards a near-optimal state. Each iteration selects a random neighbour; if the chosen neighbour achieves higher accuracy than the current state, it becomes the current state and the parameter C is updated accordingly. Otherwise, SA accepts the candidate solution (Cj) with probability P, which is influenced by T. Figure 1 depicts the working of the SA technique. The temperature parameter gives the algorithm additional time to explore the search space: at high temperatures the algorithm behaves like a random search in which nearly all transitions are accepted, regardless of their effect, which facilitates broad exploration. The proposed technique relies on SA to find the value of the C factor for the SVM classifier on its training set; the search is comparatively tractable because only a single parameter, C, is optimised.
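The acceptance rule and geometric cooling schedule described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation; here delta is the fitness (accuracy) difference between the candidate and current states, and V is the cooling factor:

```python
import math
import random

def accept(delta, T, rng=random):
    """Metropolis acceptance rule: always accept an improvement in
    fitness; accept a worse candidate with probability exp(delta / T),
    which shrinks as the temperature T cools."""
    if delta > 0:
        return True
    return rng.random() < math.exp(delta / T)

# Geometric cooling: T is multiplied by a factor V in (0, 1) each iteration.
T, V = 1.0, 0.9
temperatures = []
for _ in range(3):
    T *= V
    temperatures.append(round(T, 4))
```

At high T, exp(delta / T) is close to 1 even for clearly worse candidates, giving the random-search behaviour described above; as T shrinks, only improvements survive.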

Support Vector Machine
Support vector machines (SVM) were initially proposed by [1] and [19] within the Vapnik-Chervonenkis (VC) framework augmented with structural risk minimisation (SRM) (Vapnik, 1995, 1998). The framework identifies the ideal balance between reducing the training-set error and maximising the margin, giving the model good generalisation capability while avoiding overfitting. Another benefit of SVM is that training is a convex quadratic programme, so the local-minima trap is avoided and only the global minimum is returned. Consider a training set of input-output pairs {(x1, y1), (x2, y2), ..., (xn, yn)}, where each input xi is a vector of extracted features and each output yi ∈ {-1, +1} is a class label. The predicted value y is computed using a set of weights wi. The present study uses margin maximisation to determine the separating hyperplane.
Here w contains the weight vector, φ(x) denotes the feature mapping, the dual function is defined under the training condition 0 ≤ λ ≤ C, H denotes the SVM hyperplane, g(x) denotes the decision function, and the bias term ω0 is included. Originally, support vector machines were formulated for binary classification; how to extend them effectively to multiclass problems is still an active research question. Multiclass SVM uses two different approaches: the first builds and combines multiple binary classifiers, while the second considers all data in a single optimisation formulation. Multiclass problems can thus be split into several binary problems, each of which can be addressed independently. Pseudocode for proposed method
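The SA-SVM hybrid described above can be summarised as a runnable sketch. This is a sketch under stated assumptions, not the authors' exact implementation: it uses an RBF-kernel SVM from scikit-learn, a hypothetical log-scale random neighbourhood for generating candidate C values, and a stand-in dataset from scikit-learn, since the paper's UCI data files are not bundled here:

```python
import math
import random
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def fitness(C, X_tr, y_tr, X_te, y_te):
    """Fitness of a candidate penalty parameter C = held-out accuracy."""
    clf = SVC(C=C, kernel="rbf").fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

def sa_svm(X_tr, y_tr, X_te, y_te, T=1.0, V=0.9, T_min=1e-3, seed=0):
    """Simulated annealing over the single SVM parameter C."""
    rng = random.Random(seed)
    C = 1.0                                               # initial state
    cur_fit = fitness(C, X_tr, y_tr, X_te, y_te)
    best_C, best_fit = C, cur_fit
    while T > T_min:
        # Random neighbour: perturb C multiplicatively on a log scale.
        Cj = max(1e-3, C * math.exp(rng.uniform(-0.5, 0.5)))
        fj = fitness(Cj, X_tr, y_tr, X_te, y_te)
        delta = fj - cur_fit
        # Metropolis rule: accept improvements, or worse moves with prob.
        if delta > 0 or rng.random() < math.exp(delta / T):
            C, cur_fit = Cj, fj
            if fj > best_fit:
                best_C, best_fit = Cj, fj
        T *= V                                            # geometric cooling
    return best_C, best_fit

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
C_opt, acc = sa_svm(X_tr, y_tr, X_te, y_te)
```

The single-parameter search keeps each iteration cheap: the dominant cost is one SVM fit per candidate C, and the geometric cooling schedule bounds the total number of fits.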

Evaluation metric
Given the characteristics of class-imbalanced problems, a traditional classifier tends to classify minority-class samples as majority and thereby achieves a high global accuracy score, while accuracy on the minority-class samples remains poor. The proposed strategy balances the classes and deletes duplicates that interfere with the classification process. To substantiate the efficacy and usefulness of the proposed approach, precision, recall, and F-score, all derived from the confusion matrix, are used to evaluate class-imbalanced classification, as given by formulas (3) and (4).
Precision and recall offer limited information individually, and hence they should be used together. The precision-recall break-even point (BEP) is an integrated performance indicator that is used alongside the F1 measure. F1 is the harmonic mean of precision (P) and recall (R):

F1 = 2PR / (P + R)    (4)

The precision-recall BEP is the point at which precision equals recall; in practice it is often approximated by the arithmetic mean of precision and recall. The BEP indicator must be computed separately for each class. The overall performance of an approach across all classes can be ascertained using the micro-average or macro-average BEP: the macro average assigns equal weight to each class, whereas the micro average assigns equal weight to each document.
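Formula (4) and the macro/micro averaging schemes can be made concrete with confusion-matrix counts. The counts below are hypothetical, purely for illustration:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)          # harmonic mean, formula (4)
    return p, r, f1

# Per-class (TP, FP, FN) counts for a hypothetical two-class result.
counts = {"pos": (40, 10, 20), "neg": (80, 20, 10)}
scores = {c: prf(*t) for c, t in counts.items()}

# Macro-average: each class weighted equally.
macro_f1 = sum(f for _, _, f in scores.values()) / len(scores)

# Micro-average: pool the counts, so each sample is weighted equally.
TP = sum(t[0] for t in counts.values())
FP = sum(t[1] for t in counts.values())
FN = sum(t[2] for t in counts.values())
micro_p, micro_r, micro_f1 = prf(TP, FP, FN)
```

With these counts the macro average is pulled down by the weaker minority class, while the micro average is dominated by the larger class, which is exactly why the macro variant is preferred for imbalanced evaluation.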

Experiment results and assessment
To assess the efficacy of our proposed procedure, we implemented the methodology in Python on a platform comprising an Intel® Core™ i7 CPU @ 7GHz and 8.00 GB RAM running Windows 10. To measure the suggested approach's performance, several benchmark datasets from the UCI machine learning repository were chosen, and the stated three sets of experiments were carried out. The data were divided into 70% for training and 30% for testing across all original datasets, and the accuracy of every class was assessed using precision, recall, F-measure, and overall accuracy. Table 3 shows the per-class precision on the original datasets. For imbalanced datasets, the classifier's performance metrics are obtained separately with respect to the data distribution, ensuring that an appropriate metric is chosen for this type of problem. Table 3 reports average precision values, where SVM is applied to every original dataset and precision is averaged to guarantee pertinent statistical behaviour. The table shows the results for plain SVM without the proposed approach; underperformance is evident on several datasets with respect to the usual imbalanced-data evaluation metrics. These results indicate that learning without the suggested technique yields lower performance on the standard imbalanced-data indicators, and the SVM baseline achieved poor recall and precision on all datasets tested.
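The 70/30 evaluation protocol and the per-class metrics reported in Table 3 can be reproduced in outline as follows. This sketch uses a stand-in scikit-learn dataset, since the paper's UCI data files are not included here:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 70/30 split, stratified so both classes appear in the test fold.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = SVC(C=1.0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Per-class metrics: average=None returns one value per class,
# exposing the weak minority-class scores that overall accuracy hides.
per_class_precision = precision_score(y_te, y_pred, average=None)
per_class_recall = recall_score(y_te, y_pred, average=None)
macro_f1 = f1_score(y_te, y_pred, average="macro")
```

Reporting `average=None` rather than a single accuracy figure is what makes the imbalance visible: a classifier can score well overall while one row of `per_class_recall` is poor.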
To assess the efficacy of the suggested technique, experiments were run on the selected imbalanced datasets. Accurate imbalanced-data classification required computing the optimal value of the regulating parameter, and the present study uses SA for this optimisation; Table 4 shows the SA parameter settings. To establish the validity of our proposed model on imbalanced datasets, we used it to determine the precision of every class: a noteworthy enhancement in the estimation for every class, and high precision for each class, was observed compared with applying SVM to the same datasets. SVM exhibits variable average per-class precision on the majority of the datasets. We observed an improvement in classification precision, making the classifier able to classify better than plain SVM, as can be seen in Table 5. The parameters have a pronounced effect on the efficacy of SVM: parameter adjustment not only governs underfitting and overfitting of the training data but also affects the results on the validation sets. The best parameter value (C) is then used to generate the training model. A prediction test is performed on every test set; subsequently, the corresponding training set is expected to produce another optimal parameter value.

Conclusion
Regarded as among the most prevalent machine learning algorithms, support vector machines (SVM) are extensively deployed for object recognition and dataset classification. Nevertheless, the technique has a problem-specific parameter whose optimal value is difficult to ascertain, and this affects the accuracy of the algorithm. In the present study, we proposed an SA-augmented SVM algorithm; the hybridisation yields better results for SVM parameter optimisation. Experimental results on UCI datasets of diverse sizes show that the algorithm achieves superior precision compared with plain SVM, while the computation time remains reasonable.