Comparison and analysis of supervised machine learning algorithms

Intrusion detection is used to investigate a network for signs of infiltration. An intrusion detection system is designed to detect unwanted access to the system. A number of researchers have employed data mining techniques to detect intrusions in this field. This study proposes supervised machine learning algorithms based on distance measurements. In terms of detection rate, accuracy, false alarm rate, and Matthews correlation coefficient, the supervised machine learning techniques surpass other algorithms. They also surpassed all other approaches in terms of serial execution time.


Introduction
More than two quintillion bytes of data are produced and traded every day. Conventional detection systems are unable to sense intrusions at such data volumes and speeds, so big data technologies are being used to deal with this threat efficiently. An intrusion detection system that uses supervised machine learning algorithms can be a dedicated device or a software system that automatically screens for and identifies attacks or infiltrations and issues alerts to the computer or network. It monitors the data traffic within the network with the intention of detecting any suspicious activity or potential threats [1,2]. The alert report helps administrators or users to identify security holes in the system or network and thus fix them. Intrusion detection technologies, such as host-based and network-based detection, are the subject of anomaly-based approaches to data analysis. A network intrusion sensor can monitor network traffic and watch multiple network hosts to identify any anomalies. SVM can be used as an enhanced machine learning technique, and this work shows how the SVM, LR, LDA, RF and CART algorithms can be implemented [2], [3,4], alongside related approaches such as modified logistic regression, decision trees and artificial neural networks. Unusual behaviours may be detected using software and network traffic equipment.

Intrusion detection ML algorithm
ML (machine learning) is a branch of artificial intelligence (AI). Machine learning relies on a set of algorithms that allow a system to learn, to predict, and to improve automatically through experience without being explicitly programmed. In addition, ML algorithms can detect attacks more precisely in enormous quantities of data in the shortest time possible [5][6][7][8]. ML algorithms are categorized into three classes:
• Supervised
• Unsupervised
• Semi-supervised

Supervised machine learning algorithms
A supervised algorithm deals with fully labeled data and learns the connection between the data and its class [9]. This can be achieved by either regression or classification. There are two stages in classification: training and testing. Training is carried out using the response vector. Support Vector Machine (SVM), discriminant analysis, Naive Bayes, nearest neighbor and logistic regression are the usual algorithms in the classification group [10,11]. The algorithms in the regression group include linear regression, support vector regression (SVR), ensemble methods, decision trees (shown in Figure 1) and random forest. This article discusses support vector machines (SVM), logistic regression (LR), linear discriminant analysis (LDA), random forest (RF) and classification and regression trees (CART). The training data settings include: 3) training data consisting of labeled normal and anomalous events; 4) training data consisting of several labeled event groups. Methods 1 and 2 are single-class formulations, Method 3 is a two-class classification problem, and Method 4 is a multi-class classification problem. Note that Methods 3 and 4 require labels to be assigned to the training data [12]. This distinguishes them from unsupervised learning methods, which do not need labels to be attached to the training data. The same distinction is often drawn for Methods 1 and 2, where all training data comes from a single class [13] and labeling is therefore trivial.
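The two classification stages described above can be illustrated with a minimal sketch. It assumes a nearest neighbor classifier with Euclidean distance, and the feature values, labels and function name are hypothetical illustrations rather than data or code from this study.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training record closest to x (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_X, train_y):
        dist = math.dist(features, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Training stage: labeled records (hypothetical, already scaled to [0, 1]).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = ["normal", "normal", "attack", "attack"]

# Testing stage: held-out labeled records used only to assess the model.
test_X = [(0.15, 0.15), (0.85, 0.85)]
test_y = ["normal", "attack"]

predictions = [nearest_neighbor_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

The labels are needed in both stages: in training to define the classes, and in testing to score the predictions.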

Support vector machine (SVM)
SVM is one of the most popular machine learning (ML) algorithms. It can be used for both regression and classification. The algorithm is trained with labeled data, and a hyperplane is used to divide the data into classes while maximizing the margin between the classes [14,15]. According to Mehmood et al. [16], cascaded multiclass classification can be achieved using SVM. Figure 2 illustrates the types and parameters of the kernels used in SVM. The autoencoder (AE) model employs an unsupervised learning method with one or more hidden layers. Feature learning and dimensionality reduction can benefit from its nonlinear generalization [17]. In an AE, the input and output layers have the same number of units, which equals the number of components in the feature vector. The hidden layer acts as the bottleneck layer, and its number of units is determined before training. SVM is a vector-based machine learning approach and is generally a supervised model [18]. Figure 1 shows two classes being separated with classification learning. A new text can be categorized because each class in the SVM system has its own unique data set number [19].
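The idea of a separating hyperplane with a wide margin can be sketched as follows. This is a simplified linear SVM trained by subgradient descent on the regularized hinge loss, not the kernel SVM configuration used in the cited works. The toy data and the hyperparameters `lam`, `lr` and `epochs` are assumptions chosen for illustration.

```python
import math

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a hyperplane w.x + b = 0 for labels in {-1, +1} using the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularization step: shrinking w widens the margin 2 / ||w||.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge-loss step: move the hyperplane away from a point
                # that is misclassified or lies inside the margin.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    """Classify x by the side of the hyperplane it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable records: -1 = normal, +1 = attack.
X = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
margin_width = 2.0 / math.sqrt(sum(wj * wj for wj in w))
print(w, b, margin_width)
```

A kernel SVM replaces the dot product with a kernel function, which this sketch does not attempt.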

Logistic regression (LR)
Logistic regression is named for the function at the core of the method, the logistic function. It is an S-shaped curve that takes any real-valued number and maps it to a value between 0 and 1. It was developed by statisticians to describe the properties of population growth in ecology, which rises quickly and then levels off, as shown in Figure 3.
Logistic regression is another technique that machine learning borrows from the field of statistics. It is intended for binary classification problems (problems with two class values). Logistic regression is used to model categorical outcomes such as pass/fail or positive/negative, and the predicted class probability can be used to label a transaction as fraud or not fraud in credit card fraud identification [20]. Linear models are intended for regression, in which the target value is expected to be a linear combination of the input variables. Despite its name, LR is a linear model for classification rather than regression. In this model, the probabilities that describe the possible outcomes of a single trial are modeled with a logistic function. In scikit-learn, logistic regression can be fitted with a coordinate descent (CD) algorithm [21]. Logistic regression is often referred to as binomial logistic regression. It is based on the sigmoid function, whose output is a probability and whose input can range from -infinity to +infinity. Let us address some of the benefits and drawbacks of logistic regression.
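The mapping from a real-valued score to a class probability can be sketched as follows. The weights, bias and feature values are hypothetical and are not fitted to any data set. The 0.5 decision threshold is a common default, assumed here.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value strictly between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # this branch avoids overflow for large negative z
    return ez / (1.0 + ez)

def predict_probability(weights, bias, x):
    """Probability of the positive class for feature vector x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def classify(weights, bias, x, threshold=0.5):
    """Return 1 (e.g. fraud) if the probability reaches the threshold, else 0."""
    return 1 if predict_probability(weights, bias, x) >= threshold else 0

# Hypothetical coefficients, for illustration only.
weights, bias = [0.8, -0.4], -0.2
sample = [3.0, 1.0]

probability = predict_probability(weights, bias, sample)
label = classify(weights, bias, sample)
print(round(probability, 3), label)
```

The linear part of the model is the weighted sum `z`. Only the final squashing through the sigmoid turns it into a probability.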

Linear discriminant analysis (LDA)
LDA is a linear supervised ML technique for dimensionality reduction and prediction. Bayesian inference is used to determine the likelihood that a new input belongs to a particular class.
Data sets can be transformed, and test vectors classified in the transformed space, using two alternative approaches. Class-dependent transformation: this approach maximizes the ratio of between-class variance to within-class variance. The main objective is to maximize this ratio so that adequate class separability is obtained. The class-dependent approach uses two optimization criteria to transform the data sets independently. Class-independent transformation: this approach maximizes the ratio of overall variance to within-class variance. As Figure 4 illustrates, it applies only one optimization criterion to transform the data sets, and consequently all data points are transformed with the same transform regardless of their class.

Figure 4. Linear discriminant analysis algorithm
The aim of the LDA technique is to project the original data matrix onto a lower-dimensional space. Three steps are needed to accomplish this. The first step is to quantify the separability between the classes, which is called the between-class variance or between-class matrix. The second step is to measure the distance between the mean and the samples of each class, which is called the within-class variance or within-class matrix [22]. The third step is to construct the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance. These three steps are outlined in this section. Linear Discriminant Analysis (LDA) and Variable Inference have been used to track deviations in internet traffic in near real time [23]. That work describes a technical solution that uses Natural Language Processing (NLP) methodology to identify potential malicious attacks and network configuration issues, and presents results demonstrating the implementation of the concept. There are potential use cases for this technology in the area of anomalous data detection.
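The three steps can be sketched for the simplest case of two classes and two features. In that case the projection direction has the closed form w = Sw^-1 (m1 - m0), where Sw is the within-class scatter matrix and m0, m1 are the class means. The data points and function names are hypothetical, and the multi-class case requires an eigen-decomposition that is not shown here.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, mu):
    """2x2 scatter matrix: sum of (x - mu)(x - mu)^T over the points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    m0, m1 = mean(class0), mean(class1)
    # Step 1: between-class separation, summarized by the difference of means.
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Step 2: within-class scatter, summed over both classes.
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    # Step 3: direction that maximizes between-class over within-class variance.
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, point):
    """Project a 2-D point onto the 1-D discriminant axis."""
    return w[0] * point[0] + w[1] * point[1]

# Hypothetical two-feature records for two classes.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class0, class1)
proj0 = [project(w, p) for p in class0]
proj1 = [project(w, p) for p in class1]
print(w, proj0, proj1)
```

After projection the two classes occupy disjoint intervals on a single axis, which is the dimensionality reduction that LDA aims for.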

Classification and regression tree (CART)
CART is a simple ML algorithm for classification that requires no online supervision. In CART, a classification tree is used when the target variable is categorical, whereas a regression tree is used when the target variable is continuous. The Gini index is the impurity measure that CART uses to evaluate splits of the data.

Figure 5. Classification and Regression Tree algorithm
A significant benefit of this methodology is that it supports both classification and regression. It produces a binary tree in which each internal node has exactly two outgoing edges. Splits can be selected using information gain (IG), the Gini index (GI) or the twoing criterion, and the resulting tree can be pruned with cost-complexity pruning. CART is the algorithm we employed in our work through the scikit-learn library [24].
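How the Gini index drives a binary split can be sketched for a single numeric feature. The feature values and labels are hypothetical, and the search below is a plain exhaustive scan rather than scikit-learn's implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    best_threshold, best_impurity = None, float("inf")
    # The largest value is skipped because it would leave the right child empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Hypothetical feature (e.g. failed logins) with class labels.
values = [1, 2, 3, 10, 11, 12]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

threshold, impurity = best_split(values, labels)
print(threshold, impurity)
```

A full tree repeats this search recursively in each child node until a stopping rule or pruning step ends the growth.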

Random forest (RF)
RF is a dynamic non-linear algorithm that is employed for regression and classification. It builds several decision trees during model training, and the predictions collected from all the trees are combined to produce a response, as in other ensemble techniques. The RF classifier operates as follows: the more trees the model contains, the better the accuracy and the less the model overfits, as shown in Figure 6. The RF classifier is an ensemble that integrates multiple decision tree classifiers to predict the class [25]. Each tree is sampled independently from the same distribution, and the majority rule is used to combine the trees. The RF classifier forwards every new input data point to each of its trees and selects the class chosen by the most trees. Random forest is one of the ensemble classifier approaches. When the ensemble is made up of decision tree classifiers, the set of classifiers is called a 'forest'. Each decision tree is built using a random selection of attributes at each split node [26]. Breiman proposed the random forest algorithm in 2001. The study in [27] included several anomaly detection experiments using random forests.
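The majority rule can be sketched as follows. The three "trees" are hand-written one-level stumps on hypothetical features and thresholds, standing in for trees that a real random forest would learn from bootstrap samples and random attribute subsets.

```python
from collections import Counter

def forest_predict(trees, x):
    """Send x to every tree and return the class chosen by the most trees."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stumps, each splitting on a different attribute.
def tree_duration(x):
    return "attack" if x["duration"] > 30 else "normal"

def tree_bytes(x):
    return "attack" if x["bytes"] > 10_000 else "normal"

def tree_failed_logins(x):
    return "attack" if x["failed_logins"] >= 3 else "normal"

forest = [tree_duration, tree_bytes, tree_failed_logins]

connection = {"duration": 45, "bytes": 2_000, "failed_logins": 5}
label = forest_predict(forest, connection)
print(label)
```

An odd number of trees avoids ties in a two-class problem, which is why three stumps are used here.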

Ensemble methods
This ML technique combines many models to create the desired predictive model. The core concept behind the ensemble approach is to combine weak learners into a strong learner, thus increasing the model's accuracy. Bagging, boosting and stacking are typical forms of ensemble approaches. Following this approach, Gautam et al. [7] developed an ensemble method with ML algorithms, and recent papers that use machine learning for anomaly detection have employed the partial decision algorithm. It was shown that the ensemble approach performs better than the SVM, LR, LDA, RF and CART algorithms.

Performance evaluation
All the preprocessing strategies have been validated with supervised learning algorithms, and Table 1 presents the findings for the best methods among the SVM, LR, LDA, RF and CART algorithms. The table lists the advantages and drawbacks of each machine learning algorithm, together with a performance analysis, so that the algorithms can be compared.

Table 1. Supervised learning algorithms analysis

The work in this paper can be developed further by employing genetic algorithms [28], the Internet of Things (IoT) [29], cloud computing and Arduino as viable future trends [30,31].
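The four measures used to compare the algorithms (detection rate, accuracy, false alarm rate and Matthews correlation coefficient) can be computed from a confusion matrix. The sketch below assumes their standard definitions, and the counts are hypothetical and are not results from this study.

```python
import math

def detection_metrics(tp, tn, fp, fn):
    """Compute standard IDS measures from confusion-matrix counts.

    tp: attacks flagged as attacks     fn: attacks missed
    tn: normal traffic passed          fp: normal traffic flagged (false alarms)
    """
    detection_rate = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    false_alarm_rate = fp / (fp + tn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "detection_rate": detection_rate,
        "accuracy": accuracy,
        "false_alarm_rate": false_alarm_rate,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient stays informative when attacks are much rarer than normal traffic, which is common in intrusion data sets.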

Conclusion
Anomalous data detection algorithms are classified into two types: misuse detection and anomaly detection. The anomaly detection approaches we considered include predictive pattern generation, sequence matching, statistical methods and supervised learning. After analysing the methodologies that have been used for anomaly detection, we found that the Support Vector Machine algorithm is the preferred one, with high accuracy in anomaly detection.

| Methodology | Employed Dataset | Advantage | Limitation | Performance Analysis |
|---|---|---|---|---|
| [16] Comparison of different moderated algorithms for a deviation-based detection technique (SVM) | NSL-KDD | Has a high detection rate | The training and testing speed is slow | Cannot detect novel attacks |
| [17] Fuzzy clustering and SVM used for classification | KDD CUP | The process was carried out by dividing the heterogeneous training group into subgroups | Showed vulnerability to handling complex data in a large data set | Higher IDS detection accuracy, fewer attacks, and stronger detection stability achieved |
| [18] Most strategies were implemented throughout the identification of card fraud by using SVM | CC data set | It achieves its own feedback process by enhancing classifier detection rate as well as effectiveness | The method was successful with the CC data set | This model was very accurate, with a false alarm rate of 1.87% |
| [21] Shows better results in long distances vs. attacks | KDD-99 | Better results appear in long distances versus during attacks | Because of its strong rules, it is good at detecting anomalous data | Good at detecting anomalies |
| [20] Anomaly detection techniques based on a single classification | KDD'99 | Mechanisms for detecting anomalies within a single classifier | Force of class stability versus lack of data | To improve the performance of intrusion detection systems |
| [24] Make the neural network classification work as effectively as possible | KDD 99 | This system provides less time for online learning | This system was characterized by an increased false alarm rate | The quality of the rating and the error rates were used as parameters to evaluate the performance |