Mathematical simulation of memristive devices for classification in machine learning

Over the last few years, neuromorphic computation has been a widely researched topic, and the memristor is one of its key elements. The memristor offers high-density, analogue memory storage and complies with Ohm's law for small potential changes. Memristive behaviour imitates synaptic behaviour; as a nanotechnology it can reduce power consumption, improve synaptic modelling, and reduce data-transmission overhead. The purpose of this paper is to investigate a customized mathematical model of memristive devices for machine learning algorithms. This model uses a computing paradigm that differs from the standard von Neumann architecture and, compared with conventional computers, has the potential to reduce power consumption and increase performance on specialized tasks. Classification, one of the most interesting fields in machine learning, assigns feature patterns to classes using a specific algorithm. In this study, a memristive-based classifier is used together with an adaptive spike encoder for the input data. The algorithm is based on Anti-Hebbian and Hebbian learning rules. The investigations employed two datasets: the breast cancer Wisconsin dataset and a Gaussian mixture model dataset. The results indicate that the performance of our memristive-based algorithm is reasonably close to the optimal solution.


Introduction
The memristor is the fourth fundamental passive circuit element, alongside the resistor, capacitor and inductor [1][2]. It was first introduced in 1971 by Chua [1] to reflect the relationship between resistance and memory: a varying resistance is used for storing data in different states [3][4][5]. Memristive devices are currently being developed for use, as functional blocks of multilevel storage, in many common applications such as neuromorphic computing networks [6][7][8][9] and functional logic devices [10][11][12][13]. In neuromorphic networks and systems, memristive material is widely used as a bio-inspired material to provide both memory and processing functions [14][15][16]. The reason is that it acts as a resistive material whose resistance retains a memory of the electric charge: the amount of electric charge at any time determines the current passing through it. The memristor can then be defined as [14], [17]:

M(q(t)) = \frac{d\varphi}{dq}, \qquad q(t) = \int_{-\infty}^{t} i(\tau)\, d\tau \qquad (1)

where φ is the magnetic flux, q is the electric charge, i is the current and t is a given time.
When current flows through the device in one direction, the resistance of the material rises; it decreases when the current flows in the other direction. This makes it possible to implement a physical neural network in which one or more memristor-like synapses connect neuron nodes, forming the physical layers of a multi-layer network. Each node within a given layer connects one or more inputs with one or more outputs via synaptic weights, which are modified by threshold values associated with the varying resistances between pre- and post-synaptic neuron signals, as illustrated in Figure 1. The basic mechanism of synaptic plasticity is described by Hebbian theory.
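The direction-dependent resistance described above can be illustrated with a minimal simulation sketch, assuming a linear ion-drift style model; the class name, parameter values, and state-update rule below are illustrative assumptions, not taken from this paper.

```python
# Minimal sketch of a direction-dependent memristive device, assuming a
# linear ion-drift style model. Class name, parameter values, and the
# state-update rule are illustrative assumptions, not from this paper.
class Memristor:
    def __init__(self, r_on=100.0, r_off=16000.0, w=0.5, mu=1e-2):
        self.r_on = r_on    # resistance when fully doped (ohms)
        self.r_off = r_off  # resistance when fully undoped (ohms)
        self.w = w          # normalized internal state in [0, 1]
        self.mu = mu        # drift-rate constant

    def resistance(self):
        # Effective resistance interpolates between r_on and r_off.
        return self.r_on * self.w + self.r_off * (1.0 - self.w)

    def step(self, current, dt=1.0):
        # Current in one direction grows the state (resistance falls);
        # reversed current shrinks it (resistance rises).
        self.w = min(1.0, max(0.0, self.w + self.mu * current * dt))
        return self.resistance()

m = Memristor()
r0 = m.resistance()
m.step(current=5.0)   # forward current: resistance decreases
assert m.resistance() < r0
m.step(current=-5.0)  # reverse current: resistance increases again
```

The clamp on the state variable mimics the physical bound of a fully doped or fully undoped device.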
Anti-Hebbian and Hebbian (AHaH) rules describe a particular adaptive learning rule by which synaptic plasticity can self-adapt [18]. To use machine learning more efficiently, the input data is represented here as spikes (a bundle of wires), with each input spike corresponding to one wire. Depending on the input data representation, some of the input spikes to the classifier are active (1) and others are inactive (0). Although spike encoding is not strictly necessary for the classifier, it is beneficial in several respects. Firstly, in chip design it is difficult to move analogue voltages. Secondly, if the number of active lines is low, computation becomes efficient because only the synapses on active lines are adapted. Thirdly, a digital code makes it easy to interpret what the digits mean. The remainder of the paper is outlined as follows: the next section describes the essential background and related studies; the section after that presents the design methodology for using memristive circuits in AHaH classification; following that, spike encoding as an input data encoder is described through a case study.

Background
Resistive switching devices such as phase-change memories and memristors have all been introduced as alternative technologies to traditional CMOS and TTL devices, whose typical hardware implementations are based on the von Neumann architecture [10], [19], [20]. Resistive devices can produce different levels of non-volatile resistance that determine their memory states. Therefore, resistive devices such as memristors are promising candidates for neuromorphic (brain-like) computing technologies, where they reproduce the functions of synapses and neurons [7], [21]-[26], i.e. they build non-volatile connections between neurons. Although memristive devices had been studied for many years with some limitations, a real physical realization was demonstrated in 2008 [8]. Next, crossbar arrays of memristors were introduced and developed as building blocks of memory circuits [27]-[34], and memristors were then trained to operate as spike-like circuits [14]-[27]. After this, learning and memory were thought to begin with artificial synaptic plasticity, which provided excellent dependability, scalability, and low energy consumption, similar to natural synapses.

Figure 1. A schematic of a perceptron network; the postsynaptic neuron is the output summation of the k synapses connected to k inputs.

Nugent and Molter [18] published a basic memristive device model based on Anti-Hebbian and Hebbian (AHaH) learning rules in 2014, in which a synaptic weight is represented by two memristors connected serially with the same polarity. These memristors are required to mimic AHaH circuit nodes. Although there are numerous memristive device models that model specific devices, this type of model simulates a large class of them [18].

Method
A memristive device can be used to construct a synapse for classification, adapting the values of the weights; this adaptation reduces communication between the memory and processing units, and thus the power consumption is reduced. Nugent and Molter [18] derive weight-update equations from thermodynamic laws to form a functional model, as illustrated in the following equations.
where w represents a synapse weight, α and β represent the read and write periods respectively, η is the noise variable (normal Gaussian), and δ is a decay variable.
Equation (2) is used in the case of unsupervised feedback, while in the supervised case the following equation is used. The classification procedure itself has three steps. First, a weight is initialized for each active input and for the bias. Then the output y is found as the summation of the weights over all the active inputs plus the bias. Finally, the weights are modified by applying Equations (2) and (4) to obtain the new weights.
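The three steps above can be sketched in code. Since the exact update rules (Equations 2 and 4) are given in [18] and not reproduced here, the sketch below substitutes a simplified Hebbian/anti-Hebbian style update for them; the class name, learning rate, and noise magnitude are illustrative assumptions.

```python
import random

# Sketch of a single classifier node following the three steps above.
# The exact weight-update rules (Equations 2 and 4) are given in [18];
# the simplified Hebbian/anti-Hebbian style update below, the class
# name, and the parameter values are illustrative assumptions.
class ClassifierNode:
    def __init__(self, lr=0.1, noise=0.001):
        self.weights = {}   # one weight per spike line, created lazily
        self.bias = 0.0
        self.lr = lr        # learning rate
        self.noise = noise  # std-dev of the Gaussian noise term

    def evaluate(self, spikes):
        # Steps 1-2: initialize weights for active lines, then sum them.
        for s in spikes:
            self.weights.setdefault(s, random.uniform(-0.1, 0.1))
        return sum(self.weights[s] for s in spikes) + self.bias

    def learn(self, spikes, y, target=None):
        # Step 3: adapt only the active-line weights. Supervised feedback
        # uses the target's sign; unsupervised feedback uses sign(y).
        f = target if target is not None else y
        direction = 1.0 if f >= 0 else -1.0
        for s in spikes:
            self.weights[s] += self.lr * direction + random.gauss(0.0, self.noise)
        self.bias -= self.lr * y  # decay the bias toward a balanced output

node = ClassifierNode()
y = node.evaluate([0, 3, 5])       # only lines 0, 3 and 5 are active
node.learn([0, 3, 5], y, target=1.0)
```

Only the synapses on active spike lines are touched, which is exactly the efficiency argument for sparse spike codes made earlier.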

An adaptive spike encoder (input data encoding)
In order to use the Nugent and Molter [18] classifier in machine learning, the input data to the classifier is represented as spikes (a bundle of wires), with each input spike representing one wire. Some of the input spikes to the classifier are active (1) and the remainder are inactive (0), depending on the input data representation. A number of steps are necessary to convert the input data into spikes; see Figure 2.
Figure 2. Spike representation (digital representation) of the input data [35].
The spike pattern contains the indices of the active spike inputs to the classifier. The spike space is the maximum index that can be used to encode the input data. The spike encoder converts the input data into a spike code.
Spike encoder: converts input data into a spike code by using a binary tree with anti-Hebbian learning. It is adaptive, so the spike encoding changes over time. One encoder is created for each input feature. Firstly, recursion is used to divide the tree into high and low bins, and a unique number is assigned to each node in the tree, as illustrated in Figure 3, where the number of nodes depends on the depth.
Secondly, the effect of the bias on the input is computed by the following equation:

y = x - b

where x is the input feature value and b is the bias. This adaptive average subtraction is anti-Hebbian learning, which is why this particular encoder is used. The bias adapts its value depending on the input feature value, which affects the next input pattern for the same feature:
b \leftarrow b + lr \cdot y

where lr is the learning rate.
Finally, the routing path for the input feature is determined by the value of y: if it is negative the path goes left, and if it is positive it goes right and 1 is added; the process is repeated recursively until a null node is reached. As a result, the path consists of the nodes passed from the root to a leaf of the tree, and this path represents the spike code.
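The encoder described above can be sketched as follows, assuming a complete binary tree stored heap-style; the node numbering, learning-rate value, and routing convention are illustrative assumptions rather than the exact scheme of [18].

```python
# Sketch of the adaptive spike encoder described above: a fixed-depth
# binary tree whose nodes hold a running bias b, compute y = x - b,
# adapt b anti-Hebbian style, and route left on negative y and right
# otherwise. Heap-style node numbering and the learning-rate value are
# illustrative assumptions.
class SpikeEncoder:
    def __init__(self, depth=3, lr=0.1):
        self.lr = lr
        # A complete binary tree of 2^depth - 1 nodes, stored heap-style:
        # the children of node i are nodes 2i+1 (low) and 2i+2 (high).
        self.bias = [0.0] * (2 ** depth - 1)

    def encode(self, x):
        path, node = [], 0
        while node < len(self.bias):
            y = x - self.bias[node]          # adaptive average subtraction
            self.bias[node] += self.lr * y   # bias tracks a running average
            path.append(node)                # node id joins the spike code
            node = 2 * node + 1 if y < 0 else 2 * node + 2
        return path  # root-to-leaf node ids form the spike code

enc = SpikeEncoder(depth=3)
code = enc.encode(5.0)   # repeated calls drift as the biases adapt
assert len(code) == 3    # one node per tree level
```

Because each node's bias is an exponential running average of the inputs routed through it, re-encoding the same value later can yield a different path, which is the adaptivity discussed below.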
Spike converter: assigns a unique index to each node in the path of each feature, up to the maximum index, which represents the maximum number of input spikes for the classifier.
s = f(n + 1) \qquad (10)

where s is the maximum number of input spikes, f is the number of features, and n is the number of nodes in the tree (with one additional bias line per feature). Spike pattern: the indices of the active spikes for the classifier.
p = f \cdot n_p + 1

where p is the spike pattern space, f is the number of features, and n_p is the number of nodes within the routing path. For example, if the number of features is 2 and the path for each feature is represented by 3 nodes, then the spike pattern space is 7, because the bias contributes 1 to each pattern.
Because an adaptive method is used, the same input produces slightly different spike outputs over time as the encoder adapts to the signal. If this adaptation affects the classification, the learning rate is decreased and the number of training epochs is increased.

Performance measures
Five metrics were used to assess the performance of our method: accuracy, sensitivity, specificity, precision, and F-measure. They are among the most common measures used to evaluate machine learning performance and are calculated from four counts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TP and TN are the numbers of cases correctly classified as the positive and negative class respectively. FP is the number of negative-class cases erroneously classified as positive, and FN is the number of positive-class cases erroneously classified as negative.
Classification accuracy is the proportion of correctly classified cases:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Sensitivity is the percentage of all positive samples that are correctly identified:

Sensitivity = TP / (TP + FN)

Specificity measures the percentage of negative samples that are correctly recognized as negative (not having breast cancer):

Specificity = TN / (TN + FP)

Precision, sometimes called the positive predictive value (PPV), is the proportion of samples classified as positive that are truly positive:

Precision = TP / (TP + FP)

The F-measure (or F-score) ranges from 0 to 1 and expresses the balance between precision and sensitivity; a value of 0 means there are no true positives, and a value of 1 means there are no false negatives or false positives:

F-measure = 2 \cdot Precision \cdot Sensitivity / (Precision + Sensitivity)
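The five measures can be computed directly from the four confusion-matrix counts; the function below is a straightforward transcription of the standard formulas (the example counts are made up for illustration).

```python
# The five measures above, computed directly from the confusion-matrix
# counts; a straightforward transcription of the standard formulas.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # true positive rate (recall)
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)        # positive predictive value (PPV)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f_measure

# Illustrative counts only, not results from this paper:
acc, sens, spec, prec, f1 = metrics(tp=90, tn=80, fp=10, fn=20)
assert abs(acc - 0.85) < 1e-9   # (90 + 80) / 200
```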

Experiment results
To illustrate how the classifier rule derived from the memristive simulation works in machine learning applications, the breast cancer Wisconsin and Gaussian mixture model datasets are used here. The classifier consists of two parts: a training part and a test part. A number of synapses are connected to form the classifier, and the number of synapses is determined by the number of input spikes plus the bias, from Equation (10). The order of spikes in a spike pattern does not matter: it is a "set", used to co-activate the corresponding synapses. The adaptive spike encoding changes over time to track the statistics of the data; it is a decision tree performing an (exponential running) average subtraction at each node, so over time it adapts to the signal, and the same input is likely to produce slightly different spike outputs. If this adaptation affects the classification, the learning rate is lowered and the number of training epochs is increased.
Breast Cancer Wisconsin: the breast cancer Wisconsin dataset is used to classify whether a cell is benign or malignant [18]. This dataset has 9 features, so in Equation (10) f = 9 and n = 7 for a tree of depth 3, with 1 bias for each feature. The values of all features range between 1 and 10. As an example, input training pattern 327, with feature values (10, 3, 3, 1, 2, 10, 7, 6, 1), is converted by the spike encoder into the following code: 34, 0, 35, 1, 32, 33, 3, 38, 4, 39, 6, 8, 42, 10, 41, 11, 46, 13, 14, 15, 45, 18, 21, 20, 23, 25, 27, and 26. These input lines are active for the classifier.

In the training part, 500 patterns are trained, each with 9 features of the cell; the class of a cell is either 2 (benign) or 4 (malignant). The output y is computed (Equation 6), and a threshold is then used to determine the output class, either class 2 or class 4. In the test part, the input is likewise converted into spikes and 183 patterns are tested; the unsupervised signal (0) is used because there is no supervised signal, and the output y is computed (Equation 6). We obtain 95.14%, 95.5%, 86%, 98%, and 0.96 for the accuracy, sensitivity, specificity, precision and F-measure respectively. When the threshold is increased, the precision increases while the sensitivity drops.

Gaussian mixture model dataset: the GMM dataset is used here for two reasons. Firstly, the input data can be represented as two-dimensional pictures. Secondly, the Bayes error rate and the decision boundary are known. In this study, 250 patterns are used for the training process and a 50 × 60 grid of patterns for the test process. The x-axis ranges between -1.5 and 1.5, the y-axis between -0.2 and 1.6, and the test data comprises 1000 patterns. Figure 4 shows the classification output for the 1000 test patterns, together with the 250 training patterns.
The red and blue colours represent class 0 and class 1 respectively. The true positive rate is 0.946 with a threshold of 0.2. The most important property of the functional simulation is that it requires less computation than the circuit simulation, and both give convergent simulation results.

Conclusions
In this paper, we presented a classifier based on a minimal number of memristive devices for general-purpose computing and machine learning, through the implementation of neuromorphic learning rules based on memristive nanodevices. The memristor works by changing its resistance depending on the current flow through it. The synapse weights are built from two memristors, and an array of these synapses is used to build the classifier, which can be used for machine learning applications with minimal computational overhead and cost and gives efficient classification results. In addition, using a spike encoder to convert the input data into a spike encoding leads to greater efficiency in machine learning applications. There are different applications of machine learning, for example supervised and unsupervised classification and clustering based on memristive nanodevices, which could be used for both storage and computation.