Real-time visual and EMG signals recognition to control dexterous prosthetic hand based on deep learning and machine learning

Advances in prosthetic hands are enabling a new generation of prostheses that use artificial intelligence to control a dexterous hand. Producing a suitable gripping and grasping action for objects of different shapes remains a challenging task in prosthetic hand design. Most artificial hands are controlled by electromyography (EMG) signals. This work proposes a novel approach that uses a deep learning classification method to sort items into seven gripping patterns based on EMG and image recognition. The approach comprises two scenarios. In the first scenario, EMG signals are recorded from five healthy participants for the basic hand movements (cylindrical, tip, spherical, lateral, palmar, and hook). Three time-domain features (standard deviation, mean absolute value, and principal component analysis) are then used to extract the EMG signal features, and a support vector machine (SVM) is used to find the proper classes, achieving an accuracy that reaches 89%. In the second scenario, 723 RGB images of 24 items are collected and sorted into seven classes, i.e., cylindrical, tip, spherical, lateral, palmar, hook, and full hand. The GoogLeNet algorithm is used for training with 144 layers, which include convolutional layers, ReLU activation layers, max-pooling layers, drop-out layers, and a softmax layer. GoogLeNet achieves a high training accuracy that reaches 99%. Finally, the system is tested, and the experiments show that the proposed vision-based myoelectric control method (Vision-EMG) achieves a recognition accuracy that reaches 95%.


Introduction
Physical therapy plays a significant role in the rehabilitation of people who have undergone an amputation. Thus, many efforts have been made to advance medical/clinical and human-machine interface (HMI) applications [1]. Robotics can increase the independence of individuals living with disabilities. This work aims to improve quality of life by empowering people to carry out a wide range of daily tasks within a few minutes. Robotic hands and arms are the most widely used devices because they can perform primary skills, such as grasping objects and transferring them from one place to another, much as non-amputee individuals do [2]. At present, prosthetic hands based on myoelectric control play a dominant role in rehabilitation. Such control requires recording surface electromyography (EMG) signals, a technique that tracks the electrical activity associated with skeletal muscles [3]. The muscles produce motor unit action potentials (MUAPs), which are recorded during grasping and gripping actions [4]. The MUAP is the combination of the muscle fiber action potentials from all the muscle fibers of a single motor unit [5].
During gripping or grasping, tactile, electromyographic, and visual information are all involved, and the visual part accounts for 83% of all information [6]. Visual feedback helps people compare the size of the hand with that of the target object, and the grasp is adjusted until the grasping gesture and the aperture size match the target object. For this reason, visual information can provide the necessary control parameters for an artificial prosthetic hand [7]. In addition, combining more than one information source offers an efficient way to control smart prostheses [8], [9]. To improve human-machine interaction, a visual control system is used to fill the gaps left by prostheses designed around EMG signals alone. An artificial vision system based on image recognition can obtain features of the object, for example, the shape and size of the required item [10]. After the EMG or visual information is collected, artificial intelligence (AI) is used to extract the main features and classify them with Machine Learning (ML) or Deep Learning (DL) algorithms [11]. ML refers to algorithms that learn from data in an intelligent way [12]. Learning algorithms are classified as supervised or unsupervised. Supervised learning occurs when the algorithm is trained on a set of known (labeled) data for a specific function. Unsupervised learning occurs when only the inputs are given and the algorithm learns to find features or patterns to produce the output [13]. DL, meanwhile, is the trending approach because its structure contains a deep network (multiple hidden layers) that learns different features at multiple levels [14]. A DL problem can be summarized as a hierarchy of concepts, where each concept is built on top of the others; the lower layers are therefore considered the primary representation of the problem [15].
Besides, the consecutive layers can be learned via sub-models organized in stacked layers. The main problem in ML is feature extraction; DL solves this problem because it can learn useful features by itself. Supervised and unsupervised DL models have grown rapidly owing to their success in solving complex problems [16]. The novelty of this work lies in sorting RGB images of objects into the six basic hand movements using deep learning (GoogLeNet), so that visual recognition helps the prosthetic hand distinguish objects, and in designing a pattern recognition system for the prosthetic hand using the Myo armband to improve its operability. The Myo armband is used to collect the EMG signal; the signal is then classified into the six basic hand movements (cylindrical, tip, spherical, lateral, palmar, and hook) using machine learning (SVM).

Methodology
This research develops a novel idea for designing a prosthetic hand for amputees based on an electromyography (EMG) sensor and a camera. The work is therefore divided into two parts: the EMG signal part and the image recognition part. In the EMG part, the first step is to collect the EMG signals from the elbow muscles for six basic hand movements, i.e., cylindrical, spherical, tip, palmar, hook, and lateral. The EMG data are collected from five participants of different genders and ages using the Myo armband sensor. Next, the raw EMG signals are simplified by extracting time-domain features to prepare them for the classification stage. The final step is to choose a suitable classifier from the Machine Learning (ML) algorithms (Support Vector Machine) to classify the EMG signals for the different movements. In the image recognition part, image data are collected for many different shapes to prepare the right environment for the recognition step. All the collected images are then processed so that they have the same resolution and size with an empty background. Subsequently, features are extracted from the images, and the images are classified using a Deep Learning (DL) algorithm (GoogLeNet) into cylindrical, spherical, tip, palmar, hook, and lateral. The final step is to add a camera in the palm of the prosthetic hand. The camera recognizes the shapes of the objects that appear in front of it and works online in real time while the hand and the objects move, in order to test the system's ability and accuracy in recognizing 24 items.

Data collection and processing
Designing a prosthetic hand, as proposed in this work, depends on collecting both EMG signals from the elbow and image data for different items. For this reason, the data are divided into two parts: EMG data and image data.
• EMG data: six basic hand movements (cylindrical, spherical, hook, palmar, tip, and lateral) are recorded for two males and three females. The eight sensors of the Myo armband, which cover the elbow, correspond to eight inputs in the time domain; the amplitude of each signal represents the voltage of the monitored muscles, and each time series consists of 3000 samples. Fig. 1 shows the raw EMG signals for the basic hand movements; a high voltage corresponds to muscle contraction and a low voltage to muscle relaxation. The EMG signals obtained in this part have low amplitude because of the low tension in the muscles responsible for finger movements.
• Image data: all items can be held by the human hand using six hand movements, i.e., cylindrical, spherical, hook, palmar, tip, and lateral. This research captures 723 pictures of 24 items from different capturing angles, backgrounds, and distances from the camera. These items are classified into seven categories according to how the prosthetic hand will hold them, i.e., the basic hand movements. Table 1 lists the categories, the items in each category, and the number of pictures captured for each item.
For the data processing part, only simple processing is applied to the collected data. With the Myo Armband sensor, the noise level is considered very low; therefore, no filtering is needed to reduce noise in the EMG signals. For the image data, every image is resized to 224 × 224 pixels, and the background is removed. This processing step is important for preparing the data for feature extraction, using either machine learning or deep learning.

Feature extraction
After the data processing step, a pattern recognition (PR) system is constructed by transforming the raw data into a feature vector or another suitable representation, which avoids time-consuming processing of extensive data. The feature vectors enable the learning system, i.e., the classifiers, to detect the correct pattern [15]. Transforming the raw data into feature vectors requires time-domain analysis (TDA) or frequency-domain analysis (FDA). In this work, TDA is used instead of FDA to extract features from the EMG signals because TDA is well suited to EMG signals [17]. Three TDA features are used to analyze the EMG signals in offline mode using Matlab 2019a. The recorded signals are segmented with a window size of 200 ms and an increment of 150 ms. The mean absolute value (MAV) and the standard deviation (SD) are used to extract features from the raw EMG signals and simplify them by using the following mathematical models [18]:

$$\mathrm{MAV} = \frac{1}{N}\sum_{i=1}^{N} |x_i| \qquad (1)$$

$$\mathrm{SD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2} \qquad (2)$$

where $x$ is the input EMG signal, $N$ is the length of the input signal, and $\bar{x}$ is the mean of $x$.
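The windowing and the two time-domain features described above can be sketched in a few lines of Python. This is only an illustration, not the Matlab code used in this work: the 200 Hz sampling rate, the function names, and the use of the sample standard deviation (N − 1 normalization) are assumptions, and the input is a synthetic signal rather than recorded EMG.

```python
import math
import statistics

FS_HZ = 200       # ASSUMPTION: sampling rate, not stated in the text
WINDOW_MS = 200   # window size given in the text
STEP_MS = 150     # window increment given in the text


def sliding_windows(signal, fs_hz=FS_HZ, window_ms=WINDOW_MS, step_ms=STEP_MS):
    """Return the successive analysis windows of one EMG channel."""
    win = int(round(fs_hz * window_ms / 1000))
    step = int(round(fs_hz * step_ms / 1000))
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, step)]


def mean_absolute_value(window):
    """MAV: average of the absolute sample values in the window."""
    return sum(abs(x) for x in window) / len(window)


def standard_deviation(window):
    """SD: sample standard deviation (N - 1 normalization, an assumption)."""
    return statistics.stdev(window)


def extract_features(channels):
    """Build one feature vector per window: all channel MAVs, then all channel SDs."""
    per_channel = [sliding_windows(ch) for ch in channels]
    features = []
    for w in range(len(per_channel[0])):
        mav = [mean_absolute_value(ch[w]) for ch in per_channel]
        sd = [standard_deviation(ch[w]) for ch in per_channel]
        features.append(mav + sd)
    return features


# Synthetic stand-in for one recording: 8 channels of 3000 samples each.
channels = [[math.sin(0.05 * n * (c + 1)) for n in range(3000)] for c in range(8)]
features = extract_features(channels)
print(len(features), "windows,", len(features[0]), "features per window")
```

With eight channels, each window yields a 16-element vector (eight MAV values followed by eight SD values), which is the kind of input a classifier would receive.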
After that, Principal Component Analysis (PCA) is used to reduce the dimension of the EMG signals, which lowers the time and space complexity. The new components are uncorrelated and orthogonal to each other. The reduction happens when PCA selects the directions of maximum variance and forms new axes from them.
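The PCA step can be illustrated with a short NumPy sketch. It is a generic PCA via the singular value decomposition, not the Matlab routine used in this work; the function names, the number of retained components, and the random input matrix are assumptions for illustration.

```python
import numpy as np


def pca_fit(features, n_components):
    """Fit PCA on a (samples x features) matrix.

    Returns the feature mean, the top principal directions (one per row),
    and the variance captured by each retained direction.
    """
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (X.shape[0] - 1)
    return mean, vt[:n_components], explained_variance[:n_components]


def pca_transform(features, mean, components):
    """Project data onto the retained principal directions."""
    return (np.asarray(features, dtype=float) - mean) @ components.T


# Synthetic stand-in for a feature matrix (99 windows x 16 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(99, 16))
mean, components, variance = pca_fit(X, n_components=4)  # 4 is an arbitrary choice
Z = pca_transform(X, mean, components)
print(Z.shape)
# Off-diagonal covariances are (numerically) zero: the new components are uncorrelated.
print(np.round(np.cov(Z, rowvar=False), 6))
```

The printed covariance matrix is diagonal, which is the "uncorrelated and orthogonal" property mentioned above, and its diagonal entries decrease because the directions are ordered by variance.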

Learning algorithms
After extracting the features from the raw data, a learning algorithm is chosen to classify the data into the required categories. In this work, two learning approaches are used: Machine Learning and Deep Learning. The difference between them is that ML first extracts the features and then feeds the extracted features to the classifier, whereas deep learning treats feature extraction and classification as a single package.

Support vector machine (SVM)
ML contains many algorithms that are based on fixed logic; one of them is the support vector machine (SVM), which is used in this work. SVM is a powerful supervised machine learning technique used for regression or classification problems. For data classification, the SVM finds the optimum separating hyperplane by building a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space [19].
The SVM can deal with both linearly separable and non-linearly separable data. This work deals with non-linearly separable data: the SVM maps the data to a higher dimension and uses the radial basis function (RBF) to obtain a high classification rate. The RBF is therefore considered as the kernel function to generate non-linear classifiers [20]. In addition, the SVM uses a subset of the training data (the support vectors), and different kernel functions can be specified for choosing the support vectors. The SVM classifier is used in medical diagnosis and is also useful when memory is limited [21,22]. The cubic SVM is a type of SVM, and its kernel can be given as (3):

$$K(x_i, x_j) = \left(x_i^{\top} x_j\right)^{3} \qquad (3)$$

where $x_i$ and $x_j$ are two feature vectors. The cubic SVM is preferred because of the short time required for training; in this work, the training time ranges from 3 sec to 12 sec.
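To make the cubic kernel concrete, the following sketch evaluates it in plain Python and uses it in the standard kernel SVM decision function for a two-class problem. The homogeneous form (x · y)^3 is an assumption (some toolboxes add scale and offset terms), and the support vectors, coefficients, and bias below are made-up placeholders rather than values trained in this work.

```python
def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def cubic_kernel(x, y):
    """Cubic kernel, ASSUMED here in the homogeneous form (x . y)^3."""
    return dot(x, y) ** 3


def gram_matrix(samples):
    """Kernel value for every pair of samples (symmetric matrix)."""
    return [[cubic_kernel(a, b) for b in samples] for a in samples]


def decision_value(x, support_vectors, dual_coefs, bias):
    """Two-class SVM decision function: sum_k c_k * K(s_k, x) + b.

    Each c_k is the product of a learned multiplier and the class label (+1/-1).
    """
    return sum(c * cubic_kernel(s, x) for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical placeholder model (NOT trained values from the paper).
support_vectors = [[1.0, 0.5], [-0.5, 1.0]]
dual_coefs = [0.8, -0.6]
bias = 0.1

sample = [0.9, 0.2]
score = decision_value(sample, support_vectors, dual_coefs, bias)
print("class +1" if score >= 0 else "class -1", score)
```

A six-class problem such as the basic hand movements would combine several two-class decision functions of this kind, for example in a one-vs-one scheme; that combination step is omitted here.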

GoogLeNet
The Convolutional Neural Network (CNN) is the most popular supervised Deep Learning model and is the one used in this work. A CNN extracts simple features from the input in its early layers and then combines them into more complex features in its deeper layers [23] [24]. GoogLeNet is a convolutional architecture with 22 layers that uses inception modules to help reduce the number of parameters in the network [25]. An inception module combines convolutions (3 × 3 and 5 × 5 convolutions) and pooling sub-layers that operate at different scales; the outputs of the pooling layer and the filter banks are concatenated into a single output vector, which becomes the input for the succeeding stage [26]. The sub-layers are connected in parallel, as shown in Fig. 2. The GoogLeNet model has two convolutional layers, nine inception layers, four max-pooling layers, and a softmax layer [27].
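The parallel branches and the concatenation step of an inception-style module can be sketched with NumPy as follows. This is a deliberately simplified illustration: it keeps only the 3 × 3 convolution, 5 × 5 convolution, and pooling branches named above, leaves out the 1 × 1 convolutions and activations of the real GoogLeNet module, and uses random weights and hypothetical function names.

```python
import numpy as np


def conv2d_same(x, kernels):
    """Stride-1 convolution with zero padding, so the output keeps the input height and width.

    x: (H, W, C_in); kernels: (k, k, C_in, C_out) with odd k.
    """
    x = np.asarray(x, dtype=float)
    k = kernels.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, kernels.shape[3]))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k, :]
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out


def maxpool_same(x, k=3):
    """Stride-1 max pooling that keeps the input height and width."""
    x = np.asarray(x, dtype=float)
    pad = k // 2
    h, w, _ = x.shape
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j, :] = padded[i:i + k, j:j + k, :].max(axis=(0, 1))
    return out


def inception_like_block(x, kernels_3x3, kernels_5x5):
    """Run the branches in parallel and concatenate them along the channel axis."""
    branches = [conv2d_same(x, kernels_3x3), conv2d_same(x, kernels_5x5), maxpool_same(x)]
    return np.concatenate(branches, axis=2)


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 4))              # toy input: 8 x 8 with 4 channels
out = inception_like_block(feature_map,
                           rng.normal(size=(3, 3, 4, 6)),
                           rng.normal(size=(5, 5, 4, 2)))
print(out.shape)  # channels add up: 6 + 2 + 4
```

Because every branch preserves the spatial size, the outputs can be stacked along the channel dimension, which is what the depth-concatenation layers of the network do.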

Evaluation methods
This paper assesses the usability of the proposed pattern recognition system as a step toward designing a prosthetic hand. The work is therefore divided into two phases. The first phase is applied to the collected EMG signals for the basic hand movements, using three time-domain features and the SVM as a classifier. The second phase is applied to the collected images of 24 items to classify these items according to the hand movements using the DL algorithm, i.e., GoogLeNet. The performance of the classifier models is described using the confusion matrix, which clarifies the relationship between the predicted and the actual events. Table 2 shows the confusion matrix model.
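As an illustration of how a confusion matrix summarizes classifier performance, the sketch below builds one from lists of actual and predicted labels and derives the overall accuracy and the per-class recall. The row/column convention (rows are actual classes, columns are predicted classes), the function names, and the example labels are assumptions, not results from this work.

```python
CLASSES = ["cylindrical", "tip", "spherical", "lateral", "palmar", "hook"]


def confusion_matrix(actual, predicted, classes):
    """Count outcomes; rows are actual classes, columns are predicted classes (assumed layout)."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """For each actual class, the fraction of its samples that were predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Made-up labels for illustration only.
actual = ["hook", "hook", "tip", "tip", "palmar", "lateral", "spherical", "cylindrical"]
predicted = ["hook", "tip", "tip", "tip", "palmar", "lateral", "spherical", "cylindrical"]

cm = confusion_matrix(actual, predicted, CLASSES)
for name, row in zip(CLASSES, cm):
    print(f"{name:12s}", row)
print("accuracy:", accuracy(cm))
```

Off-diagonal entries show which movements are confused with each other, which is more informative than a single accuracy figure.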

EMG Data
The first phase contains the training session for the basic hand movements. In this session, raw EMG signals are collected from five participants (two males, three females), and the MAV, SD, and PCA are applied to extract the features. The SVM algorithm is then used to classify the six basic hand movements. Table 3 shows three different experiments, each with a different number of features. In the first experiment, only the MAV is used to extract the features; for this reason, the training accuracy reaches only 57.4%. In the second experiment, MAV and SD are applied together, so the training accuracy increases. In the last experiment, the MAV, SD, and PCA are used, and they achieve the highest training accuracy. In conclusion, increasing the number of features within an acceptable limit increases the accuracy.

RGB image data
The second phase contains the training session with 723 images of 24 items; these items are categorized according to how the hand will hold each item, which means the shape of the finger movements differs from one item to another. All the items are therefore classified into seven classes, i.e., cylindrical, spherical, palmar, tip, lateral, hook, and full hand. GoogLeNet is used to extract features from the 723 images and classify them into the seven classes. In this work, GoogLeNet is constructed with 144 layers, starting with the input layer, which represents the image that enters the net with size 224 × 224 × 3. Since GoogLeNet is a type of CNN, the net contains 57 convolutional layers. The output of each convolutional layer is fed to an activation function layer to generate an activation map. In this work, 56 layers use the Rectified Linear Unit (ReLU) as the activation function because it is fast to compute: it outputs zero if the input is less than zero. After that, 13 max-pooling layers reduce the number of parameters and the spatial size of the input by selecting the maximum value from a group of numbers. For further optimization of feature selection, two cross-channel normalization layers are added, along with nine depth-concatenation layers that extract features from the third dimension of the image. The last four layers start with average pooling to select the final features, followed by a dropout layer that randomly drops some neurons during training to overcome overfitting. After that, the fully connected layer connects each neuron from the previous layer to all the neurons in the next layer. The last layer is the softmax classifier, which outputs a probability for each class such that the probabilities sum to 1. Fig. 4 shows the training curve for GoogLeNet.
As the curve shows, the training accuracy reaches 99% after six epochs and 390 iterations, i.e., an average of 65 iterations per epoch, with a learning rate of 0.0003.
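Three of the layer types described above (ReLU, max pooling, and softmax) are simple enough to sketch in plain Python. The 2 × 2 non-overlapping pooling window and the example scores are illustrative assumptions and do not reproduce the exact GoogLeNet configuration.

```python
import math

GRIP_CLASSES = ["cylindrical", "spherical", "palmar", "tip", "lateral", "hook", "full hand"]


def relu(values):
    """Rectified Linear Unit: negative inputs become zero, others pass through."""
    return [max(0.0, v) for v in values]


def max_pool_2x2(grid):
    """Non-overlapping 2 x 2 max pooling on a 2-D grid with even height and width."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores from a hypothetical final fully connected layer (one per class).
scores = [2.1, 0.3, -1.0, 0.0, 0.5, -0.7, 1.2]
probabilities = softmax(scores)
best = probabilities.index(max(probabilities))
print(GRIP_CLASSES[best], round(sum(probabilities), 6))
```

The class with the largest probability is taken as the predicted grip pattern; pooling halves each spatial dimension in this sketch, which is how it reduces the size of the feature maps.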

Testing dataset
The proposed system must be tested after the training part is finished. The test is therefore applied to 13 items with different backgrounds, brightness, contrast, and orientation (horizontal or vertical). The testing part aims to show the classification of each item and the ability of the system to achieve high recognition accuracy. Table 4 and Fig. 5 show the testing for the 13 items and the PR accuracy above each item.

Discussion
An intelligent prosthetic hand design requires multifunctional characteristics. Hence, this work is divided into two parts: the EMG signal part to support the muscles' action and the RGB image part to support the visual section. The basic hand movements were chosen because they are the essential step toward implementing the prosthetic hand, covering the following motions: cylindrical, spherical, hook, palmar, tip, and lateral.

The prosthetic based on EMG signals
In the first part of the work, the Myo Armband sensor is used instead of traditional sensors to solve noise issues and avoid the DC component in the raw signals, so there is no need for any filter. The Myo armband also covers all the forearm muscles. The data in each session were collected for six EMG movements from five subjects. Time-domain analysis is used instead of frequency-domain analysis to extract features from the raw EMG signals because it shows better performance in classifying the EMG signals.
For the basic hand movements, the SVM was used and achieved acceptable training accuracy. It also showed a fast average training time of 7.55 sec. The classification results show that the cubic support vector machine achieves the highest training accuracy among the SVM types owing to the multiple hyperplanes in this algorithm. Three experiments were conducted. The first experiment has the lowest accuracy, 57.4%, because it uses only one feature (mean absolute value) for signal analysis. The second experiment uses two features (mean absolute value and standard deviation) and reaches a training accuracy of 87%. The highest accuracy, 89%, is achieved with three features (mean absolute value, standard deviation, and principal component analysis); here, the PCA works on the signal dimensions and increases the accuracy. Increasing the number of features therefore increases the training accuracy because feature extraction helps to analyze the signals. Fig. 6 shows the classification performance using different features.

The prosthetic based on RGB images
In the second part, the camera is embedded in the prosthetic hand; this configuration enriches the information transmitted by the system. Besides, combining the EMG signals and the visual part using artificial intelligence to implement the prosthetic hand can significantly expand the bandwidth of the feedback information and enhance interactive control. If there are multiple objects, the prosthetic hand is directed toward the closest one.
The proposed vision-based PR method uses the GoogLeNet algorithm to classify RGB images of everyday objects into the six basic hand movement patterns for controlling the prosthetic hand. Some studies focus on image classification using CNN to recognize objects without categorizing them [29], while other studies classify images according to color, texture, etc. [30]. In contrast, this work categorizes the RGB images into seven categories according to how the prosthetic hand will hold the item, while also considering object details such as color and texture and increasing the recognition ability by placing the objects at different distances from the camera against different backgrounds. GoogLeNet was used for Deep Learning and achieved a high training accuracy, reaching 99%, and a high testing accuracy, averaging 95%. Although some studies on RGB-D image classification use CNN, most of them focus on object classification [30], which mainly depends on the detailed characteristics of objects (such as texture and color). However, the visual part faces many challenges, such as multiple objects, intricate backgrounds, and the distance between the camera and the objects; all these factors can affect image recognition. Addressing these challenges could revolutionize intelligent prostheses.

Conclusion
This paper proposes a novel prosthetic hand control system that uses the GoogLeNet and SVM classifiers. The system combines a vision-based object classifier with an EMG-based motion classifier; this cooperation controls the prosthetic hand in six hand movements (cylindrical, tip, spherical, lateral, palmar, and hook). The goal of this work, improving the operability of the prosthetic hand, has been accomplished: the hand can recognize the target object using the camera built into the prosthetic hand, and it then changes its posture according to the shape of the object.