Real-time classification of various types of falls and activities of daily living based on CNN-LSTM network

In this research, two multiclass models have been developed and implemented: a standard long short-term memory (LSTM) model and a convolutional neural network combined with LSTM (CNN-LSTM) model. Both models operate on raw acceleration data from the Sisfall public dataset. The models were trained using the TensorFlow framework to distinguish among ten different events: five types of falls and five activities of daily living (ADLs). An accuracy of more than 96% was reached within the first 200 epochs of training. Furthermore, a real-time prototype for recognizing falls and ADLs was implemented using the TensorFlow Lite framework on a Raspberry Pi, and it showed acceptable performance.


Introduction
The proportion of older adults living alone is growing worldwide. Each year, roughly 28% to 35% of people aged 65 and above fall two to four times, and the rate increases with age [1]. Falls are prominent among the causes of unintentional injury, and they affect both the physical and the psychological health of an older adult. Because older adults are physically frail, injuries resulting from falls include physical damage and bone fractures, and a fall may lead to death or to a prolonged lie on the floor with an inability to recover [2]. Many researchers have studied falls and activities of daily living (ADLs) using different methodologies, including threshold-based and machine learning methods. Threshold-based detection algorithms [3] are widely used in wearable devices: a fixed threshold is applied to the acceleration of body motion, according to some particular combination of actions, to decide whether a fall or a non-fall event has occurred. The primary aim of fall and ADL recognition is to detect a fall in real time and raise a rapid alert, which shortens the response time of medical help and can thereby reduce the consequences of the fall. Research in this area is broadly categorized into wearable-based and vision-based devices. Vision-based devices build on the strengths of camera-based systems, which capture rich information about the surrounding environment, but they generally come with some constraints. Wearable-based devices tend to be a more practical approach that can preserve privacy and individual convenience; portability and low cost are their most salient features. This research aims to build a solution that recognizes and predicts falls and ADLs using the Sisfall public dataset [4].
The Sisfall dataset is among the richest datasets in its variety of falls and ADLs; the activities were carried out by more than 30 participants, one of whom was an elderly individual who performed all fall events. Both machine learning and deep learning methods can be applied to this dataset.
The application of deep learning models to fall and ADL recognition with wearable sensors has been an area of recent focus, with new methods introduced constantly. Schalk [5] applied an LSTM model to the WISDM public dataset to recognize activities of daily living from accelerometer data; six activities were identified, with more than 94% accuracy. Wayan [6] used the UMA public dataset [7], which is composed of falls and ADLs recorded with accelerometer and gyroscope sensors. Their experiments with an LSTM model showed that the best accuracy was obtained from the accelerometer data. They built a binary classification model to differentiate ADLs from falls; each accelerometer axis was separated and fed into the neural network for training, and the final results showed that using x-axis accelerometer data alone leads to high classification performance. Musci [8] implemented real-time online fall detection based on an LSTM model. The Sisfall public dataset was used with three classes to be detected in real time: fall, alert, and non-fall; the results were compared with those obtained by the Sisfall authors and showed high precision in fall detection. In [9], the authors combined different machine learning and deep learning models, including LSTM, to improve the classification of falls and ADLs. Accelerometer and gyroscope data were collected from twelve healthy subjects. In the multiclass experiment, the LSTM model reached 99.81% average accuracy. Traditional machine learning models also performed well; the support vector machine (SVM) model reached 98.26% average accuracy. Hybrid models have also been used in related work; in [10], the authors combined convolutional neural networks (CNN) with the LSTM model.
In that work, the feature extraction ability of the CNN and the ability of the LSTM to process time-series sequences were both utilized; the results showed that the CNN-LSTM model has higher detection accuracy than the SVM even with small datasets. This research proposes two variants of LSTM architectures for classifying different types of falls and ADLs using data gathered from the accelerometer sensor only. We also converted the trained CNN-LSTM model to TensorFlow Lite for real-time hardware implementation on the Raspberry Pi platform.

Dataset
In this section, we present the main flow of this research for fall and ADL recognition. The approach has been applied primarily to the Sisfall dataset [4]. Figure 1 shows the flowchart of the steps performed to predict fall and ADL events from accelerometer sensor data. For convenience, we discuss it in terms of three main steps. The Sisfall public dataset was used to develop the algorithms for the proposed model. It contains 15 types of falls and 19 types of ADLs performed by 38 subjects with a sensor device fixed at the waist. Sisfall is distinct among public-domain datasets because it includes older people who performed falls and activities of daily living (ADLs). The dataset was collected using three sensors: two accelerometers and one gyroscope. The data are given in bits and can easily be converted into gravitational acceleration (g). The ADLs in the Sisfall dataset include walking, sitting, jogging, standing, etc., whereas the 15 fall activities include falling forward, falling backward, falling while walking, etc. The dataset is in CSV file format. It is stored in folders sorted by subject and activity, and each file name encodes the activity, the subject code, and the trial number. The main characteristics of the Sisfall dataset are summarized in Table 1. We chose to use the accelerometer data only, since previous studies showed high accuracy with accelerometer data alone [5]. Figure 2 shows the 3-axis acceleration curves for some falls and ADLs recorded in the Sisfall dataset. Since deep learning is computationally costly in the training stage and even at the prediction stage, we selected distinct classes from Sisfall to evaluate the LSTM and CNN-LSTM models. Table 2 shows the selected classes.
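The conversion from raw bits to gravitational acceleration mentioned above can be sketched as follows. The sketch assumes the usual linear relation, acceleration [g] = (2 × range / 2^resolution) × raw value, and it assumes a ±16 g range with 13-bit resolution for the ADXL345 channel. These constants and the function name bits_to_g are not stated in this paper and should be checked against the dataset documentation.

```python
def bits_to_g(raw, range_g=16.0, resolution_bits=13):
    """Convert a raw signed ADC count into acceleration in g.

    range_g and resolution_bits are assumed values for the ADXL345
    channel; verify them against the dataset documentation.
    """
    return raw * (2.0 * range_g) / (2 ** resolution_bits)


# One raw sample (x, y, z) in ADC counts; the numbers are placeholders.
raw_sample = (17, -179, -99)
sample_g = tuple(bits_to_g(v) for v in raw_sample)
print(sample_g)
```

Under these assumed constants, a raw count of 256 corresponds to 1 g, which gives a quick sanity check for a sensor at rest.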

Proposed model
This research uses a pure long short-term memory (LSTM) model and a convolutional neural network (CNN) combined with LSTM.

Long-Short-Term Memory (LSTM) Architecture
In this part of the research, we used an LSTM model to classify falls and ADLs; LSTM is a type of recurrent neural network (RNN). The model consists of six layers: three LSTM layers, a dropout layer, a dense layer, and a Softmax layer. Figure 3 shows the LSTM architecture and its layers.
Figure 3. LSTM architecture and its layers
An LSTM processes a sequence one time step at a time and can therefore handle an entire sequence of samples for different motions. It learns long-term dependencies between the time steps of the input data. The LSTM layer expects three-dimensional input; the three dimensions are samples, time steps, and features.
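The three-dimensional input layout (samples, time steps, features) can be illustrated with a small pure-Python sketch. It assumes the window length of 200 samples and step size of 40 that this paper uses for segmentation; the helper name make_windows and the synthetic recording are illustrative only.

```python
def make_windows(signal, window=200, step=40):
    """Cut a recording into overlapping fixed-length segments.

    signal: list of (x, y, z) tuples, one per time step.
    Returns a list of segments, each a list of `window` tuples.
    """
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, step)]


# Synthetic recording: 1000 time steps of 3-axis data (placeholder values).
recording = [(0.0, -1.0, 0.0)] * 1000
segments = make_windows(recording)

# Layout expected by an LSTM layer: (samples, time steps, features).
shape = (len(segments), len(segments[0]), len(segments[0][0]))
print(shape)
```

Each segment becomes one sample, each row inside it one time step, and the three acceleration axes are the features.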

CNN-LSTM architecture
A convolutional neural network (CNN) combined with a long short-term memory (LSTM) architecture has also been used, which we think is more suitable for fall and ADL recognition than LSTM alone. The CNN-LSTM hybrid model is inspired by [14], [15]. CNNs were originally designed to process and classify images using 2D filters. However, the acceleration data from the Sisfall dataset are time-series data, consisting of samples ordered along time steps. For that reason, we need one-dimensional filters instead of two-dimensional ones. A 1D CNN is used to extract useful features from time-series data and is also used for time-series forecasting. It can extract discriminative features from a dataset using a fixed sliding window; the sliding window for our dataset has a fixed length of 200 samples. The CNN part has four convolutional layers with ReLU activation, followed by a batch-normalization layer and a max-pooling layer. As shown in Figure 4, the output of the last CNN layer is passed through a flatten layer and then through three LSTM layers, a dense layer, and a Softmax layer, which is the output.
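To make the idea of a one-dimensional filter concrete, the following pure-Python sketch slides a single filter along the time axis of a three-feature segment, applies ReLU, and then max-pools the result. It is a didactic illustration and not the TensorFlow implementation used in this work; the kernel size of 3, the pool size of 2, and the filter weights are assumptions chosen for readability.

```python
def conv1d_relu(segment, kernel, bias=0.0):
    """Valid 1D convolution of one filter over the time axis, then ReLU.

    segment: list of time steps, each a list of feature values.
    kernel:  list of k rows, each holding one weight per feature.
    """
    k = len(kernel)
    out = []
    for t in range(len(segment) - k + 1):
        acc = bias
        for i in range(k):
            for f, w in enumerate(kernel[i]):
                acc += w * segment[t + i][f]
        out.append(max(0.0, acc))  # ReLU activation
    return out


def max_pool1d(values, size=2):
    """Non-overlapping max pooling along the time axis."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Six time steps, three features; the x axis is a ramp 0, 1, 2, ...
segment = [[float(t), 0.0, 1.0] for t in range(6)]
# Filter that responds to the change in x across three time steps.
kernel = [[-1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [1.0, 0.0, 0.0]]

feature_map = conv1d_relu(segment, kernel)
pooled = max_pool1d(feature_map)
print(feature_map, pooled)
```

With six time steps and a kernel of size 3, the filter fits in four positions, so the feature map has four values; pooling then halves its length.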
We used TensorFlow [11], a deep learning framework with a Python interface, to build the CNN-LSTM model. The LSTM layers learn long-term dependencies between the time steps of the input data. To have many batches of data at hand, each class in the dataset must be preprocessed and split into fixed-length sets of samples, which are stored as segments. This approach is applicable to both the CNN and the LSTM layers. Hence, we chose a sliding window of 200 samples with a step size of 40, which works out to about 5 seconds' worth of data at a time. These segments are fed into the CNN layers for the feature extraction phase, and the LSTM layers learn which features belong to which class. A 1D CNN layer processes data along one dimension, and the data must be shaped as input_shape = (time-steps, features per time-step). Figure 5 illustrates the 1D CNN with its time steps and features; the kernel size is the size of the filter that moves along the time axis. There are three features at the first layer: the raw three-axis acceleration data.
We scaled the raw data to a fixed range between -1 and 1 through Min-Max scaling. This scaling is intended to limit the effect of outliers and null values in the acceleration data. Equation (1) describes the Min-Max scaling that is commonly applied in deep learning models. We divided the dataset into training, validation, and testing sets in a 60:20:20 ratio. Figure 5 shows an illustration of these proportions. We then sampled randomly using 40 different random states. The validation data are used to evaluate the quality of the model and to avoid under- and overfitting during the training process.
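The scaling and splitting steps can be sketched in plain Python. The scaling function assumes the common form x' = lo + (hi - lo)(x - x_min)/(x_max - x_min) with lo = -1 and hi = 1, which matches the stated target range but is not necessarily the exact expression of Equation (1). The function names, the seed value, and the toy inputs are illustrative.

```python
import random


def min_max_scale(values, v_min=None, v_max=None, lo=-1.0, hi=1.0):
    """Linearly map values so that v_min -> lo and v_max -> hi.

    Pass the v_min and v_max found on the training data to apply the
    same scaling to new data; by default they are taken from `values`.
    """
    v_min = min(values) if v_min is None else v_min
    v_max = max(values) if v_max is None else v_max
    span = v_max - v_min
    if span == 0:
        # Constant signal: place it mid-range instead of dividing by zero.
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


def split_60_20_20(items, seed=0):
    """Shuffle, then split into training, validation, and testing sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10
    n_val = len(items) * 2 // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


scaled = min_max_scale([-2.0, 0.0, 2.0])
train_set, val_set, test_set = split_60_20_20(range(100), seed=40)
print(scaled, len(train_set), len(val_set), len(test_set))
```

Changing the seed gives a different random partition with the same proportions, which is one way to repeat an experiment over several random states.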
To alleviate overfitting on the training data, which results in low accuracy on the validation and testing datasets, a dropout layer is used with a dropout rate between 0.3 and 0.5.

TensorFlow Lite
TensorFlow Lite is an open-source deep learning framework from Google [12]. It is a flexible platform that allows pre-trained neural network models to be deployed for on-device inference with low latency and small file size, which suits embedded and IoT devices. TensorFlow Lite supports several languages, such as Python, Java, C++, and Swift. Figure 7 shows the proposed real-time prototype for the fall and ADL recognition system.

Hardware implementation
A Raspberry Pi, a single-board minicomputer, was used as the primary hardware in this system. The Raspberry Pi runs a Linux operating system and can be used either as a development platform through its pins or as a general-purpose computer. In our work, we used it as a development board to evaluate our model. An ADXL345 accelerometer sensor was used to collect tri-axial acceleration data. The hardware devices used in the proposed real-time recognition system are shown in Figure 8. Moreover, TensorFlow Lite models can also run on other devices, such as mobile phone apps or microcontroller units (MCUs).

Software implementation
Python was used as the primary programming language to implement this system, and the code is straightforward. It starts by loading the TensorFlow Lite model file in the Python script and then resizing the input tensor shape to match the inputs from the ADXL345 sensor. For a real-time recognition system, inference latency should be minimized; this can be achieved by sampling the ADXL345 sensor readings in fixed-size sets of samples.
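The fixed-size sampling described above can be sketched as a small buffering loop. This is a minimal illustration under stated assumptions, not the script used in this work: read_sample and classify are hypothetical stand-ins for the ADXL345 driver call and the TensorFlow Lite inference call, and the window of 200 readings with a step of 40 mirrors the segmentation used in training. Timing control, which would be needed to hold a constant sampling rate, is omitted.

```python
from collections import deque


def run_realtime(read_sample, classify, n_reads, window=200, step=40):
    """Feed a sliding window of sensor readings to a classifier.

    read_sample: callable returning one (x, y, z) reading (hypothetical
                 stand-in for the accelerometer driver).
    classify:    callable taking a list of `window` readings and returning
                 a prediction (hypothetical stand-in for model inference).
    """
    buffer = deque(maxlen=window)  # oldest readings drop out automatically
    predictions = []
    since_last = 0
    for _ in range(n_reads):
        buffer.append(read_sample())
        since_last += 1
        # Predict once the buffer is full, then again every `step` readings.
        if len(buffer) == window and since_last >= step:
            predictions.append(classify(list(buffer)))
            since_last = 0
    return predictions


# Dummy sensor and dummy classifier, used only to exercise the loop.
preds = run_realtime(lambda: (0.0, 0.0, 1.0), len, n_reads=280)
print(preds)
```

Because the buffer always holds exactly one window, each inference call receives an input of constant shape, which is what a model with a fixed input tensor expects.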

Results and discussion
This section presents the results of training our LSTM and CNN-LSTM models to classify falls and activities of daily living (ADLs). Experimental results showed an accuracy of 93.11% using the pure LSTM model. Figures 9 and 10 show the training and validation loss and the confusion matrix of the LSTM model, respectively. In the second experiment, we used the CNN-LSTM hybrid model to classify fall and ADL events. After many tests with different numbers of epochs and batch sizes, we decided to use 300 epochs and a batch size of 128. The experiment was conducted with several optimizers: RMSprop [16], Adam [17], and stochastic gradient descent (SGD). We chose the Adam optimizer with a learning rate of 0.001. Choosing the learning rate can be difficult: with a low learning rate, training can take a very long time or fail altogether, whereas a high learning rate can prevent the loss from converging. This is the trade-off to consider when picking the learning rate. The same applies to the dropout rate, but with a different scenario: a high dropout value can cancel the contribution of many weights and biases at training time and result in undesirable performance. It is worth noting that the training time depends on the hyper-parameters; the results show a trade-off between training time and the hyper-parameters used during training. The validation accuracy must be stable to prevent the model from overfitting to the training data. Training requires a computer with a high-performance graphics card; because of this limitation, the model was trained online on the Kaggle website with a GPU accelerator, in a Python 3.7 environment. Several experiments were performed, with an average training time of about 8 hours. Experimental results showed an accuracy of 98.18%. Figures 11 and 12 show the training and validation loss and the confusion matrix of our CNN-LSTM model, respectively.
The performance measures are defined in terms of the following counts. True positive (TP): a fall occurred, and the model correctly predicts it. False positive (FP): the model predicts a normal ADL as a fall. True negative (TN): an ADL occurred, and the model correctly predicts it as an ADL. False negative (FN): a fall occurred, but the model does not predict it. Sensitivity measures the model's ability to detect all actual falls; specificity measures the rate at which ADLs are correctly predicted; accuracy is the proportion of correct predictions among all predictions. Table 3 summarizes the classification performance.
We also compared our results with several other algorithms evaluated on the Sisfall public dataset; it should be noted that some of these algorithms are binary classification methods. The comparison is given in Table 3 below. In the comparison table, our proposed CNN-LSTM model, which uses raw acceleration data, achieves higher accuracy. Traditional machine learning models such as SVM and KNN need a lot of feature engineering related to the time-domain and frequency-domain nature of acceleration data, so relying on traditional machine learning has become less attractive in most cases. The CNN model makes this easier through its ability to extract features automatically using convolution filters.
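The sensitivity, specificity, and accuracy described above follow the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / (TP + FP + TN + FN). The short sketch below computes them; the counts in the example are made up for illustration and are not results from this study.

```python
def fall_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and accuracy from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of actual falls detected
        "specificity": tn / (tn + fp),   # share of ADLs recognized as ADLs
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Illustrative counts only (not taken from the experiments).
metrics = fall_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

In a multiclass setting such as the ten-class problem here, these counts are typically obtained by treating all fall classes as positive and all ADL classes as negative, or by computing them per class.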

Results of real-time CNN-LSTM fall and activities of daily living (ADL) recognition
The implementation of the real-time fall and ADL recognition is presented here. The prototype was tested on a 30-year-old individual for three classes, with each class performed for 15 seconds. The real-time device setup on the participant and the results are shown in Figures 13 and 14, respectively.
Figure 14. Results of the real-time falls and ADLs recognition prototype: (a) the individual performs "standing," and the model predicts this class correctly; (b) the individual performs "walking slowly," and the model predicts "walking upstairs and downstairs"; (c) the individual performs "falling forward," and the model predicts this class correctly.
After the Python script responsible for the real-time recognition process is run, an array of ten elements is printed as output. The array corresponds to the ten classes on which we trained our model. Figure 14 (a) and (c) show that the real-time recognition is performed correctly. Unfortunately, the model misclassified "walking slowly" as "walking upstairs and downstairs." This misclassification was observed in the several real-time experiments we carried out. Hence, two points are worth noting. First, the accelerometer sensor should be fixed in place to avoid inappropriate readings. Second, the accelerometer sensor needs further calibration, since the real-time model is sensitive to variations in acceleration values. Besides these observations, the sampling rate of the collected data and the scaling process applied during training must also be matched in the real-time process. This means that the data captured from the accelerometer sensor should be sampled at 200 Hz, and Min-Max scaling should be applied as well.
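Reading the ten-element output array amounts to picking the entry with the largest score and mapping its index to a class name. The sketch below shows one way to do this; the label names class_0 to class_9 are placeholders, since the ordering of the ten classes in the model output is not given here, and the scores are invented for illustration.

```python
def decode_prediction(scores, labels):
    """Return the label with the highest score, together with that score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Placeholder labels; replace with the ten trained classes in output order.
labels = ["class_%d" % i for i in range(10)]
# Invented Softmax output for one window of sensor data.
scores = [0.01, 0.02, 0.01, 0.85, 0.03, 0.02, 0.02, 0.02, 0.01, 0.01]

label, confidence = decode_prediction(scores, labels)
print(label, confidence)
```

Reporting the score next to the label makes borderline cases visible, such as two walking-related classes receiving similar scores.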

Conclusion
First, a multiclass LSTM model has been developed to discriminate among various types of falls and ADLs. The LSTM model achieved an overall accuracy of 93% in the testing phase. It has been observed that the LSTM performs slightly worse at resolving the various types of falls caused by a slip than those caused by a trip. A combined CNN-LSTM model has been proposed and implemented to improve classifier efficiency.
The proposed model outperforms the LSTM network, reaching an overall accuracy of 98.74% in the testing phase. On the hardware side, we converted the trained CNN-LSTM model to TensorFlow Lite for on-device inference on a Raspberry Pi. The model was tested on data collected with the ADXL345 accelerometer sensor, and we have shown that it provides acceptable results for real-time recognition.