Hardware implementation of deception detection system classifier

Non-verbal features extracted from the human face and body are considered among the most important indicators for revealing the deception state. Deception Detection Systems (DDS) are widely applied in areas such as security, criminal investigation and terrorism detection. In this study, fifteen features are extracted from each participant in the collected database. These features belong to three kinds of non-verbal cues: facial expressions, head movements and eye gaze. The collected database contains videos for 102 subjects, from which 888 clips covering both lie and truth responses are obtained; these clips are used to train and test the system classifier. The fifteen features are placed in a single vector and applied to a Support Vector Machine (SVM) classifier, which assigns each input feature vector to one of two classes, liar or truth-teller. The detection accuracy of the proposed DDS based on the SVM classifier is 89.6396%. Finally, the hardware implementation of the SVM classifier is carried out using the Xilinx block set. The design requires 136 slices and 263 4-input LUTs, and it uses neither flip-flops nor MULT18X18SIOs. The selected hardware platform (FPGA kit) for implementing the SVM classifier is the Spartan-3A 700A.


Introduction
Deception is one of the most common activities in everyday human life. Indeed, deception is a fundamental part of the human communication process, and subjects tend to deceive every single day [1]. For this reason, people have become interested in Deception Detection Systems (DDS) and the techniques used for revealing the deception state [2].
There are several difficulties in the deception detection process. First, there are large differences between liars in terms of their expressed emotions; these are referred to as interpersonal differences. Second, there are individual differences between innocents and liars, referred to as intra-personal differences. Third, there is the difficulty that arises when a person tells a lie and supports it with a true statement, a state referred to as embedding a lie in a truth. With developments in computer science and in the techniques used for DDS, these problems can be addressed [3]. The main applications of DDS are security, hiring new employees, criminal investigation, law enforcement and terrorism detection [4]. In general, deception features can be classified as either verbal or non-verbal, and each type contains specific categories. Verbal cues are extracted from voice analysis, while non-verbal cues are extracted from full-body motion, head movement, facial expressions, eye gaze, pupil dilation and eye blinking [5].
The main aim of this study is to explain the hardware implementation of a deception detection system classifier. The study is organized as follows: related works, the main stages of the deception detection system, the data collection process, the proposed deception detection system (DDS), experiments and results, the hardware implementation and the hardware experimental results.

Related works
The design of a DDS mainly depends on either verbal or non-verbal features. Recently, non-verbal features have proven their efficiency in the deception detection process due to their simplicity and robustness, and attention has turned toward facial expressions, thermal imaging and brain signals. Jain et al. (2012) introduced a lie detection system that depends on thermal imaging. The study was performed on 16 participants, and the system detected the areas of the human face most affected by temperature changes. The designed lie detection system achieved an accuracy of about 83.5% [6]. The thermal imaging technique was also used by Bedoya-Echeverry et al. (2017), who detected temperature changes in the lacrimal puncta area for 27 subjects. This work used a simple classification technique based on comparing the temperatures estimated during the control questions with those from the remaining parts of the interrogation; the detection accuracy was 79.2% [7].
A DDS based on detecting and measuring brain signals was introduced by Amir et al. (2013). They relied mainly on detecting the beta wave (ranging from 13 Hz to 30 Hz), because it represents most of the brain activity when a person is cautious, nervous or alert. Eighteen participants were employed to test this work [8]. This cue was also used by Simbolon et al. (2015), who measured brain activity through Event-Related Potentials (ERP), used as a cue for distinguishing suspect from innocent subjects. Eleven males participated in this study, and the collected data were divided for training and testing the system. The classifier was a Support Vector Machine (SVM) using the P300 signal as a marker. The detection accuracy of the designed method was 70.83% [9].
For facial expressions, Azhan et al. (2018) proposed an algorithm for lie detection using machine learning techniques and facial micro-expressions. They used two machine learning techniques: the Two-Class Support Vector Machine and Linear Regression. The system was trained on 1019 videos, and the achieved detection accuracy was 76.2% [10]. Another study was performed by Thannoon et al. (2019), in which facial expressions were encoded according to the Facial Action Coding System (FACS). Eight AUs were extracted and used: AUs 5, 6, 7, 10, 12, 14, 23 and 28. The collected database contains videos for 43 participants. The Virtual Generalizing Random Access Memory Weightless Neural Network (VG-RAM WNN) classifier was used for classifying these features, and the detection accuracy of the designed system was 84% [11].

The main stages of deception detection system
The automated DDS consists of three stages, arranged as follows: data collection and pre-processing, feature extraction, and finally the classification stage. Figure 1 shows the general block diagram of the automated DDS. These three stages are explained in more detail below.

Data collection and pre-processing stage
The first stage is related to collecting videos (data) of the participants under test. After this step, it is necessary to determine the essential durations that contain important features for deception detection; the results of this step are called video clips. These clips are then passed to a face detection algorithm in order to detect the subject's face and distinguish it from non-face (background) parts. The resulting face-detected images are used by a feature point (landmark) detection algorithm, whose purpose is to place points on the Regions of Interest (ROI) in the subject's face image: the face border, nose, mouth, eyebrows and eyes.
One of the most accurate and well-known face detection algorithms is the Viola-Jones (VJ) algorithm. It gained its popularity for several reasons: it is fast, accurate and robust, detects multiple faces in a single image, and operates in real-time face detection systems [13]. For the landmark detection process, Constrained Local Neural Fields (CLNF) are used [14]. CLNF is considered one of the most efficient and robust methods for landmark detection, and it operates in naturalistic and unconstrained environments.
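The speed of the VJ algorithm comes from evaluating Haar-like features in constant time over an integral image. As a minimal illustration of that idea (a sketch only, not the detector used in this work), the summed-area table and a two-rectangle feature can be written as:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row/column, so that
    ii[y, x] equals the sum of img[0:y, 0:x]."""
    ii = np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, top, left, h, w):
    """Sum of an h-by-w rectangle in O(1) via four table lookups."""
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])

def haar_two_rect_vertical(ii, top, left, h, w):
    """A two-rectangle Haar-like feature: top-half sum minus
    bottom-half sum (h must be even)."""
    half = h // 2
    return (rect_sum(ii, top, left, half, w)
            - rect_sum(ii, top + half, left, half, w))
```

On a uniform image this feature evaluates to zero, which is why the AdaBoost stages respond only to genuine intensity contrasts such as the eye/cheek boundary.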

Features extraction
The second stage in the DDS is feature extraction. Three types of features are extracted: facial expressions, head movements and eye gaze. These features have a direct relationship with mental processes, so they effectively reveal deception.

Facial expressions
Automatic systems for facial expression analysis and measurement have been widely adopted in different fields related to security, entertainment, clinical work and commercials. Facial features are described and analyzed based on a standard coding technique usually referred to as the Facial Action Coding System (FACS) [15]. FACS encodes each movement of a specific facial muscle in the form of an Action Unit (AU). The detection of AUs depends on two types of features: geometry and appearance [16]. Geometry-based features are determined and measured from both the landmark point locations and the shape parameters. Appearance features are extracted using Histograms of Oriented Gradients (HOGs) [17].
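The core of HOG is accumulating gradient magnitudes into orientation bins per image cell. A minimal sketch of one cell's histogram (illustrative only; the actual extraction in this work follows [17], and the 9-bin unsigned-orientation setup here is a common default, not taken from the text) might look like:

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Orientation histogram for one HOG cell (e.g. 8x8 pixels):
    gradient magnitudes are accumulated into unsigned-orientation
    bins spanning [0, 180) degrees."""
    gy, gx = np.gradient(cell.astype(float))      # per-pixel gradients
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())    # magnitude-weighted vote
    return hist
```

A cell with a purely horizontal intensity ramp places all of its energy in the first (0-degree) bin, which is the behaviour block-normalized HOG descriptors build upon.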

Head movements
Humans tend to use head movements as a sign when they communicate or interact with others. There are different head actions such as lowering, raising and nodding, and each action carries a specific meaning [18]. For head tracking, the CLNF method is used, which depends on the Generalized Adaptive View-based Appearance Model (GAVAM) for head pose tracking under varying illumination conditions. This technique was introduced in 2008 by Morency [19]. The tracking method operates on an image sequence (video) and estimates head translation and orientation in three dimensions [20]. Figure 2 shows head movements along the mentioned axes.

Eye gaze
The signal taken from the human eyes is considered a source of rich information that relates directly to mental processes [21]. The direction of eye gaze reflects the internal state or the information stored in the brain; it indicates whether a person is thinking visually or auditorily, making an internal dialogue with himself, or accessing a visual feeling [22]. The eye gaze detection process passes through two steps: the first is referred to as eye-shape registration and the second as appearance-based gaze estimation [23]. The first step identifies the shape of the eye region by placing landmark points around it; CLNF is the algorithm used for locating and tracking these landmark points. The second step determines appearance features for the eye region, computed directly from the pixels contained in the eye image [24].

Classifier
When the feature extraction process is complete, it becomes necessary to apply a decision classifier [25]. In the previous stage, three kinds of features are extracted: facial expressions, head movements and eye gaze. These features are combined and applied to the classification stage in order to distinguish truth-teller from liar subjects. In this work, the Support Vector Machine (SVM) classifier is used for binary classification. It is a supervised learning algorithm whose main operation is finding the hyperplane that provides the maximum margin distance between the classes. There are two types of classification, linear and non-linear SVM [26]. The classification is performed by finding the optimal hyperplane that provides enough separation between the classes. For a better description of the linear SVM, the training pattern is given as [27]:

{(xi, yi)}, i = 1, 2, ..., N

where xi represents the input pattern and yi represents the desired output, with yi ϵ {1, -1}. Linearly separable problems are defined by the constraint [28]:

yi (w * xi + b) ≥ 1, i = 1, 2, ..., N

where w is the weight vector and b represents the bias. The essential equation that represents the optimum hyperplane providing good separation between the classes is given as [28]:

w * x + b = 0 (4)

Figure 3 shows how the optimum hyperplane provides the maximum distance for separating the classes based on the previous equation. For better performance, the SVM must be applied to a data set that has a suitable set of features. The main problems related to SVM are the limitations in both the speed and the size of the two phases (training and testing), and the selection of the kernel function parameters. Kernel functions transform non-linear data into a higher-dimensional space; common kernels include the Gaussian, multi-layer perceptron, polynomial, radial basis function and sigmoid kernels.
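The decision rule implied by equation (4) simply tests which side of the hyperplane a sample falls on. A minimal sketch (with toy weights chosen for illustration, not the trained DDS model) is:

```python
import numpy as np

def svm_predict(x, w, b):
    """Linear SVM decision rule: the sign of w.x + b selects the
    class, i.e. which side of the hyperplane w.x + b = 0 x lies on."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Toy 2-D hyperplane: separates points with x1 > x2 from x1 < x2.
w, b = np.array([1.0, -1.0]), 0.0
```

For example, a point such as (2, 1) lands on the positive side and (0, 3) on the negative side of this toy hyperplane.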

Data collection process
Each participant enters the interview and is asked approximately seven control questions and 36 relevant questions. During the interview, a Canon 2000D camera records video of the subject under test. In this study, the collected database contains videos for 102 participants, 25 females and 77 males, with ages ranging from 18 to 55 years. The recorded video for each subject contains both lie and truth responses. Figure 4 shows a sample image of a participant during the interview.

The proposed deception detection system (DDS)
The proposed DDS mainly consists of three stages, arranged as follows: video recording and pre-processing, feature extraction and classification. The first stage is related to recording videos of the volunteers and then performing an editing step followed by face and landmark detection; Figure 5 shows the details of this step. The features extracted from the collected videos are applied to the classification method to distinguish liar from innocent subjects; Figure 6 shows the feature extraction and classification stages.

After the videos are recorded, it becomes necessary to perform video editing, which means determining the necessary parts (frames) in the captured video. Editing is done with the Windows Movie Maker program, and the resulting video file format is MP4. The next step is face detection. This work uses the VJ algorithm, which utilizes the grayscale of the input image to extract Haar-like features (feature blocks). These features are applied to an AdaBoost classifier, and the output contains all the sufficient and necessary information that describes the face region. The algorithm uses 31 cascaded AdaBoost layers with a threshold value of 3. After applying the face detection algorithm, the output face image is used for initializing the landmark points. The CLNF algorithm is used for locating 68 points on the detected image, as shown in Figure 7.

Dynamic feature extraction
In the feature extraction stage, three kinds of features are extracted from each participant: facial expressions in the form of AUs, head movements and eye gaze. The AU detection process requires two kinds of features: geometry and appearance features. Geometry features depend on capturing both the feature (landmark) point locations and the non-rigid shape parameters. Before extracting appearance-based features, it is necessary to remove any non-facial parts from the given image; after removal, a masking operation is performed. The output of this step, usually referred to as alignment and masking, is a face image that contains only facial parts. The resulting image is applied directly to HOG, from which the appearance features are extracted.
Head movements are described in terms of both head translation and orientation (rotation). For translation, the head location is represented along three axes: x, y and z. For rotation, head movements are described using Euler angles about three axes: pitch, yaw and roll. These six features (x-axis, y-axis, z-axis, pitch, yaw and roll) fully describe head movements, so they are extracted from each subject.
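The three rotation features can be visualised as a single rotation matrix built from the Euler angles. The sketch below is illustrative only; the axis convention (pitch about x, yaw about y, roll about z, composed as Rz Ry Rx) is an assumption, since the text does not fix one:

```python
import numpy as np

def head_rotation(pitch, yaw, roll):
    """Rotation matrix from the three Euler angles (in radians) that,
    together with the (x, y, z) translation, form the six head-pose
    features. Convention assumed: pitch about x, yaw about y,
    roll about z, composed as R = Rz @ Ry @ Rx."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx
```

Zero angles give the identity matrix (a neutral, camera-facing head), and any valid combination yields an orthonormal matrix.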
Eye gaze detection, or eye gaze estimation, is the process of identifying the gaze direction (where a participant is looking). Determining the gaze direction requires two steps: first, identifying the head orientation, and second, identifying the gaze direction. Two features can be extracted from the participant's eyes. The first is the eye gaze angle in the x direction, which relates to moving the gaze left-right; the second is the eye gaze angle in the y direction, which changes when participants move their eyes up-down.
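The three cue groups are then combined into the single fifteen-element vector fed to the classifier. A minimal sketch of that assembly follows; the 6 head-pose and 2 gaze features come from the text, while treating the remaining 7 slots as AU features is an assumption, since the exact split is not stated:

```python
import numpy as np

def assemble_features(au_feats, head_feats, gaze_feats):
    """Concatenate the three non-verbal cue groups into the single
    15-element vector applied to the SVM. The 7/6/2 split is assumed
    for illustration (only the head and gaze counts are stated)."""
    v = np.concatenate([au_feats, head_feats, gaze_feats])
    assert v.shape == (15,), "expected 15 features in total"
    return v

# Hypothetical per-clip measurements (placeholder zeros).
vec = assemble_features(np.zeros(7),   # facial-expression (AU) features
                        np.zeros(6),   # x, y, z, pitch, yaw, roll
                        np.zeros(2))   # gaze angle in x and y
```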

Decision maker (classifier)
In this work, a non-linear SVM classifier is used, because a linear SVM cannot find a linear boundary between the feature vectors during the training process. To make the SVM classifier support non-linearity, the kernel method is applied: kernel methods simply map the input feature vector to a higher-dimensional space. One of the most common kernel functions is the Radial Basis Function (RBF). Figure 8 shows the main steps of the suggested DDS based on the SVM classifier.

Experiment and results
The features extracted from each participant form a fifteen-element vector covering facial expressions, head movements and eye gaze. This vector is applied to the SVM classifier. The input data consist of 888 clips; half of them are used for training the classifier, while the remaining 444 clips are used for testing it. Table 1 shows the classification results of the SVM classifier. It is clear from the table that 229 lie-response samples are classified correctly as the lie class, and 169 truth-response samples are classified correctly as the truth-teller class, so 398 samples are correctly classified. The remaining samples of the 444 are classified incorrectly: 23 lie-response samples are classified as truth-teller, and another 23 truth-response samples are classified as lie. The final detection accuracy of the suggested DDS based on the SVM classifier is 89.6396%.
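The reported accuracy follows directly from the Table 1 counts:

```python
# Confusion-matrix counts for the 444 test clips (Table 1).
correct_lie, correct_truth = 229, 169   # correctly classified clips
wrong_lie, wrong_truth = 23, 23         # misclassified clips

correct = correct_lie + correct_truth   # 398
total = correct + wrong_lie + wrong_truth
accuracy = 100.0 * correct / total
print(round(accuracy, 4))               # 89.6396
```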

Hardware implementation
The hardware implementation of the complete DDS is considered difficult for several reasons. First, the pre-processing step for all captured videos requires large amounts of memory and hardware. Second, the algorithms used for feature extraction require several processing steps, which makes them difficult to implement in hardware and demands advanced platforms to keep up with their complexity. For these reasons, only the classification stage is implemented in hardware.

Using the Simulink Xilinx library available in MATLAB, configured through Xilinx ISE 14.7, the FPGA model of the SVM classifier can be created. The input feature vector for each video clip is applied using constant blocks named F followed by a number (Fnumber); this number refers to the location of the feature in the feature vector and ranges from 1 to 15 (because there are 15 features in the input vector). This constant block, usually named the source block parameters, is mainly used for applying input data to the design. The output of each constant block is applied to a Xilinx Gateway In block, which prepares the received data according to the specified configuration. In this implementation, the block is configured as fixed-point, unsigned, with a width of one bit, because the input data are in binary form. After these two blocks, the data are ready for any required operation. The SVM classifier is implemented using the Xilinx Bit Basher block and the Xilinx Logical block. The Bit Basher block is mainly used for extraction, concatenation and augmentation of its inputs and is widely adopted in applications that use a stream of bits. The Logical block performs bitwise logical operations, including AND, NAND, OR, NOR, XOR and XNOR.
The input features are applied directly to the Xilinx Bit Basher block, which simply passes the data bits to the Xilinx Logical block, configured to work as a logical AND gate. The Logical block performs the AND operation between the input data and the weight value associated with that data. Figure 9 shows a part of this implementation to clarify the arrangement of the Xilinx blocks and the connectivity between them. Finally, the output of the logical AND gate is applied to a Xilinx Adder/Subtracter block configured to perform addition. The adder blocks are used in stages in order to compute the cumulative sum of all input features, and the last adder adds the cumulative sum to the bias to produce the final output of the SVM classifier. Figure 10 shows the overall block diagram of this design.
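The datapath described above (AND each 1-bit feature with its weight, accumulate through the adder chain, add the bias last) can be modelled in software as follows. The weight and bias values here are placeholders, since the trained values are not given in the text:

```python
def svm_hw_sum(bits, weights, bias):
    """Software model of the FPGA datapath: each 1-bit feature is
    ANDed with its weight bit (Logical block), the partial results
    accumulate through a chain of adders, and the bias is added by
    the final Adder/Subtracter block."""
    assert len(bits) == len(weights)
    acc = 0
    for x, w in zip(bits, weights):
        acc += x & w            # Logical block configured as AND
    return acc + bias           # last adder: cumulative sum + bias

# A clip's 15 binary features (hypothetical example input).
bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
weights = [1] * 15              # placeholder weights
label = 1 if svm_hw_sum(bits, weights, bias=-4) >= 0 else 0
```

The sign of the final sum then selects the output class, mirroring the two class-number pins of the hardware design.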

Table 2 shows the device utilization report for the SVM classifier design built from the Xilinx Bit Basher and Logical blocks. The design requires 136 slices, which equals just 2% of the available slices. The total number of 4-input LUTs used is 263, approximately 2% of the available LUTs. The design requires just 17 IOB pins (15 pins for the 15 extracted features and 2 pins for the class number), about 4% of the available IOBs.

Table 2. The device utilization summary of the proposed DDS based on the SVM classifier when implemented using combinational logic gates

The power analysis report for this design is shown in Figure 11. The total on-chip power consumption is 0.034 W. For the thermal properties, the effective TJA is 22.3 °C/W, the maximum ambient temperature is 84.3 °C, and the estimated junction temperature for this design is 25.7 °C.

Figure 11. Power analysis report for the optimized design of the SVM classifier based on combinational logic gate blocks

After completing the design, the bit stream file for the proposed DDS based on the SVM classifier is created and downloaded into the FPGA kit in order to configure the device according to the design of the DDS classifier. Different packages are required for downloading. Xilinx ISE 14.7 is used for target device configuration and for sending the created file to the selected kit through the JTAG cable. The System Generator is configured with one of the available MATLAB versions. The last package is the hardware co-simulator (hwcosim), which configures the selected kit and connects the running of the designed system on the hardware (FPGA) platform with MATLAB/Simulink directly.
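The reported percentages can be cross-checked against the XC3S700A resource totals. The totals below are taken (as an assumption) from the Spartan-3A family datasheet, and the division is truncated as in ISE's utilization report:

```python
# XC3S700A resource totals (assumed from the Spartan-3A datasheet;
# the I/O count depends on the package).
total_slices, total_luts, total_iobs = 5888, 11776, 372

# Utilization figures reported in Table 2.
used_slices, used_luts, used_iobs = 136, 263, 17

print(int(100 * used_slices / total_slices))  # 2 (%)
print(int(100 * used_luts / total_luts))      # 2 (%)
print(int(100 * used_iobs / total_iobs))      # 4 (%)
```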
The main advantage of using hwcosim is testing the hardware system by sending the feature vector, which contains the three kinds of features, to the hardware platform together with the created bit stream file, and extracting the final results. After the system model is compiled by XSG, the hwcosim block is produced. When Simulink starts the run, the FPGA model is downloaded directly to the connected hardware board; the download takes the form of a bit stream used to configure the hardware platform according to the system design. Figure 12 shows how the FPGA kit (Spartan-3A 700A) is connected to the laptop through the JTAG cable. The MATLAB software handles the video clips applied as input to the proposed DDS: it performs a set of processing steps, extracts features from the video clips, and arranges them in the form of a vector. The feature vector is then sent directly to the hardware kit through the JTAG cable in order to assign the input vector to one of the two classes, either liar or truth-teller. As mentioned previously, the input dataset is divided into two halves, one for training the classifier and the second for testing it; 444 clips are used to measure the performance of the proposed DDS based on the SVM classifier. Table 3 shows the performance of the hardware implementation of the suggested system. From Table 3, it is clear that the performance achieved by the hardware implementation is the same as that achieved by the simulation, which verifies its correctness and validity.

Conclusion
The results of the designed DDS, based on a hybrid feature technique, prove the system's efficiency and correctness. Fifteen features are extracted from each subject, related to three major kinds of non-verbal features: facial expressions, head movements and eye gaze. These features are extracted from each participant in the collected database, which contains videos for 102 Iraqi subjects; the videos are partitioned into small parts called video clips. The resulting 888 video clips are split so that 444 clips are used for training the SVM classifier and the remainder for testing it. The detection accuracy of the SVM classifier is 89.6396%. Finally, the hardware implementation of the SVM classifier is based on combinational logic gates; the design utilizes 136 slices and 263 4-input LUTs, without using any flip-flops or MULT18X18SIOs.