SCORE FUSION OF FINGER VEIN AND FACE FOR HUMAN RECOGNITION BASED ON CONVOLUTIONAL NEURAL NETWORK MODEL

1) Laboratory of Systems Engineering and Information Technology, National School of Applied Sciences, Ibn Zohr University in Agadir, Morocco el.cherrat@gmail.com, hbouzahir@yahoo.fr 2) Laboratory of Computer Science and Telecommunications Research, Faculty of Sciences, Mohammed V University, Rabat Morocco alaoui.rach@gmail.com 3) Multimedia, Signal and Communications Systems Team, National Institute of Posts and Telecommunication, Rabat Morocco alaoui.rach@gmail.com


INTRODUCTION
At present, biometric recognition systems are widely used in important fields such as criminal identification, securing access to buildings or personal objects, and financial payment. Biometric technology is divided into two categories: physiological and behavioral characteristics. Human biometric traits are generally unique, measurable or automatically recognizable, and permanent [1].
Finger vein traits are used for biometric recognition because they offer several advantages over other modalities. First, they provide high security: the vein structure is hidden beneath the skin, so the possibility of spoofing the recognition system is very low. Second, they are simple and easy to use: the pattern is easily acquired with a sensor that captures NIR (Near-Infrared) light. Furthermore, each person has a unique vein pattern [2].
Face recognition relies on human facial characteristics to verify or identify a person. Systems using facial recognition are sensitive to variations in facial expression, accessories, uncontrolled illumination, and pose. Face recognition has a natural place in smart environments. Such a system does not restrict the user's movement and can identify a person at a distance without requiring direct interaction. People naturally identify others by their faces, so they tend to be comfortable with systems that use face recognition. As a result, human and computer performance on facial recognition is a study topic with both scientific research value and wide application prospects.
A multi-biometric recognition system combines a variety of biometric sources. The main advantage of a multimodal system over a traditional single-biometric system is a more secure and accurate recognition process [3]. In this regard, research on multimodal biometrics using finger vein and face images has recently become prevalent and important.
Classifying the relevant features of an input biometric image is the key to improving the performance of a recognition system. The two most frequently used methods are Support Vector Machine (SVM) and Random Forest (RF). SVM is a supervised machine learning method used for both classification and regression problems. This approach can handle a huge number of features and a very large training dataset [4]. RF is a supervised learning algorithm that produces a predictive model and is less prone to overfitting a particular dataset than simple trees (higher stability) [5]. In recent years, the convolutional neural network (CNN) has shown better performance than traditional methods in various recognition problems. This model automatically learns the most relevant features, which makes it one of the most robust systems with respect to scaling, distortion, and translation. The only disadvantage of methods based on automatic feature learning compared with classical methods is that they require more time in the training stage to find the most pertinent features [6].
Many multimodal biometric techniques have been proposed. Ross, et al. [7] presented different levels of fusion, including score level fusion, in multimodal biometric systems. Singh, et al. [8] proposed a biometric recognition system that combines visible and thermal infrared (IR) face images at sensor level. Son [9] studied a fusion of face and iris. Ross, et al. [10] combined hand and face at feature level and ran experiments in three different scenarios. Jain, et al. [11] studied different fusion techniques and normalization methods for fingerprint, hand geometry and face biometric sources. Wang et al. [12] studied a multi-instance iris recognition system that fuses the right and left irises of the same individual. Yang, et al. [13] presented a cancelable multibiometric system using fingerprint and finger vein, which combines fingerprint minutia points and finger vein image features using three feature-level fusion techniques; Fisher score and Linear Discriminant Analysis are used to reduce the biometric feature space. Zhang, et al. [14] developed a multimodal biometric system for mobile devices that fuses iris and periocular regions through weighted concatenation and CNN models. Only a few works address a multimodal biometric system that includes finger vein and face traits. Muhammad Imran Razzak, et al. [15] reported a fusion of finger vein and face for authentication using fuzzy score level fusion. In addition, they proposed a multimodal biometric system that combines low-resolution face and finger veins at score level [16]. Manjunathswamy, et al. [17] also studied such a multimodal biometric system, based on the Gabor kernel method.
The general structure of a classical biometric recognition system consists of four main stages: acquisition of the biometric image, image preprocessing, feature extraction, and image matching. In such systems, researchers must experimentally choose an optimal and effective algorithm for every stage in order to increase recognition accuracy. To address this problem, the proposed method deploys a multimodal biometric recognition system that combines finger vein and face images using Convolutional Neural Network (CNN) architectures and classifiers based on Random Forest (RF) and linear Support Vector Machine (SVM). Our scheme is robust to various environmental changes and database types. Fig. 1 shows the general block diagram of the proposed recognition system. The rest of the paper is divided into three sections. Section 1 discusses the proposed algorithm. Experimental results are analyzed and discussed in Section 2. Finally, the conclusion is presented in the last section.

FINGER VEIN RECOGNITION SYSTEM
This section describes in detail the proposed finger vein recognition system using CNN. In this work, our proposed method includes the following three major stages: (1) preprocessing the finger vein image; (2) feature extraction with CNN model; (3) using Random Forest as a classifier for finger vein classification.
In the preprocessing step, the Canny method [18] is adopted to locate the region of interest (ROI) of the finger vein image, and the inner rectangle is then used to crop the ROI. After that, the Adaptive Histogram Equalization (AHE) [19] technique improves the contrast of the image while limiting contrast amplification in the different regions of the image. The finger vein image obtained with the Canny edge detector and the AHE technique is shown in Fig. 2. After this step, features are extracted from the preprocessed finger vein image using the CNN architecture. Finally, a Random Forest classifier [20] is used for classification.
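To illustrate the contrast-enhancement idea, the following is a minimal sketch of global histogram equalization on an 8-bit grayscale image stored as a list of rows. It is a simplification and not the authors' implementation: AHE applies a mapping of this kind to local regions of the image, and the contrast-limiting variant additionally clips the histogram before computing the mapping. The function name `equalize_histogram` and the toy image are hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization for a 2-D list of integer gray levels.

    Simplified illustration only: AHE would apply a similar mapping per
    local region rather than once for the whole image.
    """
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]


# Toy low-contrast image (hypothetical values).
toy = [[50, 50], [51, 52]]
enhanced = equalize_histogram(toy)
```

On the toy image the three nearby gray levels are spread over the full 0 to 255 range, which is the effect the enhancement step relies on.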

Fig. 2. (b) Cropped finger vein image and (c) AHE-enhanced finger vein image from the VERA database.
A Convolutional Neural Network (CNN) [21] is a multilayer perceptron (MLP) type network based on a deep supervised learning model. In this regard, a CNN can be viewed as an automatic feature extractor combined with a trainable classifier. The configuration details of the proposed CNN architecture are shown in Fig. 3. The proposed model has 5 convolutional layers, whose outputs can be computed using Eq. 1, and 3 max-pooling layers. In addition, 3 Rectified Linear Unit (ReLU) activations are used in our system. The ReLU function is defined in Eq. 2.
In Eq. 1, O is the output map, x is the input map, f is the filter, and N is the number of elements in x.
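As a sketch only, a one-dimensional discrete convolution consistent with the symbols O, x, f and N can be written as follows. The index convention and the summation limits are assumptions and are not taken from the source.

```latex
% Assumed form of a discrete convolution of input map x with filter f.
% Index convention and limits are illustrative, not the authors' notation.
O_i = \sum_{j=1}^{N} x_j \, f_{i-j}
```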
$$f(x) = \max(0, x), \qquad (2)$$
where x is the input to a neuron. The structure is described as follows:
1) L1: the input layer, with data size 58×150, which is the size of the preprocessed finger vein images;
2) L1M1: first hidden layer, composed of 32 convolutional filters of size 3×3×1, a ReLU activation function and a max-pooling layer of size 2×2. This layer transforms the input data into CL1M1 = [29×75×32] features;
3) L2M2: second hidden layer, composed of 64 convolutional filters of size 3×3×32, a ReLU activation function and a max-pooling layer of size 2×2. This layer transforms the input data into CL2M2 = [13×36×64] features;
4) L3M3: third hidden
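The feature map sizes reported for the first two hidden layers can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions and 2×2 max-pooling with floor division. The padding modes are an assumption chosen because they reproduce the reported sizes ("same" padding in the first block, "valid" padding in the second); the source does not state them. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, padding="same"):
    """Output length of a stride-1 convolution along one axis."""
    if padding == "same":
        return size            # zero padding keeps the spatial size
    return size - kernel + 1   # "valid": no padding


def pool_out(size, window=2):
    """Output length of non-overlapping max-pooling (floor division)."""
    return size // window


def block(height, width, padding):
    """One convolution + 2x2 max-pooling block applied to both axes."""
    return (pool_out(conv_out(height, padding=padding)),
            pool_out(conv_out(width, padding=padding)))


# Input finger vein image: 58 x 150.
l1m1 = block(58, 150, padding="same")    # assumed padding mode
l2m2 = block(*l1m1, padding="valid")     # assumed padding mode
print(l1m1, l2m2)
```

Under these assumptions the spatial sizes are 29×75 after the first block and 13×36 after the second, matching CL1M1 and CL2M2; the channel counts (32 and 64) are the numbers of filters.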

RANDOM FOREST MODEL
The Random Forest algorithm, proposed by Breiman [20], is an ensemble learning technique for regression and classification. RF combines many decision trees, each grown on a random subset of the data and aggregated by bagging, in order to perform classification, select significant variables, and compute the relative importance of each variable. Each tree in the RF is trained on a bootstrap sample that covers about two-thirds of the training dataset. The rest of the data (out-of-bag (OOB)) is left out of that tree. Each tree is grown to its maximum depth without pruning, which lowers its bias; however, the variance of an individual tree remains high. A tree is grown until each leaf node contains only elements of one class. The RF algorithm is presented as Algorithm 1.
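The bootstrap and out-of-bag mechanism can be illustrated with the standard library alone. The sketch below is not Algorithm 1 from the paper; it only shows that sampling n items with replacement leaves roughly one-third of them out of the bag, and how per-tree predictions could be combined by majority vote. The function names and the sample size of 1000 are hypothetical.

```python
import random
from collections import Counter


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample (with replacement) and return the
    in-bag indices together with the out-of-bag (OOB) indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


def majority_vote(tree_predictions):
    """Combine the class labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the run is repeatable
in_bag, oob = bootstrap_split(1000, rng)
unique_fraction = len(set(in_bag)) / 1000
print(f"distinct in-bag samples: {unique_fraction:.3f}, OOB: {len(oob) / 1000:.3f}")

# Hypothetical predictions from five trees for one test image.
label = majority_vote(["id_3", "id_3", "id_7", "id_3", "id_7"])
```

The fraction of distinct in-bag samples approaches 1 − 1/e ≈ 0.632 for large n, which is the "about two-thirds" figure mentioned in the text.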

FACE RECOGNITION SYSTEM
This section describes the proposed algorithm for face recognition, which uses a CNN as a feature extractor. The proposed method consists of two phases: feature extraction based on the CNN, and face classification with an SVM classifier. Table 1 summarizes the configuration of the proposed face-CNN architecture. The proposed model has 5 convolutional layers, of which 3 are followed by max-pooling and 3 by a Rectified Linear Unit (ReLU). The SVM classifier is used to predict the labels of the input patterns. To break connections between the first layer and the following layers, a dropout probability [22] of 20% is adopted. In addition, a dropout probability of 10% is adopted between the second layer and the following layers.
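The effect of a dropout probability can be sketched on a plain list of activations. The rescaling of the surviving activations ("inverted dropout") is an assumption about the implementation, chosen because it is the common convention in deep learning frameworks; the source only states the probabilities of 20% and 10%. The function name `dropout` is hypothetical.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` during training and
    rescale the survivors by 1 / (1 - rate) (inverted dropout, assumed)."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
features = [1.0] * 10000

after_first = dropout(features, rate=0.20, rng=rng)    # 20% after the first layer
after_second = dropout(features, rate=0.10, rng=rng)   # 10% after the second layer
at_inference = dropout(features, rate=0.20, rng=rng, training=False)

dropped_first = after_first.count(0.0) / len(after_first)
dropped_second = after_second.count(0.0) / len(after_second)
print(f"dropped: {dropped_first:.3f} and {dropped_second:.3f}")
```

At inference time the activations pass through unchanged, so dropout only regularizes training.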

SVM CLASSIFICATION
The Support Vector Machine (SVM), developed by Vapnik [23], is one of the best-known machine learning algorithms. The basic idea of this model is to turn a nonlinearly separable problem into a linearly separable one by projecting the given dataset into a feature space and then constructing an optimal separating hyperplane. The hyperplane is defined by the predictor function in Eq. 3.
$$f(x) = w^T x + b, \qquad (3)$$
where x is a real-valued feature vector. The SVM model learns the parameters w by solving the optimization problem in Eq. 4 [24].
$$\min_{w} \; \|w\|_2^2 + C \sum_{i} \max\left(0,\; 1 - y'_i \left(w^T x_i + b\right)\right)^2, \qquad (4)$$
where C is the penalty parameter (an arbitrary value or a value selected by hyper-parameter tuning), b is a scalar, y′ is the actual label, and w^T x + b is the predictor function. ∥w∥_2 is the Euclidean norm, and the loss term is the squared hinge loss. For kernel SVM models, optimization must be performed in the dual. Scalability is therefore a problem with kernel SVM models, and in our work we use only the linear SVM.
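As a minimal sketch of the linear SVM objective with the squared hinge loss, the function below evaluates the regularization term plus the penalized loss for given parameters. It assumes labels in {−1, +1} and a squared Euclidean norm as the regularizer; it only evaluates the objective and does not perform the optimization. The function name and the toy data are hypothetical.

```python
def l2_svm_objective(w, b, samples, labels, C=1.0):
    """Squared-norm regularizer plus C times the summed squared hinge loss.

    Assumes labels are -1 or +1 and the predictor is w . x + b.
    """
    regularizer = sum(wi * wi for wi in w)
    loss = 0.0
    for x, y in zip(samples, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        loss += max(0.0, 1.0 - y * score) ** 2
    return regularizer + C * loss


# Toy data: the first sample is outside the margin, the second is inside it.
w = [1.0, 0.0]
b = 0.0
samples = [[2.0, 0.0], [-0.5, 0.0]]
labels = [1, -1]

objective = l2_svm_objective(w, b, samples, labels, C=1.0)
print(objective)
```

Only the second sample contributes to the loss, because the first one already satisfies the margin condition y(w·x + b) ≥ 1.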

FEATURE EXTRACTION FUSION
In this part, we introduce the proposed fusion method, which operates at the matching score level. The matching score indicates how close a feature vector is to the stored template. Score fusion is used to establish a person's identity because combining the scores of both modalities gives a more accurate assessment than a single score may be able to provide.
The fused score is a weighted combination of the individual scores, as given in Eq. 5. If the fused score obtained for the query finger vein and face images is greater than or equal to the decision threshold, the person is accepted; otherwise, the person is rejected.
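The weighted fusion and threshold decision can be sketched as follows. This assumes that both matching scores are normalized to the range [0, 1] and that the two weights sum to one; the weight of 0.6, the scores and the threshold of 0.8 are hypothetical values for illustration, not the ones used in the paper.

```python
def fuse_scores(score_vein, score_face, weight_vein=0.5):
    """Weighted sum of two matching scores; the weights sum to one (assumed)."""
    if not 0.0 <= weight_vein <= 1.0:
        raise ValueError("weight_vein must lie in [0, 1]")
    return weight_vein * score_vein + (1.0 - weight_vein) * score_face


def decide(fused_score, threshold):
    """Accept the claimed identity when the fused score reaches the threshold."""
    return fused_score >= threshold


# Hypothetical normalized scores for one query.
fused = fuse_scores(score_vein=0.9, score_face=0.7, weight_vein=0.6)
accepted = decide(fused, threshold=0.8)
print(fused, accepted)
```

Giving the finger vein score a larger weight reflects a design choice in which the more reliable modality dominates the decision; the appropriate weights would be tuned on validation data.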

DATA AUGMENTATION
Data augmentation is one of the methods for reducing overfitting in CNN architectures. This technique increases the amount of training data by applying image translation, rotation and cropping. Data augmentation has been used successfully in many previous studies. We implemented data augmentation as an extension of the work in [25], using rotation and translation (left, right, up and down) [26]. For the finger vein dataset, augmentation produced a database 124 times larger than the original one. For the face dataset, augmentation produced a database 10 times larger than the original one.
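A translation-based augmentation step can be sketched on an image stored as a list of rows. This is an illustration under stated assumptions, not the authors' pipeline: pixels shifted outside the frame are discarded, uncovered pixels are filled with zero, and the shift amounts are hypothetical. Rotation is omitted to keep the sketch short.

```python
def translate(image, dx, dy, fill=0):
    """Shift an image by dx columns and dy rows, filling uncovered pixels."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image, shifts=(1,)):
    """Return the original image plus left, right, up and down translations."""
    variants = [image]
    for s in shifts:
        for dx, dy in ((-s, 0), (s, 0), (0, -s), (0, s)):
            variants.append(translate(image, dx, dy))
    return variants


toy = [[1, 2], [3, 4]]
shifted_right = translate(toy, dx=1, dy=0)
variants = augment(toy, shifts=(1,))
print(len(variants))
```

Each shift amount contributes four extra images, so using several shift amounts (and rotation angles) multiplies the dataset size accordingly.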

RESULTS AND DISCUSSION
The experimental platform in this study is as follows: an Intel Core i7-4770 processor, 8 GB of RAM and an NVIDIA GeForce GTX 980 GPU with 4 GB of memory, running Ubuntu 14.04 LTS (64 bit). To better verify our algorithm, the following classification methods are used in the experiments: Support Vector Machine (SVM) [23], Random Forest (RF) [20], Logistic Regression (LR) [27] and Linear Discriminant Analysis (LDA) [28]. To validate the proposed algorithm, CSLDA [16], MPAFFI [17], and the proposed fingervein-CNN and face-CNN algorithms were compared. The methods were tested on the public VERA Fingervein [29] database (divided into two datasets, DB1 and DB2), the AR Face [30] database (used in DB1) and the Color FERET Face [31] database (used in DB2). The total number of training images was 39680, and we divided them into training, validation, and test sets. The divided dataset used in the experiment is shown in Table 2. The performance measure is the accuracy rate, as defined by Equation (6).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (6)$$
where TP (true positives) is the proportion of authorized users that are correctly recognized out of the total number tested, TN (true negatives) is the proportion of unauthorized users that are correctly rejected, FP (false positives) is the proportion of unauthorized users that are wrongly recognized, and FN (false negatives) is the proportion of authorized users that are wrongly rejected.
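Assuming Equation (6) is the usual accuracy ratio (TP + TN) / (TP + TN + FP + FN), it can be computed as in the sketch below. The counts in the example are hypothetical and are not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy rate: correct decisions divided by all decisions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one test sample is required")
    return (tp + tn) / total


# Hypothetical confusion counts for 200 test attempts.
rate = accuracy(tp=90, tn=95, fp=5, fn=10)
print(f"{rate:.3%}")
```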
As can be seen in Table 2, the face recognition system using CNN with dropout yields a significant performance improvement on both databases. In particular, the highest accuracy was obtained with dropout on both datasets for the training set and on DB1 for the test set, where it reaches 98.47%, compared with 97.25% for Face-CNN without dropout. Moreover, the lowest loss is obtained on both databases with the dropout method. Table 3 shows that the proposed algorithm, which applies a CNN and an RF classifier to finger vein images pre-processed with AHE, achieves higher accuracy than the other image types and classifiers. It attains an accuracy of 99.77% on DB1 and 99.89% on DB2.
The results in Table 4 indicate that face identification based on CNN and the SVM classifier outperforms the other classifiers on the different datasets. According to the table, the average accuracy is 97.51% with the Softmax classifier, 97.80% with the SVM classifier, 89.49% with the LR classifier and 86.08% with the RF classifier. Table 6 shows the accuracy rates of the different algorithms on finger vein and face images. The proposed CNN-based identification system consistently outperforms the CSLDA [16] and MPAFFI [17] approaches. Compared with a single biometric system, especially face recognition, the proposed multimodal system based on the fusion of finger vein and face images using CNN and different classifiers shows superior accuracy, with 99.77% on DB1 and 99.89% on DB2, whereas face recognition using CNN and the SVM classifier gives 97.83% on DB1 and 97.78% on DB2. Finally, we can conclude from these results that the proposed multimodal system is superior to the other methods for the following reasons:
1) The finger vein patterns enhanced with Adaptive Histogram Equalization (AHE) are more clearly distinguishable and more prominent than in the other enhanced versions. Therefore, the proposed fingervein-CNN is able to achieve a high identification rate with the AHE technique.
2) The recognition accuracy with the dropout method is better than without it.
3) The CNN approach usually provides better performance than combinations of separate processes such as windowing and feature extraction. Thus, a biometric recognition system based on CNN can surpass classical and more complicated methods.
4) The proposed multimodal algorithm identifies a person with higher accuracy and keeps the person's information or data safer than a system based on a single biometric modality.

CONCLUSION
This paper presents a multimodal biometric identification system that uses CNN to fuse finger vein and face at the score level. For the finger vein system, the image pre-processed with Adaptive Histogram Equalization (AHE) is the input to a CNN model. A three-layered CNN combining convolution with a Random Forest (RF) classifier was proposed using the 32-64-128 model. For the face system, a three-layered CNN combining convolution with a Support Vector Machine (SVM) classifier was employed using the 32-64-128 model. Moreover, the dropout method plays an important role in increasing identification accuracy. The two modalities are fused at the score level using a weighted score. The experimental results on four datasets indicate that the overall performance of the proposed multimodal biometric algorithm is better than that of the unimodal CNN-based algorithms on all databases.