New Modelling of Modified Two Dimensional Fisherface Based Feature Extraction

Biometrics, including facial recognition, has become an active field of research. A crucial step in facial recognition is feature extraction. One-Dimensional Linear Discriminant Analysis, a development of Principal Component Analysis, is one of the feature extraction methods most widely used by researchers. However, it has a limitation: it works efficiently only when the number of training samples is greater than or equal to the dimensionality of the training images. This limitation has been overcome by Two-Dimensional Linear Discriminant Analysis. However, searching for the values of the matrices R and L in Two-Dimensional Linear Discriminant Analysis is costly, $O(n^3)$. This research proposes computing the between-class scatter and the within-class scatter by Discriminant Analysis without first having to find R and L. The time complexity of the proposed method is $O(n^2)$. The proposed method was tested on the AT&T face image database, and the experimental results show that its maximum recognition rate is 100%.


Introduction
Biometrics survey results show that facial recognition has strong prospects for further development. Facial recognition technology has also been applied in many sectors, such as banking, government, the military, and industry [1]. Among low-level feature extraction techniques, appearance-based methods are the most popular for facial recognition, and the best known of these is Principal Component Analysis (PCA) [2], [3]. PCA is a simple method that extracts facial features by dimensionality reduction, and it has been cited by many researchers. However, it has two limitations. First, dimensionality reduction cannot be performed when the number of training samples is greater than or equal to the image dimensionality. Second, classes cannot be separated efficiently.
Several appearance-based methods have been proposed to overcome these limitations, namely Linear Discriminant Analysis (LDA) [4], [5], [6], Locality Preserving Projection (LPP), also known as Laplacianfaces [7], and Orthogonal Neighborhood Preserving Projection (ONPP) [8]. These methods overcome the second problem of PCA. However, they do not overcome the first problem, which arises when the number of training samples exceeds the image dimensionality.
To overcome the remaining problem of PCA, two-dimensional appearance-based methods have been proposed, such as Two-Dimensional Principal Component Analysis (2D-PCA) [9], [10] and Two-Dimensional Linear Discriminant Analysis (2D-LDA) [11], [12], [13], [14]. Feature extraction applied directly to image matrices, without first transforming them into vectors, was first proposed by J. Yang [10]. Besides 2D-LDA, another development of 2D-PCA, Generalized Low Rank Approximations of Matrices (GLRAM), was proposed by J. Ye [15]. 2D-LDA iteratively computes the right (R) and left (L) projection matrices to obtain the optimal projection. Unfortunately, obtaining L and R for feature extraction is computationally expensive. In this research, a modification of 2D-LDA is proposed that obtains the optimal projection matrix without computing L and R.

One Dimensional Appearance Method
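The most popular appearance-based feature extraction method is Principal Component Analysis (PCA) [2], [3]; PCA applied to faces is also known as eigenfaces. Suppose an image is represented by $f(x, y)$ with height $h$ and width $w$, so that each input image is the $h \times w$ matrix
$$f(x,y) = \begin{bmatrix} f(1,1) & f(1,2) & \cdots & f(1,w) \\ f(2,1) & f(2,2) & \cdots & f(2,w) \\ \vdots & \vdots & \ddots & \vdots \\ f(h,1) & f(h,2) & \cdots & f(h,w) \end{bmatrix} \tag{1}$$
Each image matrix is transformed into a one-dimensional matrix of size $(1, n)$ or $(n, 1)$, where $n = h \times w$. If $k$ face images are used as the training set and $x_j$ denotes the $(n, 1)$ vector of the $j$-th image, the training set matrix can be written as
$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_k^T \end{bmatrix} \tag{2}$$
PCA has been extended by many researchers; one such extension is Linear Discriminant Analysis (LDA), also known as Fisherface. Fisherface features are obtained by maximizing the between-class scatter ($S_b$) and minimizing the within-class scatter ($S_w$):
$$S_b = \sum_{i=1}^{c} k_i (\mu_i - \mu)(\mu_i - \mu)^T \tag{3}$$
$$S_w = \sum_{i=1}^{c} \sum_{x_j \in \Pi_i} (x_j - \mu_i)(x_j - \mu_i)^T \tag{4}$$
where $c$ is the number of classes, $\Pi_i$ is the set of training vectors of class $i$, $k_i$ is the number of vectors in $\Pi_i$, $\mu_i$ is their mean, and $\mu$ is the mean of all training vectors. The results of Equations (3) and (4) are used to compute the eigenvalues $\lambda$ and eigenvectors $v$:
$$S_w^{-1} S_b \, v = \lambda v \tag{5}$$
The eigenvalues obtained from Equation (5) must be sorted in decreasing order ($\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_k$), followed by their corresponding eigenvectors. The dimension of the eigenvector matrix is fea, where fea $= k - c$.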
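To make Equations (3) and (4) concrete, the following is a minimal sketch in plain Python (no external libraries, chosen only so it runs anywhere) that computes $S_b$ and $S_w$ from vectorized images. The function names and the tiny two-class data set are hypothetical illustrations, not part of the original method.

```python
# Minimal sketch: Fisherface scatter matrices S_b and S_w (Equations (3)-(4))
# for vectorized images, in plain Python so it runs without external libraries.


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(n)]


def add_outer(acc, u, scale=1.0):
    """acc += scale * u u^T, where acc is an n x n list of rows."""
    for i, ui in enumerate(u):
        for j, uj in enumerate(u):
            acc[i][j] += scale * ui * uj


def scatter_matrices(classes):
    """classes: list of classes, each a list of n-element image vectors.

    Returns (S_b, S_w):
      S_b = sum_i k_i (mu_i - mu)(mu_i - mu)^T
      S_w = sum_i sum_{x in class i} (x - mu_i)(x - mu_i)^T
    """
    all_vectors = [x for cls in classes for x in cls]
    n = len(all_vectors[0])
    mu = mean_vector(all_vectors)
    s_b = [[0.0] * n for _ in range(n)]
    s_w = [[0.0] * n for _ in range(n)]
    for cls in classes:
        mu_i = mean_vector(cls)
        add_outer(s_b, [a - b for a, b in zip(mu_i, mu)], scale=len(cls))
        for x in cls:
            add_outer(s_w, [a - b for a, b in zip(x, mu_i)])
    return s_b, s_w


# Hypothetical toy data: two classes, each with two flattened 1 x 2 "images".
class_a = [[1.0, 2.0], [3.0, 2.0]]
class_b = [[5.0, 6.0], [7.0, 6.0]]
S_b, S_w = scatter_matrices([class_a, class_b])
print(S_b)  # [[16.0, 16.0], [16.0, 16.0]]
print(S_w)  # [[4.0, 0.0], [0.0, 0.0]]
```

Note that $S_w$ is singular for this toy data, so $S_w^{-1}$ does not exist; Fisherface handles this by first reducing the dimension with PCA.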
However, LDA has a crucial problem: it fails to maximize the between-class scatter and minimize the within-class scatter when the discriminatory information lies not in the means of the training sets but in their variances. When this occurs, LDA cannot work efficiently.
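The failure case can be seen with a hypothetical toy example: two classes centred at the same point but with very different spreads. The minimal sketch below computes only the between-class scatter; it comes out as the zero matrix, so maximizing $S_b$ separates nothing even though the classes clearly differ in variance.

```python
# Sketch: two classes with the same mean but different variances.
# The between-class scatter S_b is then the zero matrix.


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(n)]


def between_class_scatter(classes):
    """S_b = sum_i k_i (mu_i - mu)(mu_i - mu)^T for lists of vectors."""
    mu = mean_vector([x for cls in classes for x in cls])
    n = len(mu)
    s_b = [[0.0] * n for _ in range(n)]
    for cls in classes:
        d = [a - b for a, b in zip(mean_vector(cls), mu)]
        for i in range(n):
            for j in range(n):
                s_b[i][j] += len(cls) * d[i] * d[j]
    return s_b


# Hypothetical data: both classes are centred at (0, 0),
# but the "wide" class is spread nine times further out.
narrow = [[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]]
wide = [[-9.0, 0.0], [9.0, 0.0], [0.0, -9.0], [0.0, 9.0]]
print(between_class_scatter([narrow, wide]))  # [[0.0, 0.0], [0.0, 0.0]]
```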

Two Dimensional Appearance Method
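Basically, the problem of LDA can be overcome by Two-Dimensional Linear Discriminant Analysis (2D-LDA), as described by H. Kong. Let $A_j$ be the $h \times w$ training images, $\Pi_i$ the set of training images of class $i$, $k_i$ the number of images in $\Pi_i$, $M_i$ the mean image of class $i$, $M$ the mean image of all training images, and $c$ the number of classes. The initial value of $R$ is defined as an identity matrix of dimension $k$, where $k$ is the number of eigenfaces used. With $R$ fixed, the between-class and within-class scatter are
$$S_b^R = \sum_{i=1}^{c} k_i (M_i - M) R R^T (M_i - M)^T \tag{6}$$
$$S_w^R = \sum_{i=1}^{c} \sum_{A_j \in \Pi_i} (A_j - M_i) R R^T (A_j - M_i)^T \tag{7}$$
The results of Equations (6) and (7) are used to compute the covariance
$$C_R = \left(S_w^R\right)^{-1} S_b^R \tag{8}$$
The eigenvectors of Equation (8) are used as the initial value of $L$. With $L$ fixed, the scatter matrices become
$$S_b^L = \sum_{i=1}^{c} k_i (M_i - M)^T L L^T (M_i - M) \tag{9}$$
$$S_w^L = \sum_{i=1}^{c} \sum_{A_j \in \Pi_i} (A_j - M_i)^T L L^T (A_j - M_i) \tag{10}$$
and $C_L$ is computed as
$$C_L = \left(S_w^L\right)^{-1} S_b^L \tag{11}$$
$R$ is then updated with the eigenvectors of Equation (11), and the two steps are repeated. The final values of $L$ and $R$ give the projection of each image,
$$B_j = L^T A_j R \tag{12}$$
However, this method has a limitation: the values of L and R depend on the number of iterations. The time complexity of obtaining the feature extraction matrices L and R is $O(n^3)$.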
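A minimal plain-Python sketch of the first half-step of this iteration follows, assuming the common two-sided 2D-LDA formulation in which the scatter matrices are computed with $R$ fixed and $R$ starts as an identity matrix (taken here as $w \times w$ for simplicity; this size is an assumption). The eigenvector steps that produce $L$ and then update $R$ are omitted because they need an eigensolver. The helper names and the toy images are hypothetical.

```python
# Sketch of the R-side scatter matrices of two-sided 2D-LDA with R fixed
# (here R = identity). Pure Python; the image data are hypothetical.


def matmul(a, b):
    """Matrix product of two lists-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_matrix(mats):
    h, w = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(w)]
            for i in range(h)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def r_side_scatter(classes, R):
    """Return (S_b^R, S_w^R), both h x h, for a fixed right matrix R:
      S_b^R = sum_i k_i (M_i - M) R R^T (M_i - M)^T
      S_w^R = sum_i sum_{A in class i} (A - M_i) R R^T (A - M_i)^T
    """
    M = mean_matrix([a for cls in classes for a in cls])
    h = len(M)
    rrt = matmul(R, transpose(R))
    s_b = [[0.0] * h for _ in range(h)]
    s_w = [[0.0] * h for _ in range(h)]

    def accumulate(acc, D, scale):
        # acc += scale * D R R^T D^T
        term = matmul(matmul(D, rrt), transpose(D))
        for i in range(h):
            for j in range(h):
                acc[i][j] += scale * term[i][j]

    for cls in classes:
        M_i = mean_matrix(cls)
        accumulate(s_b, sub(M_i, M), len(cls))
        for A in cls:
            accumulate(s_w, sub(A, M_i), 1)
    return s_b, s_w


# Hypothetical 2 x 2 images, two classes with two images each.
class_a = [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [0.0, 1.0]]]
class_b = [[[5.0, 0.0], [0.0, 5.0]], [[7.0, 0.0], [0.0, 5.0]]]
S_bR, S_wR = r_side_scatter([class_a, class_b], identity(2))
print(S_bR)  # [[16.0, 0.0], [0.0, 16.0]]
print(S_wR)  # [[4.0, 0.0], [0.0, 0.0]]
```

With $R$ equal to the identity, $R R^T$ drops out, so the first half-step uses $(M_i - M)(M_i - M)^T$ and $(A - M_i)(A - M_i)^T$ directly; the next steps would take eigenvectors of $(S_w^R)^{-1} S_b^R$ as $L$ and then update $R$.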

Proposed Method
The main idea of the proposed method is a modification of 2D-LDA in which the training images are not converted into one-dimensional vectors. The average image of each class and the average image of all classes are computed from the training set, and the covariance of the training set (Equation (13)) is computed from the between-class and within-class scatter of these averages, without computing L and R. The eigenvectors of Equation (13) are used to obtain the weights of the testing set, and Equation (17) measures the similarity between the weights of the training and testing sets. The feature extraction result of the proposed method is a two-dimensional matrix; for the training set it has the same size as the original image. To achieve a high recognition rate, the dominant features must be chosen. The most dominant feature corresponds to the largest eigenvalue. If $d$ features are chosen, the feature vector has $d \times h$ elements, where $h$ is the image height.
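Equations (13) to (17) are missing from this text, so the following is only a sketch under stated assumptions, not the authors' implementation. It assumes that the scatter matrices are formed on the column side of the image matrices, $S_b = \sum_i k_i (M_i - M)^T (M_i - M)$ and $S_w = \sum_i \sum_{A \in \Pi_i} (A - M_i)^T (A - M_i)$, that the covariance is $S_w^{-1} S_b$, and that the feature of an $h \times w$ image $A$ for an eigenvector $v$ is $A v$, which agrees with the statement that $d$ features give $d \times h$ elements. The dominant eigenvector is found by power iteration (a swapped-in technique chosen to avoid external libraries), so only $d = 1$ is shown; all function names and images are hypothetical.

```python
# Sketch under assumptions: column-side scatter on image matrices,
# covariance C = S_w^{-1} S_b, dominant eigenvector by power iteration,
# and feature = A v (h elements per selected eigenvector).


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_matrix(mats):
    h, w = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(w)]
            for i in range(h)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting; m must be non-singular."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def add_gram(acc, D, scale):
    """acc += scale * D^T D (a w x w term)."""
    t = matmul(transpose(D), D)
    for i in range(len(t)):
        for j in range(len(t)):
            acc[i][j] += scale * t[i][j]


def column_scatter(classes):
    """w x w scatter computed directly on h x w images (no L or R)."""
    M = mean_matrix([a for cls in classes for a in cls])
    w = len(M[0])
    s_b = [[0.0] * w for _ in range(w)]
    s_w = [[0.0] * w for _ in range(w)]
    for cls in classes:
        M_i = mean_matrix(cls)
        add_gram(s_b, sub(M_i, M), len(cls))
        for A in cls:
            add_gram(s_w, sub(A, M_i), 1)
    return s_b, s_w


def dominant_eigenvector(c, iters=100):
    """Power iteration; assumes a single largest eigenvalue (and C v != 0)."""
    v = [1.0] * len(c)
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(len(v))) for row in c]
        norm = sum(x * x for x in u) ** 0.5
        v = [x / norm for x in u]
    return v


def extract_feature(A, v):
    """Project an h x w image onto one eigenvector: an h-element feature."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical 2 x 2 images; the classes differ in the first column.
class_a = [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [0.0, 3.0]]]
class_b = [[[5.0, 0.0], [0.0, 1.0]], [[7.0, 0.0], [0.0, 3.0]]]
S_b, S_w = column_scatter([class_a, class_b])
C = matmul(invert(S_w), S_b)
v = dominant_eigenvector(C)
print(v)                               # [1.0, 0.0]
print(extract_feature(class_b[0], v))  # [5.0, 0.0]
```

Because every matrix here is $w \times w$ rather than $n \times n$ with $n = h \times w$, the eigenproblem is much smaller than in one-dimensional LDA; $S_w$ must be non-singular for `invert` to succeed.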
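To determine the class of a testing image, the distance between its weights and the weights of each training image must be measured. In this research, two methods were used to measure the similarity, namely the Euclidean distance ($D_1$) and the Manhattan distance ($D_2$):
$$D_1(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
$$D_2(x, y) = \sum_{i=1}^{m} \lvert x_i - y_i \rvert$$
where $x$ and $y$ are the feature vectors of a training and a testing image and $m$ is the number of feature elements. For each measure, the final decision is the training image with the smallest distance.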
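The two distance measures and the smallest-distance decision rule can be sketched in a few lines of Python. The feature vectors and person labels below are hypothetical, and the decision rule is a plain nearest-neighbour search over all training images.

```python
import math


def euclidean(x, y):
    """D1: square root of the sum of squared element differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """D2: sum of absolute element differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def classify(test_feature, train_features, train_labels, distance):
    """Return the label of the training feature closest to test_feature."""
    best = min(range(len(train_features)),
               key=lambda i: distance(test_feature, train_features[i]))
    return train_labels[best]


# Hypothetical flattened features and person labels.
train = [[0.0, 0.0], [10.0, 10.0]]
labels = ["person_1", "person_2"]
print(euclidean([3.0, 4.0], [0.0, 0.0]))               # 5.0
print(manhattan([3.0, 4.0], [0.0, 0.0]))               # 7.0
print(classify([9.0, 8.0], train, labels, euclidean))  # person_2
print(classify([9.0, 8.0], train, labels, manhattan))  # person_2
```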

Experimental Results and Analysis
In this research, 400 images were used as experimental data for both the training and the testing sets. The images come from the AT&T (Cambridge University Computer Laboratory) face database; the laboratory that produced them was founded in 1986 as the Olivetti Research Laboratory, better known as ORL. The images are of 40 people, each photographed 10 times in different poses.
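Assuming that "t training sets" in the experiments means t training images per person, with the remaining 10 - t images of that person used for testing (an interpretation, not stated explicitly in the text), the training and testing set sizes follow directly from the 40 x 10 layout of the database. The sketch below computes them; the rule of taking the first t images of each person for training is a hypothetical choice, since the text does not say which images were used.

```python
# Sketch: training/testing set sizes for the ORL layout (40 people x 10 images),
# assuming "t training sets" means t training images per person.

PEOPLE = 40
IMAGES_PER_PERSON = 10


def split_sizes(t):
    """(training, testing) image counts when t images per person train."""
    return PEOPLE * t, PEOPLE * (IMAGES_PER_PERSON - t)


def split_indices(t):
    """Hypothetical rule: the first t images of each person train, the rest test."""
    train, test = [], []
    for person in range(PEOPLE):
        for image in range(IMAGES_PER_PERSON):
            (train if image < t else test).append((person, image))
    return train, test


for t in range(2, 8):
    print(t, split_sizes(t))
# 2 (80, 320)
# 3 (120, 280)
# 4 (160, 240)
# 5 (200, 200)
# 6 (240, 160)
# 7 (280, 120)
```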

Figure 1. Samples of ORL Face Database
The proposed method was evaluated by reconstructing facial images from the ORL face image database shown in Figure 1, using 30 features. In Figure 2, the first row shows, from left to right, facial images reconstructed with 1 to 10 features; the second row shows reconstructions with 11 to 20 features; and the third row shows reconstructions with 21 to 30 features.
The reconstructions show that the more features are used, the better the reconstructed facial image, as seen in Figure 2; in other words, the more features are used, the smaller the reconstruction error, as seen in Figure 3. To evaluate the robustness of the proposed method, experiments were conducted with two similarity measures, Euclidean distance and Manhattan distance. For each similarity measure, 2, 3, 4, 5, 6, and 7 training sets were used, as shown in Table 1.
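The text does not give the reconstruction formula. As an illustration only, assuming an orthonormal set of $w$ column basis vectors $V$, an $h \times w$ image $A$ can be approximated from its first $d$ projections as $A V_d V_d^T$, and the Frobenius reconstruction error cannot increase as $d$ grows. The basis below is a scaled $4 \times 4$ Hadamard matrix, chosen only because its entries give exact floating-point arithmetic; the eigenvectors of a discriminant method are not orthonormal in general, so this sketches the trend rather than the proposed method itself.

```python
import math


def reconstruction_errors(A, V):
    """Frobenius error ||A - A V_d V_d^T|| for d = 1..w.

    A is an h x w image; the columns of the w x w matrix V are assumed
    orthonormal, and V_d holds the first d of them.
    """
    h, w = len(A), len(A[0])
    errors = []
    for d in range(1, w + 1):
        basis = [[V[r][c] for r in range(w)] for c in range(d)]  # d column vectors
        # Projection coefficients Y = A V_d (h x d).
        Y = [[sum(A[i][r] * b[r] for r in range(w)) for b in basis]
             for i in range(h)]
        # Reconstruction A_d = Y V_d^T (h x w).
        A_d = [[sum(Y[i][k] * basis[k][j] for k in range(d)) for j in range(w)]
               for i in range(h)]
        errors.append(math.sqrt(sum((A[i][j] - A_d[i][j]) ** 2
                                    for i in range(h) for j in range(w))))
    return errors


# Scaled 4 x 4 Hadamard matrix: orthonormal columns with exact binary entries.
H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
V = [[x / 2 for x in row] for row in H]
A = [[4.0, 2.0, 0.0, 2.0], [1.0, 3.0, 5.0, 7.0]]  # hypothetical 2 x 4 "image"
errs = reconstruction_errors(A, V)
print(errs)  # non-increasing, ending at 0.0 when d = w
```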

Table 1. Experimental Model for Each Similarity Measurement Method
For each number of training sets, 2, 3, 4, 5, 6, 7, 8, 9, and 10 features were used as dominant characteristics. In the first scenario, the Euclidean distance was used to measure the similarity of the extracted features; in the second scenario, the Manhattan distance was used. The experimental results show that the number of training sets influences the recognition rate. Figure 4 shows that with 2 features the recognition rate ranges from a minimum of 84.69% to a maximum of 100%, and the more training sets are used, the higher the recognition rate. The same trend holds for all numbers of features except 4, for which the recognition rate with 6 training sets is higher than with 7 training sets. Figure 4 also shows that the number of features used for the similarity measurement does not significantly influence the recognition rate. The same phenomenon occurs when the Manhattan distance is used, as seen in Figure 5. Deviations in the experimental results occurred with 4 and 5 training sets, and the recognition rate with 6 training sets is higher than with 7 training sets, as seen in Figure 5. Misrecognitions occurred because subjects in the testing set wore accessories. The highest recognition rate occurred when 2 features were used for the similarity measurement. The proposed method was also compared with other methods, namely PCA, LDA, LPP, Orthogonal Laplacianfaces, Feature Fusion [17], 2D-PCA [16], and 2D-LDA [16]. The comparison shows that with 5 training sets the recognition rate of the first scenario is 96%, which is superior to the other methods except 2D-PCA and 2D-LDA.
For the second scenario, however, the recognition rate is superior to the other methods except 2D-LDA: the maximum recognition rate of the proposed method is 97%, whereas that of 2D-LDA is 97.33%. With 6 and 7 training sets, the proposed method is superior to the other methods in both the first and the second scenarios, as seen in Table 2. Because the proposed method cannot yet select the best features for each column, using more features degrades the recognition results.
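The recognition rates reported above are percentages of correctly identified testing images. A minimal sketch of that computation follows; the labels are hypothetical. Under the interpretation that 2 training sets leave 8 testing images for each of the 40 people (an assumption), every misrecognition would lower the rate by 100 / 320 = 0.3125 percentage points.

```python
def recognition_rate(predicted, actual):
    """Percentage of testing images whose predicted identity is correct."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical labels for four testing images; one is misrecognized.
actual = ["p1", "p2", "p3", "p3"]
predicted = ["p1", "p2", "p2", "p3"]
print(recognition_rate(predicted, actual))  # 75.0

# One error among 320 testing images costs 100 / 320 = 0.3125 points.
print(recognition_rate(["x"] * 319 + ["y"], ["x"] * 320))  # 99.6875
```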

Conclusions and Future Research
Based on the experimental results and analysis, the highest recognition rate occurred when two feature vectors were used for the similarity measurement, in both the first and the second scenarios. This shows that the more features are used, the lower the recognition rate, a decrease caused by the use of non-dominant features. The more training sets are used, the higher the recognition rate. The highest recognition rate, 100%, was achieved with 7 training sets in the first scenario, and with 6 and 7 training sets in the second scenario.
The proposed method will be developed further to overcome its limitation when more than two features are used. This requires selecting the dominant feature vectors for each column, in order to increase the recognition rate and to decrease the computation time of the similarity measurement.