Matching algorithm performance analysis for autocalibration method of stereo vision

ABSTRACT


INTRODUCTION
Vision-based measurement has been one of the most interesting research topics in recent decades. Many applications have been developed using vision-based measurement [1]. The methods of 3D measurement can be categorized into two major groups: active and passive. Active measurement uses structured illumination or a laser, and it is not applicable in many cases. Passive 3D measurement is based on stereo vision and offers several advantages over active measurement: it requires simpler instrumentation and is therefore applicable in more environments. However, the major issue for passive measurement is the difficulty of finding accurate correspondences between stereo images [2].
Stereo calibration is the most important step in finding correspondence points. Camera calibration is required to ensure that both cameras are correctly positioned and to remove distortion. Traditionally, camera calibration is performed using a standard chessboard pattern [3]. However, much work is still required on self-calibration methods. Stereo self-calibration refers to the automatic determination of stereo camera parameters from image sequences.
Self-calibration is an important ability required for the introduction of stereo cameras into the market. Many works have been published on this method [4][5][6][7][8][9][10][11]. It can ensure maintenance-free, long-term operation, since environmental conditions may change the camera position. Offline calibration requires special expertise, and self-calibration may reduce the time spent on regular offline calibration. Even when human eyes have different characteristics, such as minus, plus, or cylindrical properties, the human brain adjusts automatically. Consequently, a human being has no difficulty in merging the two views from the left and right eyes. In designing a self-calibration method, a matching algorithm is an important tool for finding correspondence points between the images of the two cameras.
The main objective of this paper is to analyze the performance of three matching algorithms for the autocalibration process. Two of the most common techniques for stereo correspondence are the sum of absolute differences (SAD) and the sum of squared differences (SSD). In area-based block matching, corresponding points between images are obtained by minimizing the SAD or SSD [12]. However, the major drawback of these two techniques is their low accuracy. An improvement using sub-pixel block matching has been explored in [4], but the accuracy obtained was still insufficient. Recently, many image matching algorithms based on various techniques have been proposed [13]. In this work, a set of experiments demonstrates that the stereo vision system employing the proposed technique can measure 3D surfaces of free-form objects with sub-mm accuracy. The three matching techniques used in this research are SIFT, SURF, and ORB. The matching algorithm provides the characteristics of each camera [14], which are used to transform the second image in order to perform automatic stereo calibration. Each algorithm is explained as follows.
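As background for the SAD and SSD block-matching baseline mentioned above, the following is a minimal sketch of how a correspondence can be found by minimizing either cost along one image row. The function names, the tiny test images, and the assumption that the search runs only along the same row are illustrative and are not taken from the paper.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract_block(image, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def best_disparity(left_img, right_img, top, left, size, max_disp, cost=sad):
    """Slide a block from the left image along the same row of the right
    image and return (disparity, cost) for the position with minimum cost."""
    reference = extract_block(left_img, top, left, size)
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        col = left - d
        if col < 0:  # the candidate block would fall outside the image
            break
        c = cost(reference, extract_block(right_img, top, col, size))
        if c < best_cost:
            best_d, best_cost = d, c
    return best_d, best_cost


# Illustrative data: the pattern (9, 7) is shifted by two pixels.
left_img = [[0, 0, 0, 0, 9, 7, 0, 0] for _ in range(2)]
right_img = [[0, 0, 9, 7, 0, 0, 0, 0] for _ in range(2)]
print(best_disparity(left_img, right_img, top=0, left=4, size=2, max_disp=4))
```

Because the search only compares raw intensities in a fixed window, repeated textures can yield several near-identical minima, which is one way to see the low-accuracy drawback noted above.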
-SIFT Scale-invariant feature transform (SIFT) is a matching algorithm proposed by Lowe [15]. This algorithm works very well in finding correspondence points in images that are rotated and transformed. It consists of four steps. The first step is the estimation of scale-space extrema using the Difference of Gaussian method, expressed in (1) and illustrated in Figure 1.

Figure 1. The estimation of scale-space extrema using the Difference of Gaussian method

In the next step, the key point candidates are refined by eliminating low-value candidates. The Laplacian of Gaussian, σ²∇²G, is used since it produces more stable image features than the alternatives. The relation between the Difference of Gaussian and the Laplacian of Gaussian can be expressed using (2) and (3). In the third step, the key point orientation is assigned using the image gradient. The final step is the computation of the local image descriptor based on the gradient and orientation of the key point. Because of its complexity, SIFT requires a large computational capacity, even though it is very suitable for object recognition applications [16,17].
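Assuming that equations (1) to (3) follow Lowe's original formulation [15], the Difference of Gaussian function and its relation to the scale-normalized Laplacian of Gaussian can be written as below. Here G is the Gaussian kernel, I is the input image, L is the Gaussian-smoothed image, and k is the constant factor between neighboring scales; this notation is an assumption based on that formulation.

```latex
\begin{align}
D(x, y, \sigma) &= \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y)
                 = L(x, y, k\sigma) - L(x, y, \sigma) \tag{1} \\
\sigma \nabla^2 G &= \frac{\partial G}{\partial \sigma}
                   \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma} \tag{2} \\
G(x, y, k\sigma) - G(x, y, \sigma) &\approx (k - 1)\,\sigma^2 \nabla^2 G \tag{3}
\end{align}
```

In this form, the Difference of Gaussian already contains the σ² scale normalization, differing from σ²∇²G only by the constant factor (k − 1).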
-SURF The speeded-up robust features (SURF) technique performs faster than SIFT [18]. In some cases, its quality is equal to that of SIFT. Like SIFT, the SURF technique is based on a detector and a descriptor. Instead of Gaussian averaging of the images, SURF uses square (box) filters as an approximation. It employs a Hessian matrix-based blob detector to find the points of interest. For orientation assignment, wavelet responses are used with Gaussian weights applied. The SURF feature descriptor is generated from the wavelet responses of subregions, which are obtained by dividing the neighborhood around the key point. Two points form a correspondence (match) only if they have the same contrast, as determined from the Laplacian.
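The square-filter approximation described above is usually made cheap by an integral image, which lets the sum over any rectangle be read with four lookups. The following is a minimal sketch of that idea only; the function names are hypothetical, and it does not implement the full SURF detector.

```python
def integral_image(image):
    """Build a summed-area table with one extra row and column of zeros,
    so that ii[y][x] is the sum of image[0:y][0:x]."""
    height, width = len(image), len(image[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
print(box_sum(ii, 1, 1, 2, 2))  # sum of the lower-right 2x2 block
```

The cost of `box_sum` does not depend on the rectangle size, which is why box filters of any scale can be evaluated in constant time.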
-ORB Oriented FAST and rotated BRIEF (ORB) was proposed by Rublee et al. [19] as an alternative to SIFT. ORB is a combination of the FAST key point detector and the BRIEF descriptor. FAST is used to determine the key points [20]. In the next step, the Harris corner measure is used to find the top N points. To assign an orientation, ORB computes the intensity-weighted centroid of the patch, with the detected corner located at the center. The orientation is given by the direction of the vector from the corner to the centroid.
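The intensity-centroid orientation described above can be sketched with first-order image moments. This is a simplified illustration: it assumes a small square patch given as a list of rows with the key point at the patch center, whereas practical implementations typically use a circular patch. The function name is hypothetical.

```python
import math


def patch_orientation(patch):
    """Angle in radians of the vector from the patch center to the
    intensity-weighted centroid, measured in image coordinates (y down)."""
    height, width = len(patch), len(patch[0])
    center_y, center_x = (height - 1) / 2, (width - 1) / 2
    m10 = 0.0  # first-order moment along x
    m01 = 0.0  # first-order moment along y
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - center_x) * value
            m01 += (y - center_y) * value
    return math.atan2(m01, m10)


bright_right = [[0, 0, 1],
                [0, 0, 1],
                [0, 0, 1]]
bright_bottom = [[0, 0, 0],
                 [0, 0, 0],
                 [1, 1, 1]]
print(patch_orientation(bright_right), patch_orientation(bright_bottom))
```

A patch that is brighter on the right yields an angle of zero, and one that is brighter at the bottom yields a quarter turn, so rotating the image rotates the assigned orientation with it.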

RESEARCH METHOD
The purpose of this research is to find the best algorithm for the autocalibration of stereo vision. The first step of calibration is finding the corresponding points between two images. The accuracy of this step determines the accuracy of the stereo vision. The object of this research is a microscopic object with a size of a few millimeters. The disparity of the points is converted into the intrinsic parameters of the camera.
The method used in this research is described in Figure 2. The stereo image is produced using two cameras. To handle the very narrow view area caused by the small size of the objects, a converged camera setup is used, since it is hard to place objects in the overlapping area when parallel cameras are used. Histogram equalization is required since the illumination of each image or the color characteristics of each camera may differ [21]. To reduce noise, a combination of Gaussian and median filters is applied. Both filters are intended to improve the image quality [22]. The Gaussian filter is expressed in (4), the median filter in (5), and the combination of both filters in (6).
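The noise-reduction step described above can be sketched as a Gaussian filter followed by a median filter. The order of the two filters, the 3x3 window, sigma = 1.0, and the border handling by clamping are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from statistics import median


def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian kernel of odd width `size`."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def neighborhood(image, y, x, half):
    """Window around (y, x); coordinates are clamped at the image borders."""
    height, width = len(image), len(image[0])
    return [[image[min(max(y + dy, 0), height - 1)]
                  [min(max(x + dx, 0), width - 1)]
             for dx in range(-half, half + 1)]
            for dy in range(-half, half + 1)]


def gaussian_filter(image, size=3, sigma=1.0):
    """Replace each pixel by the Gaussian-weighted mean of its window."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    return [[sum(k * v
                 for k_row, w_row in zip(kernel, neighborhood(image, y, x, half))
                 for k, v in zip(k_row, w_row))
             for x in range(len(image[0]))]
            for y in range(len(image))]


def median_filter(image, size=3):
    """Replace each pixel by the median of its window."""
    half = size // 2
    return [[median(v for w_row in neighborhood(image, y, x, half) for v in w_row)
             for x in range(len(image[0]))]
            for y in range(len(image))]


def denoise(image):
    """Assumed combination: Gaussian smoothing first, then the median filter."""
    return median_filter(gaussian_filter(image))
```

The Gaussian filter suppresses fine-grained noise by weighted averaging, while the median filter removes isolated outliers without averaging them into their neighbors, which is a common motivation for combining the two.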
The result of histogram equalization is processed using a feature extraction algorithm. Three feature extraction algorithms are used to find the matching correspondence points in each set of images [20]. The matched correspondence points are used for the rectification process [23]. The distance between each pair of corresponding points is used to extract the stereo parameters. The output of this method is the set of stereo calibration parameters [24], and the result of this process can be transformed into a 3D surface. Two industry-standard HD cameras are used in this research. These cameras are equipped with a 100x lens to enlarge the object. The images captured by the two cameras are then compared and evaluated using the matching algorithm to find the corresponding points. Figure 3 shows the camera setup and the object size.

RESULTS AND ANALYSIS
SIFT, SURF, and ORB have been executed on each pair of images to find the best method for image matching. In the obtained results, the green lines indicate the correspondences between points in the left and right images. The number of connecting lines shows the number of matched points. However, each algorithm still produces errors when it fails to match the correct points. The result of this matching is used to generate the calibration parameters of the stereo vision.

Matching results using SIFT, SURF, and ORB
The results of applying the SIFT, SURF, and ORB algorithms to the captured object are given in Figures 5(a-c), respectively. As seen in image sets 1 and 5 of Figure 5(a), only a few lines have been generated by the SIFT algorithm. In these sets, the background has very high similarity between the images. The SURF results in Figure 5(b) indicate that, on image set 1, only a few lines have been generated by the algorithm, and some of them indicate a major error. The rest of the image sets show the correct corresponding points. The ORB results in Figure 5(c) also indicate that only a few lines have been generated on image set 1, with some lines indicating a major error. The four other image sets show the correct corresponding points.
The comparison of the matching results using the SIFT, SURF, and ORB techniques is presented in Table 1, which indicates the matching accuracy of the three algorithms. As can be seen in the table, the SIFT algorithm gives the highest average accuracy percentage. However, the percentage of correct lines varies depending on the image characteristics. For images with high similarity, SURF failed to give a good result, whereas ORB generated many lines but with high error rates.
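The accuracy percentage discussed above can be read as the share of generated lines that connect truly corresponding points, averaged over the image sets. The following sketch assumes that definition, which is not stated explicitly in the text; the function names are hypothetical and the counts are made up for illustration, not taken from Table 1.

```python
def matching_accuracy(correct_lines, total_lines):
    """Percentage of generated lines that connect correct correspondences."""
    if total_lines == 0:  # no lines generated for this image set
        return 0.0
    return 100.0 * correct_lines / total_lines


def average_accuracy(per_set_counts):
    """Mean accuracy over image sets, given (correct, total) pairs."""
    scores = [matching_accuracy(c, t) for c, t in per_set_counts]
    return sum(scores) / len(scores)


# Hypothetical (correct, total) counts for two image sets.
example_counts = [(8, 10), (45, 50)]
print(average_accuracy(example_counts))
```

Under this definition, an algorithm that generates many lines with many errors can score lower than one that generates few but mostly correct lines, which matches the contrast drawn between ORB and the other methods.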
The results in Table 1 are compared with the results of Karami et al. [13] for the case of varying intensity, as shown in Table 2. In both works, SIFT performs better than the other methods. Table 3 shows the comparison of the computational time of each algorithm. The SIFT method required a longer time than the others due to its complex computation, and it required even longer when the image had high similarity in its texture. Figure 6 indicates that the ORB algorithm has the fastest computation time for all image sets, taking less than 0.5 s of processing time. However, the ORB algorithm

Image rectification
The matching points from the previous steps are used to rectify the images. The difference in position between the source and destination points is used as a reference for the transformation. Figures 7a and 7b

3D surface generation
The matching process yields the distance between points. Using the distance values, a 3D surface of the object can be generated by projecting them onto the z-axis [26,27]. The distance value between both images is assigned as the depth value. If the distance is small, the object is closer to the camera, and vice versa. The depth value of each pixel is then converted to grayscale to distinguish the depth of each point. Figure 8 shows the disparity map of the dataset generated using SIFT adjustment. The correlated points produced by SIFT are used to calculate the stereo camera parameters. The result shows that the algorithm successfully generates stereo matches; however, the noisy output remains a challenge. Using the depth value as the z-axis produces the 3D view shown in Figure 9. The algorithm successfully produces a 3D reconstruction, but the noise reduces the image quality.
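The conversion from point distances to grayscale depth values described above can be sketched as follows. The use of the Euclidean distance between matched points and of min-max scaling to the range 0 to 255 are assumptions made for illustration; the function names and the sample matches are hypothetical.

```python
import math


def match_distances(matches):
    """Euclidean distance between each matched (left, right) point pair."""
    return [math.hypot(x_right - x_left, y_right - y_left)
            for (x_left, y_left), (x_right, y_right) in matches]


def to_grayscale(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 255."""
    lowest, highest = min(values), max(values)
    if highest == lowest:  # all points lie at the same depth
        return [0 for _ in values]
    return [round(255 * (v - lowest) / (highest - lowest)) for v in values]


# Hypothetical matched points: ((x_left, y_left), (x_right, y_right)).
matches = [((0, 0), (3, 4)), ((1, 1), (1, 1)), ((0, 0), (12, 16))]
distances = match_distances(matches)
print(distances, to_grayscale(distances))
```

With this mapping, points with a small distance appear dark and points with a large distance appear bright; the opposite convention would work equally well as long as it is applied consistently.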

CONCLUSION
In this paper, three different image matching techniques, SIFT, SURF, and ORB, have been compared for a stereo autocalibration system. SIFT shows the best performance in most of the scenarios under consideration. In the special case where the images contain multiple highly similar textures, SURF failed to give good results. In the ORB implementation, the features are mostly concentrated on objects at the center of the image, whereas with SIFT and SURF the features are distributed over the image. The 3D reconstruction image has been successfully generated, but the noise reduces the quality of the images. For future work, a good filtering algorithm is required to obtain a better result without sacrificing the details of the images.


Matching algorithm performance analysis for autocalibration method of stereo (Raden Arief Setyawan)

Figure 2. The distance measurement procedure

Figure 3. The cameras set-up and the object size

Figure 4. (a) The dimension of the object and (b) the datasets used in the research


ISSN: 1693-6930 TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 2, April 2020: 1105-1112

gives lower matching rates than the other methods. The line chart in Figure 6 also indicates that the computation time grows linearly with the complexity of the images. Image sets 1 and 5 have the longest computation times among the image sets because of their complexity.

Figure 6. Computational time comparison chart

show distorted images from the left and right cameras. Figure 7a is used as the reference, while Figure 7b is the object of the transformation. The result of transforming Figure 7b is displayed in Figure 7c. This transformation is based on the homography equation to reduce distortion [25].
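A homography-based transformation such as the one described above maps each point through a 3x3 matrix in homogeneous coordinates. The following is a minimal sketch of applying a given matrix to points; estimating the matrix from the matched points is not shown, and the function names and example matrix are hypothetical.

```python
def apply_homography(h, point):
    """Map a 2D point through the 3x3 homography h (a list of three rows)."""
    x, y = point
    x_scaled = h[0][0] * x + h[0][1] * y + h[0][2]
    y_scaled = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    # Dividing by w converts homogeneous coordinates back to image coordinates.
    return x_scaled / w, y_scaled / w


def transform_points(h, points):
    """Apply the same homography to every point in a list."""
    return [apply_homography(h, p) for p in points]


# Hypothetical homography: a pure translation by (5, -2).
translation = [[1, 0, 5],
               [0, 1, -2],
               [0, 0, 1]]
print(transform_points(translation, [(1, 1), (0, 0)]))
```

Multiplying the whole matrix by a nonzero constant leaves the mapping unchanged, since the constant cancels in the division by w.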

Figure 7. Rectification result: (a) left image as reference, (b) right image, (c) the result of rectification

Figure 8. Generated depth value based on SIFT matching algorithm

Figure 9.

Table 1. Comparison of the matching results using the SIFT, SURF, and ORB techniques

Table 2. Comparison of the matching results between Karami and this work

Table 3. Computational time using the SIFT, SURF, and ORB techniques