Road and Vehicles Detection System Using HSV Color Space for Autonomous Vehicle

ABSTRACT


INTRODUCTION
The rapid growth of technology encourages all tools to utilize control systems in their operation, including autonomous vehicles. In such a control system, set points and various inputs, such as images, numbers, and videos, are the key factors for it to work. These inputs, especially images and videos, require image processing to provide the right signal to the controller. A camera is usually used as a sensor to capture images or video, which may contain certain objects of interest.
Video has been utilized to develop safety features in autonomous vehicle systems [1]. Various studies have been conducted to detect roads from video, including lane detection and edge detection. Lane and edge detection are important to realize an ideal autonomous vehicle navigation system. Some studies used the Lane Departure Warning System (LDWS) [1], [2], [3]. In addition to road detection, a navigation system is required in the autonomous vehicle for its driving movement. Such navigation systems usually use LIDAR, GPS, and camera sensors, as well as waypoints [4], [5], [6].
As one of the sensors in the autonomous vehicle, the camera can be useful for detecting the lane or road. This sensor is also used to estimate the distances of objects around the vehicle. However, the camera sensor has a weakness in identifying color. Therefore, a more precise color space, such as the Hue, Saturation, Value (HSV) color space, is needed to overcome this weakness.
HSV has shown potential in various applications, such as road sign detection [7], [8]. The HSV method has also been combined with a Gaussian Mixture Model (GMM) to detect moving objects [9]. In lane detection, X. Shi et al. [10] used the HSV color space to detect lanes in three stages based on feature patterns, beginning with detecting all candidate features.

HSV Color Space
The range of HSV color space used in this research is shown in Figure 1. The color space is quantized from 0 to 255 when the image is converted from RGB space to HSV space. Following [16], with V = max(R, G, B) and min = min(R, G, B), the Hue calculation is:

H = 60(G − B)/(V − min) if V = R,
H = 120 + 60(B − R)/(V − min) if V = G,
H = 240 + 60(R − G)/(V − min) if V = B,

with 360 added to H when the result is negative. The Saturation calculation is:

S = (V − min)/V, with S = 0 when V = 0.

And the Value calculation is:

V = max(R, G, B).
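As an illustration, the standard per-pixel RGB-to-HSV conversion, quantized to the 0-255 range as described here, can be sketched in Python (a minimal reference implementation; in practice a library routine such as OpenCV's cvtColor would be used):

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel (each channel 0..255) to HSV,
    with all three channels quantized to 0..255."""
    v = max(r, g, b)
    mn = min(r, g, b)
    diff = v - mn
    s = 0.0 if v == 0 else diff / v
    if diff == 0:
        h = 0.0  # achromatic pixel: hue undefined, set to 0
    elif v == r:
        h = (60.0 * (g - b) / diff) % 360.0
    elif v == g:
        h = 60.0 * (b - r) / diff + 120.0
    else:
        h = 60.0 * (r - g) / diff + 240.0
    # Quantize H (0..360), S (0..1), V (0..255) to the 0..255 range.
    return round(h / 360.0 * 255), round(s * 255), round(v)
```

For example, pure red maps to H = 0, pure green to H = 85, and pure blue to H = 170 on the quantized scale.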

Region of Interest (ROI)
Region of Interest (ROI) is one of the segmentation techniques in image processing; it focuses the processing on a particular target area. The ROI is defined before the main image processing, so processing is carried out only on the ROI frame and not on the entire frame. ROI can thus be used to limit the observation area of a vehicle; however, a single fixed ROI cannot serve every situation, since it must be adjusted for other objects or other applications. Determining the ROI helps minimize errors in road detection, since unnecessary information may introduce noise [17].
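As a sketch (assuming NumPy image arrays), limiting processing to a rectangular ROI can be done by masking out everything outside it:

```python
import numpy as np

def apply_roi(frame, y0, y1, x0, x1):
    """Keep only the rectangular region of interest; zero out the rest
    so subsequent processing ignores everything outside the ROI."""
    out = np.zeros_like(frame)
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]
    return out
```

Later stages (HSV thresholding, Hough Transform) then operate only on the non-zero region.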

Haar-Like Feature
The Haar-like feature was first proposed by Viola P. and Jones M. to detect human faces. In subsequent research, the method was updated by Lienhart R. and Maydt J. The Haar-like feature is a classifier trained on a set of sample images of an object. That collection of images produces a collection of object features, called a cascade. During object recognition, the cascade rejects image areas that do not contain objects meeting the criteria [18].
In vehicle detection, the Haar-like feature uses an XML file that stores data of vehicle types, such as cars and motorcycles. The vehicle images are extracted into an XML file so that they can easily be used in programming. This research used an XML file created in previous research.
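The cascade itself is loaded from the XML file (e.g., with OpenCV's CascadeClassifier), but the underlying Haar-like feature is simple to illustrate: each feature is a difference of rectangle sums, computed in constant time from an integral image. A minimal NumPy sketch (function names are ours, for illustration only):

```python
import numpy as np

def integral_image(img):
    """Integral image: ii[y, x] = sum of img[0:y+1, 0:x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] using four integral-image lookups."""
    total = ii[y1 - 1, x1 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1 - 1]
    if x0 > 0:
        total -= ii[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

def haar_two_rect(img, y0, x0, h, w):
    """Horizontal two-rectangle Haar-like feature: left half minus
    right half of the (h, w) window at (y0, x0)."""
    ii = integral_image(img)
    half = w // 2
    left = rect_sum(ii, y0, x0, y0 + h, x0 + half)
    right = rect_sum(ii, y0, x0 + half, y0 + h, x0 + w)
    return left - right
```

A strong response (large magnitude) indicates an edge-like pattern inside the window, which is what the trained cascade thresholds on.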

Confusion Matrix
The confusion matrix is a method to calculate accuracy. Four terms are used in measuring performance with the confusion matrix: True Positive, True Negative, False Positive, and False Negative. True Positive (TP) is positive data correctly detected as positive. True Negative (TN) is negative data correctly detected as negative. False Positive (FP) is negative data incorrectly detected as positive. False Negative (FN) is positive data incorrectly detected as negative. TP, TN, FP, and FN, as the results of the classification process, can be seen in Table 1.
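From the four counts, the standard metrics follow directly; a small illustrative helper (not from the paper):

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

For example, reading 34 correctly detected and 6 missed images as TP = 34 and FN = 6 (no negative samples) gives an accuracy of 0.85.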

Measurement of Road Asphalt Area and Object Distance
The measurement of road area and object distance is used to maximize the driver's safety. In the road area measurement, the measured areas are the right-side and left-side road of the autonomous vehicle. The measurement is performed by calculating the pixel area and comparing it with the actual detected road asphalt area to determine whether the vehicle is at a safe or dangerous distance. For the object distance measurement, the pixel area of a detected vehicle is used to estimate its distance: the closer the object, the bigger the detected pixel area and the smaller the measured distance. Both calculations use the ratio

x = A_pixel / A_actual,

where x is the value obtained by calculating the ratio between the area in pixels and the actual area. Meanwhile, (9) is used to calculate the vehicle area detection accuracy and the object distance measurement accuracy:

Accuracy = (1 − |actual − measured| / actual) × 100%.    (9)

Road Detection System
The road detection system proposed in this study can be seen in Figure 2. As shown in the figure, vehicle lane detection, i.e., asphalt, uses video-based camera sensors as the input. The input is processed if the frame meets the requirements for further processing. The videos are then processed using the HSV color space: the asphalt road is the main image, which must be separated from the background image based on the different values of the HSV color space. Once the HSV processing is complete, the ROI is used to limit the working area, so the program only processes the area that we want to examine. The next stage is to connect points or lines separated by a distance using the Hough Transform algorithm, so the appearance of the detected area becomes neater; this method, combined with HSV, has been successful in detecting highways [19]. The program terminates once the frontal road lane is detected as asphalt.
Meanwhile, the safe/unsafe area measurement of the right/left camera is performed after the road area is detected, using the pixel calculation as the input for determining whether the vehicle position is safe or unsafe. If the entire process has run, the program is complete.
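The color-separation step of this pipeline can be sketched as follows (a minimal NumPy version; the HSV bounds are placeholders for the calibrated asphalt range, and the subsequent line-joining step would use a Hough routine such as OpenCV's HoughLinesP):

```python
import numpy as np

def asphalt_mask(hsv, lo, hi):
    """Binary mask: True where all three H, S, V channels of a pixel
    lie inside the calibrated asphalt range [lo, hi]."""
    lo = np.asarray(lo)
    hi = np.asarray(hi)
    return np.all((hsv >= lo) & (hsv <= hi), axis=-1)
```

The resulting mask separates asphalt-colored pixels from the background before the ROI and Hough Transform stages are applied.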
The road data were taken from a video of the road around Universitas Sriwijaya. The data duration was between 60 to 90 seconds with a pixel size of 1920 × 2560 pixels for the right and left cameras, and 1080 × 1920 pixels for the front camera.

Object Detection System
A video-based camera sensor is used as the input for object detection. The input is processed if the frame meets the requirements for further processing. Videos that meet the criteria are first converted into gray images before performing Haar-like cascade object detection. In the Haar-like cascade method, object recognition is done by extracting the object that we want to recognize into an XML file; if the object is recognized, a bounding box is drawn around it. In this research, the XML files of [20] were used. The videos were used not only to detect objects but also to measure vehicle distances. Vehicle distance measurement was performed by utilizing the HSV color space to determine the pixel area, which is used as the input for measuring object distances. Calibration must be done before measuring object distance by comparing the detected pixel value with the actual distance value.
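The calibration step described above amounts to fitting a linear relation between the measured pixel area and the known distance; a least-squares sketch (the calibration pairs below are illustrative values, not the paper's data):

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit y = a*x + b for calibration pairs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def distance_from_area(pixel_area, a, b):
    """Estimated object distance from the detected pixel area."""
    return a * pixel_area + b
```

After calibration against a meter gauge, distance_from_area converts each detected vehicle's pixel area into a distance estimate; the negative slope reflects that closer objects occupy a larger pixel area.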

Data Collecting
In this research, both secondary and primary data were used. The secondary data were images of vehicles that might be recognized as objects, i.e., motorcycles and cars, while the primary data were road data from the predetermined area, as shown in Figure 3. The video data were extracted into frames in *.JPG format and used as samples in image processing to detect roads, cars, motorcycles, and other vehicles around the electric vehicle. After obtaining the data, the three videos with the best lighting conditions were selected. In addition, vehicle distance data were collected to calculate the accuracy and to determine the trajectory length used as the research area. The image and video extractions can be seen in Figure 4. In the initial process of road and object detection around the electric vehicle, the frames must first be processed to produce the test data so that the program can run the commands to detect the road and objects in the video.
Fig. 4. Video extraction (a) As a road picture (b) As an image containing a vehicle (c) Road measurement and vehicle distance accuracy

RESULTS AND DISCUSSION
The data were taken in video format on the streets within the campus area of Universitas Sriwijaya. The data were first extracted into images to obtain sample test images. The road data obtained from the sample test images were then used as information in the video data processing. The process of road detection, i.e., asphalt, was performed in the following steps: converting the image into the HSV color space, using masking images to separate the road color from the colors of other objects, limiting the working area using the ROI, and applying the Hough Transform to tidy up the detected road shape. Meanwhile, the safe-area detection on the right or left side of the road and the object distance measurement were performed using the pixel area information. In summary, the whole process can be seen in Figure 5.

The HSV color space is used in the road detection process. The detection results using the HSV color space compared to the ground truth (GT) can be seen in Table 3. The average accuracy of road detection by the front camera is 92.44%. This result indicates that the HSV color space is good enough to detect a road without lane markings. However, the HSV color space still suffers disturbances in road recognition caused by lighting and by similarities between the road color and nearby objects (for example, between asphalt and cement).

The results of road detection from the right side of the road with the HSV color space and its ground truth can be seen in Table 4. The average accuracy obtained by the right camera is 68.78%. The road detection results for the left side of the road asphalt with the HSV color space are shown in Table 5. The average accuracy obtained by the left camera is 75.83%, which is better than the road detection from the right side. The results from the front, right, and left cameras show that the HSV color space is quite reasonable for detecting a road without lane markings, with an average accuracy of 79.02%.
Despite these good results, HSV color space may still have errors caused by little color differences between two objects (for example, asphalt with cement), which makes both objects detected in the same HSV value. Besides, the lack of lighting may cause difficulty to recognize HSV objects, either in the form of shading or dark (evening or night).
The car used has a 2-meter width, and the road width is 4.8 meters (manual road measurement). With this road width, the car takes up almost half of the road area. The remaining 2.8 meters is the area used as the safe/unsafe indication of the autonomous vehicle position: 60 cm is the minimum distance taken as the safe distance of the vehicle's left side from the road edge, and 100 cm is the minimum safe distance of the vehicle's right side. Using these data, the pixel areas read from the right-side and left-side road distances were compared to the actual distances measured with a meter gauge to find linear equations. The results show that the right side of the road is safe if the read pixel area is more than 250,000 pixels (a 1 m² area) and the left side is safe if the read pixel area is more than 200,000 pixels (a 0.6 m² area). Table 6 and Table 7 show the results from some sample images of the safe/unsafe area detection of the right and left sides of the road, respectively.

The road area detection process results in an accuracy of 80%: 48 of the 60 images of safe/unsafe areas were measured correctly, while the remaining 12 images were mismeasured. The error in measuring the safe/unsafe areas of these 12 images was caused by the unstable position of the camera, which changed the measured asphalt area so that the initial measurement standard no longer applied. This error might also be caused by the movement of the car when taking a turn. Nevertheless, the resulting measurement error had little effect since it only occurred for a few seconds.

The results for object detection, i.e., vehicles in front of the autonomous vehicle, can be seen in Table 8. In object detection, 34 of 40 images were detected successfully, while the other 6 images were not, so the average accuracy of object detection was 85%.
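The safe/unsafe decision thus reduces to a per-side pixel-area threshold, using the calibrated values stated above:

```python
# Calibrated minimum safe pixel areas from the road measurements:
# right side: 250,000 px (~1 m^2), left side: 200,000 px (~0.6 m^2).
SAFE_PIXEL_AREA = {"right": 250_000, "left": 200_000}

def side_is_safe(side, pixel_area):
    """True if the free road area detected on the given side exceeds
    the calibrated safe threshold for that side."""
    return pixel_area > SAFE_PIXEL_AREA[side]
```

For example, a left-side reading of 150,000 pixels would be flagged as unsafe, prompting a position correction.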
Several causes led to the errors in object detection. One is the color of the vehicle: in this research, one of the vehicles used for object detection had a green color that resembled the color of the video background, so a closer distance between the camera and the object was required for the object to be detected. Another possible cause is the limited variety of vehicle types in the XML feature file obtained from previous research.
The results of object distance measurement in front of the autonomous vehicle can be seen in Table 9. The accuracy was obtained by comparing the actual area with the area read by the program. The object distance measurement obtained a good accuracy of 74.76%. In this study, the object distance measurement was only applied to a minibus, using a linear equation that utilized the pixel area information. The results were good enough since objects were directly detected from their recognized color. Nevertheless, this measurement method has a weakness in the object range: vehicles at a distance of less than 5 meters might suffer a detection failure due to the set ROI limit, and measurements were interrupted when the vehicle color changed.
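The comparison of measured and actual values can be expressed as a relative-error accuracy; a small helper, assuming accuracy is computed as one minus the relative error:

```python
def measurement_accuracy(actual, measured):
    """Accuracy (%) of a measurement relative to the ground truth,
    computed as (1 - relative error) * 100."""
    return (1.0 - abs(actual - measured) / actual) * 100.0
```

For instance, reading 90 units where the ground truth is 100 units yields 90% accuracy under this definition.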

CONCLUSION
This study showed that the HSV color space can give a good performance in detecting roads without lane markings. The test results showed that the road asphalt was detected by the HSV color space with an average accuracy of 79.02%. The process of detecting roads without lane markings in the area of Universitas Sriwijaya using the HSV color space was successful, even though the HSV color space still suffers disturbances. In addition, the area measurement indicating whether the right and left sides of the road were safe or unsafe worked well using the HSV color space, with an accuracy of 80%. This is sufficient to help the navigation process by indicating whether the vehicle position is too far to the left or right. For object recognition using the XML file, an accuracy of 85% was obtained, which worked quite well; the object recognition of vehicles can be improved by enlarging the vehicle data in the XML file. Lastly, the object distance measurement utilizing the pixel area data gave an accuracy of 74.76%, which will be useful for the autonomous vehicle to measure its distance to other objects.
In future work, we will apply this HSV color space method in a prototype autonomous vehicle and perform field trials.