Region Based Image Retrieval Using Ratio of Proportional Overlapping Object

In Region Based Image Retrieval (RBIR), the relevant blocks of the query region are determined from the percentage of image objects that overlap with each sub-block. In some images, however, the relevant objects are small, which may cause them to be ignored when the relevant sub-blocks are determined. Therefore, in this study we propose an RBIR system based on the percentage of proportional objects that overlap with sub-blocks. Each selected sub-block is used as a query region. The color and texture features of the query region are extracted using an HSV histogram and Local Binary Pattern (LBP), respectively. We also use shape as a global feature, with invariant moments as the descriptor. Experimental results show that the proposed method achieves an average precision of 74%.


Introduction
In the last few decades, Content Based Image Retrieval (CBIR) has become a popular research topic. CBIR is a technique for retrieving images from large image databases based on their visual similarity [1]. Searching images by their content has many advantages over using their annotation text, because not every image has an annotation and not every annotation describes its image well. Therefore, CBIR is able to overcome the weaknesses of text-based image retrieval.
Query by Example (QBE) is a query technique in CBIR systems in which an example image is given as the query. The features of the query are extracted and compared with the features of the images in the database. Features can be divided into global features and local features. A global feature is extracted from the whole image without considering the user's interest [2,3], while a local feature uses only the part of the image that the user needs. Several studies have found that local features based on regions are more effective at satisfying what users require [1], [4][5][6]. Region-based extraction of local features from the query image is known as Region Based Image Retrieval (RBIR).
In RBIR, several regions are created in the query image, but not every region is relevant to the user's interest. Users therefore have to define the Region of Interest (ROI) in the query image so that irrelevant regions can be eliminated. An ROI chosen by the user expresses the user's interest better, but this becomes less effective when users have to deal with many queries. Another approach is to let the system create the ROI [1], [4]. This method divides the image into sub-blocks of a certain size (e.g. 3x3) and then determines the ROI from every sub-block that overlaps with the object [1]. The object is obtained by segmenting the foreground from the background. Other research defines the ROI automatically using the Region Important Index (RII) and the Saliency Region Overlapping Block (SROB) [4]. However, if the image contains a small object, its sub-block sometimes cannot be detected and is treated as an irrelevant sub-block. This can lead to a wrong ROI or selected region. Therefore, a method that adapts well to objects of different sizes is needed.
In this paper, we propose an RBIR system that determines the selected region based on the percentage of proportional objects that overlap with each sub-block. To measure similarity, we use color and texture as local features and shape as a global feature. This method is expected to improve image relevancy compared to existing methods.

Research Method
Wang's image dataset is used in this paper (it can be downloaded from http://wang.ist.psu.edu/docs/related/). This dataset is commonly used to measure the performance of image retrieval methods. It contains 1000 images in 10 categories, with 100 images per category. The steps of the proposed method, shown in Figure 1, are described in the following subsections.

Preprocessing
In the first step, the query image is blurred with a Gaussian filter to reduce noise. The resulting image is then converted to grayscale. In this research, segmentation is performed using an edge descriptor, which has also been used to simplify segmentation in a previous study [1]. A Sobel filter is used to extract the edges of objects from the grayscale image. This stage produces a black and white image containing the edges of the objects; the result is shown in Figure 2. However, there are still gaps between edges of the same object. To overcome this issue, dilation is used so that the gaps between edges are reduced. The dilation uses a line-shaped structuring element that is 5 pixels wide. Because the gaps between edges vary in position, we use four line structuring elements with different angles, i.e. 0°, 45°, 90° and 135°. After dilation, hole filling is performed to obtain a fully segmented image.
Dilation with four structuring elements has a side effect on object size: the resulting object is larger than the object in the original image. Erosion with a circular structuring element of size 3 is performed to minimize this effect, as shown in Figure 3.
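The preprocessing chain described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the experiments: it relies on NumPy and `scipy.ndimage`, and the Gaussian `sigma`, the edge threshold `edge_thresh`, the sequential application of the four line elements, the reading of "5 pixels" as the line length, and the reading of "size 3" as a disk radius are all choices made here for the example.

```python
import numpy as np
from scipy import ndimage as ndi


def line_element(length, angle_deg):
    """Binary line-shaped structuring element (length x length grid)."""
    se = np.zeros((length, length), dtype=bool)
    c = length // 2
    for t in range(-c, c + 1):
        if angle_deg == 0:
            se[c, c + t] = True          # horizontal line
        elif angle_deg == 90:
            se[c + t, c] = True          # vertical line
        elif angle_deg == 45:
            se[c - t, c + t] = True      # rising diagonal
        elif angle_deg == 135:
            se[c + t, c + t] = True      # falling diagonal
        else:
            raise ValueError("only 0, 45, 90 and 135 degrees are supported")
    return se


def disk_element(radius):
    """Binary disk-shaped structuring element (assumed meaning of 'circle')."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (yy ** 2 + xx ** 2) <= radius ** 2


def segment(rgb, sigma=1.0, edge_thresh=0.2):
    """Return a boolean object mask for an H x W x 3 image."""
    # 1. Gaussian blur on the two spatial axes only, then grayscale conversion.
    blurred = ndi.gaussian_filter(rgb.astype(float), sigma=(sigma, sigma, 0))
    gray = blurred @ np.array([0.299, 0.587, 0.114])
    # 2. Sobel gradient magnitude, normalised and thresholded (assumed threshold).
    mag = np.hypot(ndi.sobel(gray, axis=1), ndi.sobel(gray, axis=0))
    if mag.max() > 0:
        mag = mag / mag.max()
    mask = mag > edge_thresh
    # 3. Dilation with four line elements to close gaps between edges.
    for angle in (0, 45, 90, 135):
        mask = ndi.binary_dilation(mask, structure=line_element(5, angle))
    # 4. Fill the enclosed regions, then erode to undo the growth from dilation.
    mask = ndi.binary_fill_holes(mask)
    return ndi.binary_erosion(mask, structure=disk_element(3))
```

For a bright square on a dark background, `segment` returns a mask that is true inside the square and false near the image corners.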

Determine Proportional Overlapping Sub-blocks
To determine the query region, the query image is divided into n x n sub-blocks of fixed size. In this paper we use 3 x 3, as shown in Figure 4. According to previous research [4,6], the best division is 5 x 5, but it requires more computation than 3 x 3. An image commonly contains both relevant and irrelevant sub-blocks. The irrelevant sub-blocks are eliminated because they carry no information about what the user needs and can cause errors in image retrieval. Our proposed method is used to determine whether a sub-block is relevant.
Once the black and white image has been segmented into object and background, a label is assigned to every object. However, not all images can be segmented well, and some still contain noise. For example, if the image background is not homogeneous, part of the background is sometimes detected as an object during segmentation. Filtering the relevant objects of the image is therefore necessary in this step. The idea is based on the assumption that a detected object whose area is smaller than the average object area in the image and whose location is far from the image center can be classified as noise or an irrelevant object. By calculating the area of each object and the minimum distance between the pixels of the object and the image center, we can determine the relevancy of the object. The distance between a pixel $(x, y)$ of an object and the image center $(x_c, y_c)$ is the Euclidean distance $d = \sqrt{(x - x_c)^2 + (y - y_c)^2}$, as shown in Equation 1.
Here $(x_c, y_c)$ is calculated from Equation 2. The area of every object is calculated based on its label; for example, if the image contains two objects, the result contains two area values. We then combine the area of every object with its pixel distance to the image center. In an image, an object $i$ ($i = 1, 2, 3, \ldots, k$) with an area greater than $\alpha$ and a distance less than $\beta$ is considered a relevant object, where $\alpha$ is calculated by Equation 3 and $\beta$ has a value of 0.4.
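A minimal sketch of this object filter is given below. Several details are assumptions made for illustration: $\alpha$ is taken to be the mean object area, the image center is taken to be the geometric center of the pixel grid, and distances are divided by the half-diagonal of the image so that $\beta = 0.4$ is a fraction of the largest possible distance. The sketch follows the noise assumption stated above, so an object is discarded only when it is both smaller than $\alpha$ and at least $\beta$ away from the center; keeping every other object is an interpretation made here. The function name `relevant_objects` is hypothetical.

```python
import numpy as np


def relevant_objects(labels, beta=0.4):
    """Return the label ids kept as relevant objects.

    labels: 2-D integer array, 0 = background, 1..k = object labels.
    """
    h, w = labels.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0      # assumed image center
    half_diag = np.hypot(cy, cx)               # assumed normalisation factor
    ids = [int(i) for i in np.unique(labels) if i != 0]
    areas, dists = [], []
    for obj in ids:
        ys, xs = np.nonzero(labels == obj)
        areas.append(ys.size)                  # area = number of pixels
        # minimum Euclidean distance from any object pixel to the center
        dists.append(np.hypot(ys - cy, xs - cx).min() / half_diag)
    alpha = np.mean(areas) if ids else 0.0     # assumed: mean object area
    return [obj for obj, a, d in zip(ids, areas, dists)
            if not (a < alpha and d >= beta)]
```

With one large object in the middle of the image and a tiny blob in a corner, only the large object is returned.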

Region Based Image Retrieval Using Ratio of Proportional Overlapping… (Agus Zainal Arifin)
After the relevant objects are obtained, the next step is to compute the percentage of object area in every sub-block. From the area of the object that overlaps with a sub-block, the ratio of proportional overlapping object area for that sub-block is calculated by Equation 4.
This method uses a threshold of 0.1, so a sub-block whose ratio of overlapping object area is more than 0.1 is considered a relevant sub-block. By using the ratio of overlapping object area, a relevant object of small size can still be detected. Every relevant sub-block is included in the list of query sub-blocks, which we call the Saliency Region Overlapping Block (SROB). For example, in Figure 4 the SROB of the query image consists of sub-blocks 2, 5, 6, 8 and 9. The selected query sub-blocks are then used in the next step to extract the local features of the query image.
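The sub-block selection can be illustrated with the sketch below. It assumes that the ratio for a sub-block is the object area inside that sub-block divided by the total area of the relevant objects, which is one plausible reading of the proportional ratio and not necessarily the exact form of Equation 4. The row-major numbering of sub-blocks from 1 to 9 and the function names are also assumptions.

```python
import numpy as np


def proportional_overlap(object_mask, n=3):
    """n x n array: share of the total object area that lies in each sub-block."""
    h, w = object_mask.shape
    rows = np.linspace(0, h, n + 1).astype(int)   # sub-block row boundaries
    cols = np.linspace(0, w, n + 1).astype(int)   # sub-block column boundaries
    total = object_mask.sum()
    ratio = np.zeros((n, n))
    if total == 0:
        return ratio                               # no object, no relevant block
    for i in range(n):
        for j in range(n):
            block = object_mask[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
            ratio[i, j] = block.sum() / total
    return ratio


def select_srob(object_mask, n=3, threshold=0.1):
    """Sub-block numbers (1-based, row-major) whose ratio exceeds the threshold."""
    ratio = proportional_overlap(object_mask, n)
    return [int(idx) + 1 for idx in np.flatnonzero(ratio > threshold)]
```

Under this reading, a 4 x 4 pixel object that lies entirely inside one sub-block gets a ratio of 1.0 there, so it is still selected even though it covers only a small part of the sub-block.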

Feature Extraction
Local and global features are combined to obtain better retrieval results. The local features of the query image and the database images are extracted from the selected sub-blocks. This research uses color and texture as local features, while the shape of the object is used as the global feature.

Color
Color is commonly used as a feature descriptor because human vision can naturally distinguish images by their color. In this paper, we use an HSV histogram to extract the color feature. HSV is chosen because it can be a superior color space compared to RGB [7]. The HSV color model divides color into 3 components: H (Hue), S (Saturation), and V (Value).
The Hue component represents the type of color, e.g. red, yellow, green, etc. Hue is described by a position on the color wheel, with red at 0 degrees, green at 120 degrees and blue at 240 degrees. The complementary colors lie in between: yellow is at 60 degrees, cyan is at 180 degrees, and magenta is at 300 degrees. The Saturation component represents how white the color is and has a value between 0 and 1. A color is pure when its saturation is 1 and becomes diluted by white as the saturation decreases. The Value component, also described as brightness, measures how black the color is. As the value decreases, the blackness of the color increases.
Every component of the HSV model can be computed from the RGB (Red, Green, Blue) model by calculating the maximum RGB value M, the minimum RGB value m, and the difference d between M and m, as shown in Equations 5-10 [7].
For this purpose, color is extracted from every selected sub-block in the query image and the database images. The feature is represented by a histogram of each component (H, S, V). The values are then normalized between 0 and 1.
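As an illustration, the conversion and histogram can be sketched with NumPy as below. The conversion uses the standard RGB to HSV formulas built from M, m and d, which may differ in notation from Equations 5-10. The bin counts (8, 4, 4) and the normalization by the number of pixels are assumptions made for this example, and the function names are hypothetical.

```python
import numpy as np


def rgb_to_hsv(rgb):
    """rgb: float array in [0, 1] with shape (..., 3) and at least one leading axis.

    Returns H in degrees, and S and V in [0, 1].
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    M = rgb.max(axis=-1)                      # maximum of R, G, B
    m = rgb.min(axis=-1)                      # minimum of R, G, B
    d = M - m                                 # delta between M and m
    safe_d = np.where(d == 0, 1.0, d)         # avoids division by zero for grays
    h = np.zeros_like(M)                      # hue stays 0 for gray pixels
    rmax = (d > 0) & (M == r)
    gmax = (d > 0) & (M == g) & ~rmax
    bmax = (d > 0) & ~rmax & ~gmax
    h[rmax] = (60.0 * ((g - b) / safe_d) % 360.0)[rmax]
    h[gmax] = (60.0 * ((b - r) / safe_d) + 120.0)[gmax]
    h[bmax] = (60.0 * ((r - g) / safe_d) + 240.0)[bmax]
    s = np.where(M > 0, d / np.where(M == 0, 1.0, M), 0.0)
    return h, s, M                            # V equals M


def hsv_histogram(rgb_block, bins=(8, 4, 4)):
    """Concatenated H, S and V histograms of one 8-bit RGB sub-block."""
    h, s, v = rgb_to_hsv(rgb_block.astype(float) / 255.0)
    hh, _ = np.histogram(h, bins=bins[0], range=(0.0, 360.0))
    hs, _ = np.histogram(s, bins=bins[1], range=(0.0, 1.0))
    hv, _ = np.histogram(v, bins=bins[2], range=(0.0, 1.0))
    feat = np.concatenate([hh, hs, hv]).astype(float)
    return feat / h.size                      # each bin becomes a fraction in [0, 1]
```

The conversion reproduces the hue positions listed above: pure yellow maps to 60 degrees, cyan to 180 degrees and magenta to 300 degrees.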

Texture
Texture plays an important role in describing the surface of an object and its relationship with the surrounding area [4]. Local Binary Pattern (LBP) [8] is known as a good, high-performing texture descriptor. Several applications have used LBP as a descriptor [4], [6], [9].
LBP describes a pixel based on the gray levels of its neighboring pixels. Given a central pixel c with gray value $g_c$ and neighbours with gray values $g_p$, the LBP can be calculated using Equation 11.
Here P is the total number of neighbours and R is the radius of the neighbourhood. After the LBP of each pixel in a selected sub-block is calculated, a histogram is created to represent the texture of that sub-block.
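A minimal sketch of the texture feature is shown below, using the standard LBP definition $\mathrm{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\,2^p$ with $s(x) = 1$ for $x \ge 0$ and $0$ otherwise. The choice of P = 8 and R = 1 on the square 3 x 3 neighbourhood (no circular interpolation), the neighbour ordering, and the 256-bin normalized histogram are assumptions for this example, since the values used in the experiments are not stated here.

```python
import numpy as np


def lbp_8_1(gray):
    """LBP codes (P = 8, R = 1) for the interior pixels of a 2-D gray image."""
    g = gray.astype(float)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Eight neighbours, counter-clockwise starting from the right-hand pixel.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for p, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # s(g_p - g_c) is 1 when the neighbour is at least as bright as the center
        codes += (neigh >= center).astype(np.int64) << p
    return codes


def lbp_histogram(gray):
    """Normalised 256-bin histogram of LBP codes for one sub-block."""
    codes = lbp_8_1(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

A perfectly flat sub-block produces the code 255 everywhere, while an isolated bright pixel surrounded by darker neighbours produces the code 0.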

Shape
Moment invariants are important shape descriptors in computer vision. There are two types of shape descriptors: contour-based and region-based. The type most commonly used with invariant moments is the contour-based shape descriptor [10].
Hu invariant moments provide a basis for measuring the similarity between a template and a database image. Hu's seven moment invariants are invariant under translation, changes in scale, and rotation, so they describe the image regardless of its location, size, and orientation. In this research we use invariant moments to extract the shape feature. The seven moment invariants are based on normalized central moments [11]. To calculate them, the formula in Equation 12 can be used.
Translation invariance can be achieved by shifting the polynomial basis to the object centroid, as shown in Equation 13.
In Equation 13, the image is described by I(x,y), a piecewise continuous bounded function, and the variables p and q are non-negative integers [10]. The moment can be computed using the centroid of the image I(x,y), which is equivalent to the moment of an image whose center has been shifted to the centroid.
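The shape feature can be sketched with the standard definitions of Hu's seven invariants, computed from normalized central moments $\eta_{pq} = \mu_{pq} / \mu_{00}^{1 + (p+q)/2}$. The code below is an illustrative NumPy version; applying it to the binary segmented image and the function name `hu_moments` are assumptions made here.

```python
import numpy as np


def hu_moments(img):
    """Hu's seven moment invariants of a 2-D (binary or gray) image."""
    img = img.astype(float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00   # centroid

    def eta(p, q):
        # central moment about the centroid, normalised for scale
        mu = ((x - xc) ** p * (y - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2.0)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])
```

Shifting a shape inside the image, or rotating the image by 90 degrees, leaves the seven values unchanged up to floating-point error. Scale invariance holds only approximately on a discrete pixel grid.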

Similarity Measurement
Similarity between the query image and a database image is computed using the Euclidean distance for every feature. For local features, the distance is calculated for each selected sub-block of the query and the database image. Assume that the query image has n selected sub-blocks. The selected sub-blocks of the query are represented by $Q = \{q_1, q_2, \ldots, q_n\}$, and $B = \{b_1, b_2, \ldots, b_n\}$ represents the corresponding sub-blocks of the database image. The distance is computed only between sub-blocks with the same index. For example, to find the distance for sub-block 1, the Euclidean distance is computed between $q_1$ and $b_1$. The distance $d(q_i, b_i)$ between a query sub-block and a database sub-block can thus be calculated using Equation 15.
The final local-feature distance is the average of the distances over all sub-blocks, as shown in Equation 16.
For the global feature, similarity is calculated by applying the Euclidean distance between the query image and the database image directly. The total distance between the query image and a database image is a combination of the three feature distances. Each feature is assigned a weight that is multiplied by its distance value. Given the color weight $w_c$, the texture weight $w_t$, and the shape weight $w_s$, the total distance can be calculated by Equation 17.
In this research the optimal weights that we used are 0.1, 0.4 and 0.5 for $w_c$, $w_t$ and $w_s$, respectively. After the distances to all database images are obtained, they are sorted in ascending order. The smaller the distance, the higher the similarity between the two images.
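The distance computation and ranking described in this subsection can be sketched as follows. The data layout is hypothetical: each image is a dictionary with `"color"` and `"texture"` entries that map a sub-block number to a feature vector, and a `"shape"` entry holding the global feature vector. The total distance is assumed to be the weighted sum of the three feature distances, which is one natural reading of Equation 17.

```python
import numpy as np


def local_distance(query_blocks, db_blocks):
    """Mean Euclidean distance over sub-blocks that share the same index."""
    dists = [np.linalg.norm(np.asarray(query_blocks[i], dtype=float)
                            - np.asarray(db_blocks[i], dtype=float))
             for i in query_blocks]
    return float(np.mean(dists))


def total_distance(query, candidate, w_c=0.1, w_t=0.4, w_s=0.5):
    """Weighted sum of the color, texture and shape distances (assumed form)."""
    d_color = local_distance(query["color"], candidate["color"])
    d_texture = local_distance(query["texture"], candidate["texture"])
    d_shape = float(np.linalg.norm(np.asarray(query["shape"], dtype=float)
                                   - np.asarray(candidate["shape"], dtype=float)))
    return w_c * d_color + w_t * d_texture + w_s * d_shape


def rank(query, database):
    """Database image names sorted by ascending total distance to the query."""
    scored = [(total_distance(query, item), name) for name, item in database.items()]
    return [name for _, name in sorted(scored)]
```

The database features are indexed by the query's selected sub-blocks, so every sub-block number in the query must also be present in each database entry.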

Performance Measure Using Precision-Recall
The proposed method is evaluated using precision, which is commonly used to evaluate the performance of information retrieval (IR) systems. Based on Table 1, precision can be calculated by Equation 18:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (18)$$

where TP (True Positive) is the number of relevant images retrieved by the system, FN (False Negative) is the number of relevant images not retrieved by the system, FP (False Positive) is the number of irrelevant images retrieved by the system, and TN (True Negative) is the number of irrelevant images not retrieved by the system.
For each query, the results are shown by rank. The ranking is computed from the distance between the query and each image in the database. Table 2 shows that the average precision for k = 5, 10, 15, and 20 is 0.84, 0.81, 0.76, and 0.74, respectively. As k increases, precision decreases. For all numbers of returned images, Bus, Dinosaur and Horse have the highest precision, 1.00 (100%). Examples of queries and results can be seen in Figure 5. Experiments were also conducted using the standard 11 recall levels. The results can be seen in Figure 6, which shows the precision-recall curve of the proposed method.
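Both measures can be illustrated with the short sketch below. It assumes that a retrieved image counts as relevant when its category label equals the category of the query, and it uses the usual interpolated precision at the 11 recall levels 0.0, 0.1, ..., 1.0. The function names and the label-based relevance test are assumptions for this example.

```python
def precision_at_k(ranked_labels, query_label, k):
    """Precision over the top-k results: TP / (TP + FP)."""
    top = ranked_labels[:k]
    if not top:
        return 0.0
    tp = sum(1 for label in top if label == query_label)
    return tp / len(top)          # len(top) equals TP + FP


def eleven_point_precision(ranked_labels, query_label, n_relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    hits, points = 0, []
    for position, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            # (recall, precision) reached at this relevant result
            points.append((hits / n_relevant, hits / position))
    levels = [i / 10 for i in range(11)]
    # interpolated precision: best precision at any recall >= the level
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]
```

For example, if the ranked categories are bus, bus, horse, bus, beach and the query is a bus image, the precision is 1.0 at k = 2 and 0.6 at k = 5.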

Weight Parameter
Similarity between the query image and the images in the database is measured using three kinds of features, each with its own weight. The weights indicate how strongly each feature affects the similarity. The weights for the color, texture, and shape features are denoted by $w_c$, $w_t$, and $w_s$, respectively. We performed experiments with different weights to obtain the optimal parameter values. Table 3 shows the average precision for k=20 with different weights. Based on Table 3, the highest precision is obtained when $w_c$, $w_t$ and $w_s$ are 0.10, 0.40 and 0.50, respectively, and also when they are 0.35, 0.35 and 0.30. Of these two weight sets, we choose the first because it outperforms the second at k=5, 10 and 15.

Discussion
Several points from the experimental results merit discussion. Table 2 shows that the category "Dinosaur" in Figure 5(a) has a precision of 1.0 (100%). This is because "Dinosaur" images have a relatively homogeneous background consisting of only one color. This affects the segmentation results used to determine the selected region for local feature extraction. The segmentation results also affect the shape that is used as the global feature. Another category with a precision of 1.0 is "Horse" in Figure 5(b). Although Horse images have backgrounds consisting of a variety of colors, their color combination and texture differ from those of the other categories in the dataset. Another such category is "Bus": buses have a similar shape in every image of this category, which the shape feature identifies well.
This method can also be used on images with a heterogeneous background, for example in Figure 5(b). However, because the background is heterogeneous, the selected saliency region covers the whole horse image. This is because the grass is also detected as an object when segmenting with the edge descriptor.
Other categories, such as "Beach" and "Mountain", are similar in their color features. In the dataset that we use, both categories are dominated by white and blue. This can cause errors when retrieving the images, as can be seen in Figure 5(c). In addition, scenery images have no fixed shape, so the global feature, which has the higher weight, has little effect. This method has a weakness with images that have no clear boundary between foreground and background, such as scenery images, whereas for images with a clear object it produces good precision. This weakness arises because the selected region is taken from the results of image segmentation. For objects that are segmented properly, the selected region represents the object better.
Images consisting of multiple objects can also affect the determination of the selected region. Parts of a non-homogeneous background are often detected as objects. Therefore, determining the relevancy of detected objects is important.
In image retrieval, the choice of query image can affect the precision. Vemina and Jacob [1] reported the results of testing several methods on the same dataset as the one used in this study. For some categories, such as "Africa", "Bus", "Elephant", "Dinosaur", and "Food", our proposed method produces better precision. For the other categories, the precision of our proposed method is lower than that of the Vemina and Jacob method [1]. However, the queries used in the previous method [1] are unknown, so the query images in this study may differ from those used in the experiments conducted by Vemina and Jacob.
In this study, the weight parameter chosen for each feature has an impact on the precision of the search results. Table 3 shows that different weights result in different precision. For the proposed method, the shape and texture features can be better descriptors than the color feature in some categories. This is because in some categories the objects have similar colors, so color is not a good descriptor in this case. We also find that texture can have a larger effect than shape, because when $w_t$ is set to a low value the precision can drop by about 0.13.

Conclusion
The proposed method, which chooses the relevant selected regions based on proportional overlapping regions, has an average precision of 74% at k=20 and a maximum precision of 100%. However, this method has a weakness with images that have no clear boundary between foreground and background, such as scenery images. Further research will optimize the determination of the weight parameters and will also optimize the edge descriptor segmentation by pruning unnecessary edges from irrelevant objects.

Figure 2. Preprocessing with the Sobel filter: (a) original image, (b) grayscale image and (c) image after applying the Sobel filter

Figure 3. Erosion using a circle: (a) original image, (b) image after dilation and filling and (c) image after erosion

Figure 4. Determination of the query region: (a) image divided into 3 x 3 sub-blocks of fixed size, (b) identity number assigned to each sub-block

Figure 5. Example queries and results for retrieval with k=20: (a) query and results for category "Dinosaur", (b) query and results for category "Horses", (c) query and results for category "Mountain"

Figure 6. Performance of the proposed method

Table 1. Confusion Matrix

Table 2 shows the precision for different values of k, where k is the number of retrieved images.

Table 2. Precision for Various Values of k