Suitability analysis of rice varieties using learning vector quantization and remote sensing images

Rice (Oryza sativa) is the staple food of the Indonesian people, so maintaining the stability of rice production in Indonesia is an important issue for further study. One strategy for addressing this issue is to apply precision agriculture (PA) using remote sensing images as a reference, owing to their effectiveness. The initial stage of PA is the suitability analysis of rice varieties, including INPARA, INPARI, and INPAGO. The representative features that can be extracted from remote sensing images and are related to agriculture are NDVI, NDWI, NDSI, and BI. Therefore, the aim of this study is to identify the best model for determining the most suitable superior rice varieties using Learning Vector Quantization (LVQ). The results show that the best LVQ model is obtained at a learning rate of 0.001, an epsilon value of 0.1, and the feature combination of the standard deviations of NDWI and BI. This architecture yields an accuracy of 56%.


Introduction
Rice (Oryza sativa) is a food crop of the grass family. In Indonesia, rice is the staple food of most of the population. As a result, the demand for rice in Indonesia continues to increase with population growth. However, this demand is not matched by increasing rice production. Hence, maintaining the stability of rice production in Indonesia is an important issue for further study. One solution is to apply precision agriculture (PA) in the rice production sector. PA is a technology that enhances farming techniques from pre-planting, through in-season growth, to the harvest and post-harvest stages [1]. Because pre-planting is the initial stage of PA, successfully determining the rice variety at this stage is the main factor in the success of PA. In addition, PA technologies include Remote Sensing (RS), Global Positioning System (GPS), Geographical Information System (GIS), Soil Testing, Yield Monitors, and Variable Rate Technology [2].
Remote sensing is the science or art of obtaining information about an object, area, or phenomenon through the analysis of data acquired by a device that is not in direct contact with the object, area, or phenomenon itself. In PA, RS is used to obtain information about the physical, chemical, and biological characteristics of the Earth's surface from RS images [3]. Moreover, RS images offer other advantages for PA: they present objects on the Earth's surface without the need for direct contact, and they are up to date and reliable. Therefore, the aim of this study is to propose a novel PA technology for the pre-planting stage, specifically a method for identifying the suitable rice variety for an area using remote sensing images as the reference.
Several studies on the suitability analysis of plant types have been performed, such as land suitability analysis for rabi (winter season) and kharif (summer season) crops in Kheragarih [4], land suitability analysis for mangrove species [5], land suitability for maize crops in Okara District [6], and land suitability and rotation performance between canola and soybean [7]. All of those studies were developed using the Analytic Hierarchy Process (AHP), which has weaknesses when a large number of criteria or alternatives is used [8]. Moreover, AHP requires criteria and alternative data with a high level of precision. Besides determining the suitability of plant species for a region, AHP is also widely implemented for site suitability analysis, for example site suitability analysis for agricultural land use in Darjeeling district [9], site suitability analysis for rangelands in the Taleghan basin [10], and suitable lands for agricultural use in the Yusufeli district of Artvin city, Turkey [11]. For rice crops, most research focuses on determining whether an area is appropriate for planting rice, such as in the central part of Amol District, Iran [12], Haripur Upazila, Thakurgaon district, in north-west Bangladesh [13], the northern part of Bangladesh [14], Morobe Province, Papua New Guinea [15], and Subang Regency, Indonesia [16]. Another method uses GIS modeling to determine land suitability for rice cultivation [17]. That study used rice cultivation, soil, and topographic data taken from a digital soil map. In addition, one study combines AHP and GIS to analyze the suitability of rice growing in the Great Mwea region, Kenya [18]. However, GIS modeling is limited by the availability of various maps for a study area, such as soil maps and topographic maps. This limitation can be overcome by using remote sensing images as reference data.
ISSN: 1693-6930. Suitability analysis of rice varieties using learning vector... (Retno Kusumaningrum)
One way to overcome the problems of the AHP method is to apply a machine learning method, in which the developed system learns from existing data samples. For example, data on the main producing regions of a rice variety can be used as a reference to predict other similar areas, which are then expected to provide high rice productivity as well. In machine learning, this is called a classification problem. In addition, combining the machine learning algorithm with the remote sensing images will [...] As the machine learning algorithm, we implemented Learning Vector Quantization (LVQ), since it has been used to predict 12 kinds of food plants with a high accuracy of 100% [19]. Furthermore, LVQ processing time is relatively short compared with other artificial neural network methods because the LVQ network architecture consists of only two layers, namely the input layer and the output layer.
According to the Indonesian Center for Rice Research (ICRR), the land characteristics relevant to planting rice are soil texture, salinity, and humidity. Texture and humidity are closely related to the availability of water in the soil. The RS image features that can be used to calculate these factors are the Normalized Difference Vegetation Index (NDVI), Normalized Difference Salinity Index (NDSI), Normalized Difference Water Index (NDWI), and Brightness Index (BI). Moreover, determining the combination of these features in the LVQ-based classification model is essential for this study, so that we can determine the suitability of superior rice varieties for an area. Therefore, the aim of this study is to implement LVQ for identifying land suitability for several rice varieties, using remote sensing images as the reference and NDVI, NDSI, NDWI, and BI as features.

Research Method
We implement a popular neural network (NN) method for classification tasks, namely Learning Vector Quantization (LVQ). LVQ is an NN method that performs supervised learning on a competitive layer. It is a single-layer network in which the input layer is connected directly to every neuron in the output layer. The connections between neurons are associated with weights. Each output neuron in LVQ represents a class or a specific category. The classes produced by this competitive layer depend only on the distance between the input vectors: if two input vectors are nearly the same, the layer puts both into the same class. Applying LVQ to determine the suitability of rice varieties involves the input data, used as training and test data, and the output to be produced. The input and output can be seen in the LVQ architecture shown in Figure 1.
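The distance-based class assignment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the two-feature weight vectors, and the order of the class names are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(weights, x):
    """Return the index of the output neuron whose weight vector is closest to x."""
    distances = [euclidean(w, x) for w in weights]
    return distances.index(min(distances))

# Hypothetical weight vectors, one per output neuron (class); values are illustrative only.
weights = [[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]]
# Assumed class order; the source does not state which neuron maps to which variety.
CLASSES = ["INPARA", "INPARI", "INPAGO"]

label = CLASSES[predict(weights, [0.85, 0.9])]
```

Because the decision depends only on distance, two identical input vectors always receive the same class.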
According to Figure 1, the number of input neurons in the LVQ architecture depends on the feature combination, while the number of output neurons is 3. In this figure, the number of input neurons (the value of x) represents the number of features defining the suitability of rice varieties, which are combined from 8 attributes, i.e. the mean and standard deviation of the NDVI (Normalized Difference Vegetation Index), NDSI (Normalized Difference Soil Index), NDWI (Normalized Difference Water Index), and BI (Brightness Index) values. We use i as the index of an input neuron, ranging from 1 to 8. The indices are calculated from three of the 11 bands of Landsat-8, i.e. green, red, and NIR.
TELKOMNIKA Vol. 17, No. 3, June 2019: 1290-1299, ISSN: 1693-6930
The detailed explanation of those indices is as follows:
- NDVI is used to measure the density of vegetation in a given area, which can be used to analyze the state of vegetation in that area. The value of NDVI is obtained by dividing the difference between the NIR and Red bands by their sum, i.e. NDVI = (NIR - Red) / (NIR + Red) [20].
- NDSI is the index used to estimate organic materials, particularly the salinity (salt level) of a region. Organic matter is a soil component that is closely related to soil quality and is a critical component of successful farming systems. The presence of soil organic matter is often used as a general indicator of soil fertility. In addition, the purpose of NDSI is to identify areas where soil is the dominant background or foreground material [21]. NDSI is obtained by dividing the difference between the Red and NIR bands by their sum, i.e. NDSI = (Red - NIR) / (Red + NIR). Although NDSI is mathematically equal to the negative of NDVI, we still evaluate both attributes since they have different characteristics and objectives in remote sensing.
- NDWI is closely related to the water content of the plant. A lack of water can significantly affect plant growth and lead to harvest failure or low crop production. The value of NDWI is derived from the reflected Green and NIR bands [22].

- BI is a transformation that measures the brightness of the land, which can then be used to estimate the level of soil moisture. Three of the six multispectral features of Landsat are often used, i.e. (i) B for brightness, which measures soil, (ii) G for greenness, which measures vegetation, and (iii) W for wetness, which shows the interrelationship of soil and canopy moisture [23]. Moreover, plant growth is strongly influenced by soil moisture, so soil moisture can be used as an indicator for land management. Soil moisture is an important soil indicator and can be used to assess the resistance of the soil to disasters such as flood, erosion, landslide, and drought. The value of BI is obtained from formula (4) [18,24,25].

The first step in building the suitability analysis model is data collection. Data were collected from the rice-producing regions of Indonesia in 2015, i.e. West Java, East Kalimantan, West Nusa Tenggara, and South Sulawesi. Subsequently, we crop the images to 500x500 pixels and extract only three bands of the Landsat 8 images, i.e. the green, red, and NIR bands. The database contains a total of 150 extracted images.
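The index definitions above can be sketched in a few lines. This is an illustration under stated assumptions: the NDVI and NDSI expressions follow the text directly, the NDWI expression uses the common (Green - NIR) / (Green + NIR) form, which is assumed rather than confirmed by this text, and BI is omitted because its formula is not reproduced here. The reflectance values and the use of the population standard deviation are also assumptions.

```python
from statistics import mean, pstdev

def ndvi(nir, red):
    """(NIR - Red) / (NIR + Red), as described in the text."""
    return (nir - red) / (nir + red)

def ndsi(nir, red):
    """(Red - NIR) / (Red + NIR), as described in the text."""
    return (red - nir) / (red + nir)

def ndwi(green, nir):
    """Assumed form: (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir)

def index_features(values):
    """Mean and population standard deviation of a flattened index map."""
    return mean(values), pstdev(values)

# Hypothetical reflectance values for a 2x2 crop, flattened row by row.
green = [0.10, 0.12, 0.08, 0.10]
red = [0.08, 0.10, 0.06, 0.08]
nir = [0.40, 0.30, 0.42, 0.40]

ndvi_map = [ndvi(n, r) for n, r in zip(nir, red)]
ndsi_map = [ndsi(n, r) for n, r in zip(nir, red)]
ndwi_map = [ndwi(g, n) for g, n in zip(green, nir)]

features = {
    "NDVI": index_features(ndvi_map),
    "NDSI": index_features(ndsi_map),
    "NDWI": index_features(ndwi_map),
}
```

Since NDSI is the negative of NDVI for every pixel, the two means differ only in sign and the two standard deviations are identical. The sketch does not guard against pixels where the band sum is zero; real imagery would need that check.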
The second step is feature extraction, which retrieves the features of the satellite images. These features are calculated using (1), (2), (3), and (4). Since each extracted feature is a matrix of the same size as the image, we compute the mean and the standard deviation of each feature for each image. These extracted features are saved and subsequently referred to as the research dataset. To divide the dataset into training data and testing data, we use 10-fold cross validation, since it is the most common cross validation technique. The next step is normalization. This process aims to bring the data into the same range, representing the original data without losing its characteristics. In this case, the target range is [0, 1]. The normalization uses formula (5). The training process aims to train the LVQ method to recognize data patterns in soil characteristics. The output of this process is the weight value (w) that will be used in the testing process. The inputs of this process are the learning rate and epsilon values, each varied over 0.1, 0.01, 0.001, 0.0001, and 0.00001. The training algorithm used is as follows, and Table 1 explains the meaning of each notation in the algorithm.
1. Set the number of testing data t.
The last step is to evaluate the performance of the classification model for each fold by comparing the predicted class with the ground-truth class. In this experiment, we use accuracy as the evaluation metric. The best model, i.e. the model that yields the highest accuracy, is used as the classification model in the second process (suitability analysis of rice varieties). This process runs in real time and employs the same steps as the previous process, i.e. feature extraction, data normalization, and class prediction.
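The normalization, 10-fold split, and accuracy computation described above can be sketched as follows. The min-max form of the normalization is an assumption, chosen because it maps values into [0, 1]; the interleaved fold assignment is likewise only one possible way to form the folds, and the source does not state how samples were assigned.

```python
def min_max_normalize(column):
    """Rescale a list of values to [0, 1] (assumed form of the normalization step)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx); fold f tests every k-th sample starting at f."""
    for f in range(k):
        test_idx = [i for i in range(n_samples) if i % k == f]
        train_idx = [i for i in range(n_samples) if i % k != f]
        yield train_idx, test_idx

def accuracy(predicted, actual):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# 150 images split into 10 folds gives 15 test and 135 training images per fold.
folds = list(k_fold_indices(150, k=10))
```

With 150 images, each fold holds out 15 images for testing, so a single fold's accuracy moves in steps of 1/15.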

Results and Analysis
This section consists of three sub-sections: the data used, the experimental scenarios, and the analysis of results. The data sub-section describes the number of data and their composition for each class. The purpose and mechanism of each scenario are explained in the next sub-section. The final sub-section discusses the experimental results and their analysis.

Data
The data used in this research are 150 sample land images. Each variety is represented by 50 sample images. As explained before, the rice varieties studied in this paper are INPARI, INPARA, and INPAGO.

Scenario of Experiments
Experiments are conducted to find the parameters that yield the best LVQ architecture for determining the suitability of rice varieties. The scenarios are divided into two parts, scenario 1 and scenario 2, which are described in the following paragraphs. The aim of the first experiment is to obtain the best LVQ parameters for each feature combination. Two LVQ parameters are observed in this experiment, namely the epsilon and learning rate values, both ranging from 0.00001 to 0.1. Meanwhile, the feature combinations are generated from the 8 employed features, i.e. NDVI, sdNDVI, NDSI, sdNDSI, NDWI, sdNDWI, BI, and sdBI, where the sd prefix denotes the standard deviation and the unprefixed name denotes the mean. The first experiment is illustrated in Figure 3, and the feature combinations are detailed in Table 2.
The training process is repeated for every combination of learning rate (α) and minimum error (eps). After training is completed, the test patterns are classified to obtain the accuracy and error for the scenario. The accuracy and error results for folds 1 to 10 are averaged for each combination of learning rate (α) and minimum error (eps). The second experiment is conducted to find the best overall LVQ architecture. Its results are based on a comparison of the best performance of each feature combination in the first experiment. The architecture of the feature combination with the highest accuracy is then designated the best overall LVQ architecture.
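The parameter search in the first experiment can be sketched as a small grid search. This is only an illustration: `best_setting` and `dummy_evaluate` are hypothetical names, and the dummy scoring function is a placeholder that exists solely to make the example runnable. It is unrelated to the reported results; in practice it would be replaced by the mean accuracy over the 10 folds.

```python
from itertools import product

# The five values tried for both the learning rate and epsilon.
VALUES = [0.1, 0.01, 0.001, 0.0001, 0.00001]

def best_setting(evaluate):
    """Return (score, learning_rate, eps) for the pair with the highest score."""
    best = None
    for lr, eps in product(VALUES, VALUES):
        score = evaluate(lr, eps)
        if best is None or score > best[0]:
            best = (score, lr, eps)
    return best

def dummy_evaluate(lr, eps):
    """Placeholder score peaking at an arbitrary pair; not a real accuracy."""
    return -abs(lr - 0.01) - abs(eps - 0.0001)

grid = list(product(VALUES, VALUES))  # 5 x 5 = 25 parameter pairs
best = best_setting(dummy_evaluate)
```

Each feature combination is searched over the same 25 pairs, so the total number of training runs is 25 times the number of feature combinations times the 10 folds.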

Results Analysis
The results of the experiments can be seen in Table 3, which provides several pieces of information. In addition to the best accuracy value, the best LVQ parameters are given for each feature combination. Moreover, a comparison of the accuracy values for each feature combination is depicted in Figure 4. Based on the 26 feature combinations, the highest accuracy is obtained from the combination of the standard deviation of NDWI and the standard deviation of BI with the following parameters: learning rate (α) of 0.001, minimum error (eps) of 0.1, and maximum epoch of 3000. This combination produces an average accuracy of 56%, and the highest accuracy is 93% at the first fold. This result is shown in Figure 5. The features derived from the standard deviation give better results than those derived from the mean, since the standard deviation is more sensitive to outliers, which is beneficial in this classification problem. BI is a useful feature for estimating the level of soil moisture from the brightness of the land. Soil moisture profoundly influences plant growth and is an important indicator for assessing the resistance of the soil to disasters such as flood, erosion, landslide, and drought. NDWI represents the water content of the plant; a lack of water can significantly affect plant growth and lead to harvest failure or low crop production. On the other hand, NDVI and NDSI do not significantly affect the accuracy of determining rice varieties for an area, because NDVI reflects vegetation density while NDSI reflects the level of soil salinity.

Conclusion
The conclusion of this research is that the best LVQ model for determining the most suitable rice variety for an area is obtained with a learning rate of 0.001, an epsilon value of 0.1, and the combination of sdNDWI (standard deviation of the Normalized Difference Water Index values) and sdBI (standard deviation of the Brightness Index values). This architecture yields an accuracy of 56%, and the highest accuracy is 93% at the first fold. BI is a useful feature for estimating the level of soil moisture from the brightness of the land. Soil moisture profoundly influences plant growth and is an important indicator for assessing the resistance of the soil to disasters such as flood, erosion, landslide, and drought. NDWI represents the water content of the plant; a lack of water can significantly affect plant growth and lead to harvest failure or low crop production. Both the BI and NDWI features are derived from the standard deviation, which gives better results than the mean because the standard deviation is more sensitive to outliers, which is beneficial in this classification problem. From the viewpoint of the LVQ architecture, a small learning rate makes the weight changes more precise, so that the optimal final weights can be obtained, while the epsilon value of 0.1 is chosen to reduce computing time so that the training process runs more effectively.

Figure 1. The architecture of employed LVQ (using 8 features for each image in training data)

The value of Y (the output neurons) is a class/target consisting of 3 classes of superior rice varieties. Superior rice varieties are one of the technological innovations for increasing rice productivity, both through improved yield potential and through tolerance and resistance to biotic and abiotic stresses. A superior variety has several advantages compared to other types, including high yield potential, tolerance to biotic and abiotic stresses, early ripening, and harvested grain that can be reused as seed. Since 2008, superior varieties have been grouped into inbred swamp rice (INPARA), inbred irrigated rice (INPARI), and inbred upland rice (INPAGO, Inbrida Padi Gogo). Therefore, the number of output neurons in this study is three, representing INPARA, INPARI, and INPAGO. Subsequently, the stages conducted in this research are shown in the flow chart in Figure 2.

1. Set the initial weight (w), maximum epoch (MaxEpoch), minimum error (eps), and the learning rate (α), where 0 < α < 1. In this research, the initial weights are the first normalized data of each class.
2. Enter the input x and target
3. Set the initial condition epoch = 0
4. Do if (epoch < MaxEpoch) or ( > eps)
   a. epoch = epoch + 1
   b. Do for i = 1 to m
      i. Do for k = 1 to num_class
         Calculate the value of
b. Predict the class of l-th testing data
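The algorithm listing above is truncated in this text, so the following is only a sketch of a standard LVQ1 training loop and not necessarily the authors' exact procedure. It keeps the one detail the listing states explicitly, namely that the initial weights are the first sample of each class. The LVQ1 update rule, the fixed learning rate, the fixed number of epochs, and the toy data are assumptions; the role of eps in the stopping condition cannot be recovered from the listing and is left out.

```python
def nearest(weights, x):
    """Index of the weight vector with the smallest squared distance to x."""
    d = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return d.index(min(d))

def train_lvq(samples, labels, num_class, lr, max_epoch):
    """Standard LVQ1 sketch with a fixed learning rate (assumed, see lead-in)."""
    # Initial weights: the first sample of each class, as stated in the listing.
    weights = []
    for c in range(num_class):
        first = next(x for x, y in zip(samples, labels) if y == c)
        weights.append(list(first))
    for _ in range(max_epoch):
        for x, y in zip(samples, labels):
            j = nearest(weights, x)
            # Move the winner toward x if its class matches, away otherwise.
            sign = 1.0 if j == y else -1.0
            weights[j] = [w + sign * lr * (xi - w) for w, xi in zip(weights[j], x)]
    return weights

# Hypothetical, well-separated toy data with two features and three classes.
samples = [[0.0, 0.0], [0.1, 0.1], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1, 2, 2]

weights = train_lvq(samples, labels, num_class=3, lr=0.1, max_epoch=20)
predictions = [nearest(weights, x) for x in samples]
```

On this toy data every sample is already closest to its own class prototype, so the loop only pulls each prototype toward the centre of its class.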

Figure 3. Illustration of first experiment

According to Figure 4, the LVQ method yields the highest accuracy for an individual feature with the Brightness Index (BI), at 53%. BI is characterized by high pixel brightness in the Red and NIR bands, so it is the most dominant feature compared to the others. The NDWI feature yields an accuracy of 49%, while NDVI and NDSI give the lowest accuracy, 46%. Feature combinations that include NDVI or NDSI yield the same accuracy, which means that neither NDVI nor NDSI is an important feature. In contrast, a feature combined with BI gives a higher accuracy than combinations with other features, because BI is the dominant or most representative feature for distinguishing the appropriate rice variety for each area.

Figure 4. Graph of Accuracy Comparison for Each Features Combination

Table 1. List of Notations
y_i: The class label of the i-th image (Scalar)

Table 2. Detail Information of Features Combinations

Table 3. Detail Information of Features Combinations