Enhance interval width of crime forecasting with ARIMA model-fuzzy alpha cut

Better decisions can be made with good-quality data or information. The interval width of a forecast is one of the data values that assist the decision-making process in crime prevention. However, in time series forecasting, and especially with the ARIMA model, the amount of historical data available can affect the forecasting result, including the forecast interval width. This study proposes a combination technique to obtain a better interval width for crime forecasting. The proposed technique combines the ARIMA model with Fuzzy Alpha Cut (FAC). Three alpha values are used: 0.3, 0.5, and 0.7. The experimental results show that ARIMA-FAC with alpha=0.5 is appropriate. Overall, the interval width of crime forecasting with ARIMA-FAC is better than the interval width obtained from the 95% CI of the ARIMA model.


Introduction
In the digital era, information systems play a very important role in every organization [1]. Decision makers sometimes need forecast data or information as a basis for making a decision. A Decision Support System (DSS) allows decision makers to make appropriate decisions by taking into account the conditions that may occur. To predict future conditions in the decision-making process, statistical techniques such as forecasting methods are used, and their parameters can be adjusted to obtain a better forecasting system than in previous research. Involving forecasting in decision planning is intended to help decision makers reach a good decision. Using forecast data, a decision maker can take into account subsequent events in a given situation, which in turn influences the final decision [2][3][4].
However, forecasting results may contain some uncertainties [4][5][6]. Uncertainty means that more than one outcome is consistent with our expectations [7]. To deal with this condition, we can estimate the range of forecasting values, which is called an interval forecast or interval values [4]. These ranges can predict the best and worst situations that may occur. One of the methods that has been discussed is the ARIMA model, which is one of the most popular models in time series forecasting analysis [8][9][10][11]. This model has the advantage of giving an accurate forecast over a short time period. ARIMA has been widely used in many different areas, such as the social, economic, engineering, crime prevention, and foreign exchange fields [12][13][14][15][16][17]. A good interval analysis for the ARIMA model is required to determine precisely the worst and best possible forecasting conditions.
The accuracy of time series forecasting is important for many decision processes. However, ARIMA models are limited by their data requirements: they need a large amount of historical data, at least 50 records and preferably 100 records or more. In some situations, however, we have to forecast future conditions using few data points over a short time period. As forecasting results, ARIMA models provide lower bound and upper bound values as well as forecasting values. These bounds are obtained from a confidence interval calculation. The data limitation of ARIMA affects both the forecasting results and the forecasting intervals [12,18,19]. Fuzzy Alpha Cut (FAC) is another technique for finding a range of values. Some researchers have implemented FAC, such as [20,21], who applied FAC to calculate the fuzzy expected values of the possibility-probability distribution. FAC was used to evaluate earned value in [22]. In [23], the fuzzification of the variable CP (cost of electricity production) allows an analysis via FAC. However, the use of FAC in calculating forecast intervals, especially for the ARIMA model, has not been explored much. Therefore, to improve the interval forecasting result of ARIMA, this study proposes a combination of the ARIMA model and FAC. By combining FAC with ARIMA models, better forecasting range values are expected. The experiment for this proposed combination used motorcycle index crime data.

Research Method
ARIMA Model
The ARIMA model aims to describe the current behavior of variables in terms of their linear relationships with historical data. It can be decomposed into two parts. First, it has an integrated (I) component (d), which represents the amount of differencing to be performed on the series to make it stationary. The second component is an ARMA model for the series rendered stationary through differencing. The ARMA component is further decomposed into AR and MA components. The autoregressive (AR) component (p) captures the correlation between the current value of the time series and some of its past values. The moving average (MA) component (q) represents the duration of the influence of a random shock. The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are then used to estimate the values of p and q [15]. The ARMA(p, q) model has the general form:

$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} \quad (1)$$

where y_t and ε_t are the actual value and random error at time period t, respectively, while ϕ_i (i = 1, 2, …, p) and θ_j (j = 0, 1, 2, …, q) are model parameters. p and q are integers and are often referred to as the orders of the model. The random errors ε_t are assumed to be independently and identically distributed with a mean of zero and a constant variance of σ^2. When one of the terms is zero, it is common to drop AR, I or MA. For example, an I(1) model is an ARIMA(0,1,0), and an MA(1) model is an ARIMA(0,0,1). Given a time series of data X_t, where t is an integer index and the X_t are real numbers, an ARMA(p, q) model is given by [24,25]:

$$\left(1 - \sum_{i=1}^{p} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t \quad (2)$$

where L is the lag operator, the α_i are the parameters of the autoregressive part of the model, the θ_i are the parameters of the moving average part and the ε_t are error terms. The error terms are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean. Assume now that the autoregressive polynomial of order p+d in (3),

$$1 - \sum_{i=1}^{p+d} \alpha_i L^i \quad (3)$$

has a unit root of multiplicity d. Then it can be rewritten as:

$$1 - \sum_{i=1}^{p+d} \alpha_i L^i = \left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d \quad (4)$$

An ARIMA(p, d, q) process expresses this polynomial factorization property, and is given by:

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t \quad (5)$$

It can thus be thought of as a particular case of an ARMA(p+d, q) process whose autoregressive polynomial has some roots on the unit circle. For this reason, every ARIMA model with d>0 is not wide-sense stationary.

Enhance interval width of crime forecasting with ARIMA model-fuzzy... (Yaya Sudarya Triana) 1195

ARIMA models are used for observable non-stationary processes X_t that have some clearly identified trends:
a. Constant trend (i.e. a non-zero average) leads to d=1.
b. Linear trend (i.e. a linear growth behavior) leads to d=2.
c. Quadratic trend (i.e. a quadratic growth behavior) leads to d=3.
In these cases the ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:

$$Y_t = (1 - L)^d X_t \quad (6)$$

while the second is wide-sense stationary:

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t \quad (7)$$

The AR process order is determined from the PACF graph and, similarly, the MA process order is determined from the ACF graph. The patterns of the sample ACF and PACF used to determine the model processes are summarized in Table 1. In [8], the ARIMA procedure fits a model with a certain number of parameters and tests the significance of those parameters, that is, whether the parameters are zero (null hypothesis, H0) or different from zero (alternative hypothesis, H1). To test the significance of the parameters considered in the model, the t-statistic and P-value are used. The t-statistic is used to determine the P-value. The significance level is set to α=0.05, which corresponds to a 95% confidence level. If the P-value is less than α, H0 is rejected. An ARIMA model whose parameters have a P-value<α is therefore acceptable.
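The differencing operator (1 - L)^d that links the non-stationary series X_t to the stationary series Y_t can be illustrated with a minimal sketch. The function name `difference` and the sample counts are hypothetical and are not taken from the study's data.

```python
def difference(series, d=1):
    """Apply the operator (1 - L)^d, i.e. difference the series d times."""
    values = list(series)
    for _ in range(d):
        # One pass of (1 - L): each value minus its predecessor.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Hypothetical monthly counts with a roughly linear trend (illustrative only).
counts = [12, 15, 17, 20, 24, 26, 29]
print(difference(counts, d=1))  # [3, 2, 3, 4, 2, 3]
```

Each differencing pass shortens the series by one observation, which is one reason short series leave little data for estimating the ARMA part.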
The next step is to check the residuals. The Ljung-Box test is used to test whether the first k autocorrelations of the residuals are significantly different from what would be expected from a white noise process. Using the usual significance level of α=0.05, a model passes this test if the P-value>α. Large P-values indicate that the residuals are not distinguishable from a white noise series [19][20][21], [26][27].
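As a rough illustration of the residual check, the sketch below computes the Ljung-Box statistic Q = n(n+2) Σ r_k^2/(n-k) over the first k lags in plain Python. The function names and the residual values are hypothetical. Turning Q into a P-value requires the chi-square distribution, which is left out here; the example only compares Q with the tabulated 5% critical value for one degree of freedom (3.841).

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    num = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


# Hypothetical residuals that alternate in sign, so they are clearly not white noise.
residuals = [1.0, -1.0] * 10
q_stat = ljung_box_q(residuals, max_lag=1)
# 3.841 is the 5% chi-square critical value for 1 degree of freedom.
print(q_stat > 3.841)  # True: these residuals would fail the white noise check
```

In practice a statistics package would normally be used for this test; the sketch only shows what the statistic measures.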

Confidence Interval (CI)
The 95% CI technique is usually included in the forecasting process with an ARIMA model. As stated in [3,12,22,25], a CI is more informative since it indicates, with a known degree of confidence, the range of possible effects. In Tong et al. [27], a (1-α)100% CI for an unknown parameter (e.g. the population mean) is an interval calculated from the sample data such that (1-α)100% of such intervals will enclose the true parameter value. For example, a 95% CI is an interval with probability 0.95 of enclosing the true parameter. This means the α value is 5% or 0.05. Suppose that {x_1, x_2, …, x_n} is a random sample drawn from a normal population with unknown mean µ and unknown variance σ^2; then a (1-α)100% CI for the true mean can be constructed as follows:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

where x̄ is the sample mean, s is the sample standard deviation, and t_{α/2, n-1} is the critical value of the t-distribution with n-1 degrees of freedom.

If the sample size is large (n≥30), then by the Central Limit Theorem [24], x̄ will be approximately normally distributed regardless of the distribution of the sampled population. Therefore, the CI is constructed as follows:

$$\bar{x} \pm z_{conf}\,\sigma_{\bar{x}}$$

where:
x̄ = the sample mean
z_conf = a number from the standard normal table that satisfies the confidence specifications for the confidence interval
σ_x̄ = the standard error of the mean

Equivalently, the limits can be written as:

$$X_{lo} = \bar{x} - z_{conf}\,\sigma_{\bar{x}}, \qquad X_{up} = \bar{x} + z_{conf}\,\sigma_{\bar{x}}$$

where:
X_lo = lower limit value
X_up = upper limit value
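The large-sample construction of X_lo and X_up can be sketched with the Python standard library. This is an illustrative sketch only: the function name and the sample are hypothetical, and the standard error is estimated here as the sample standard deviation divided by √n, which is an assumption because the text does not say how σ_x̄ is obtained.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_confidence_interval(sample, confidence=0.95):
    """Return (X_lo, X_up) for the mean using the normal approximation."""
    n = len(sample)
    x_bar = mean(sample)
    # Assumption: standard error estimated as s / sqrt(n).
    std_error = stdev(sample) / sqrt(n)
    # z_conf is the two-sided critical value, about 1.96 for 95% confidence.
    z_conf = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar - z_conf * std_error, x_bar + z_conf * std_error


# Hypothetical sample of 40 observations (n >= 30), illustrative only.
sample = [20 + (i % 5) for i in range(40)]
x_lo, x_up = normal_confidence_interval(sample)
print(x_lo < mean(sample) < x_up)  # True
```

For small samples from a normal population, the t-based interval given first would replace `z_conf` with the t critical value, which the standard library does not provide.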

Fuzzy Alpha Cut
The proposed technique combines the ARIMA model and the FAC technique, as shown in Figure 1. This combination aims to obtain a more accurate interval forecasting result. The process is divided into two parts. The first part is the Box-Jenkins methodology for finding the ARIMA model. In the second part, the forecasting results and confidence interval from the ARIMA model are converted into triangular fuzzy numbers (TFNs). Arithmetic operations on the fuzzy alpha-cut are then applied to these data in the fuzzy environment.
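A minimal sketch of the alpha-cut step follows, assuming the TFN triplet (a, b, c) is formed from the ARIMA lower bound, forecast value and upper bound, and using the standard alpha-cut of a triangular fuzzy number, [a + α(b - a), c - α(c - b)]. The function name and the numbers are hypothetical; the study's exact arithmetic may differ.

```python
def alpha_cut(tfn, alpha):
    """Return the (lower, upper) alpha-cut interval of a TFN (a, b, c)."""
    a, b, c = tfn
    if not a <= b <= c:
        raise ValueError("a TFN requires a <= b <= c")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lower = a + alpha * (b - a)
    upper = c - alpha * (c - b)
    return lower, upper


# Hypothetical forecast: CI lower bound 10, forecast 20, CI upper bound 40.
tfn = (10.0, 20.0, 40.0)
print(alpha_cut(tfn, 0.5))  # (15.0, 30.0)
```

Under this formula the interval width is (1 - α)(c - a), so a larger α always gives a narrower interval around the forecast value. That narrowing is why a high α such as 0.7 can leave the actual value outside the range.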

Results and Analysis
The first experiment used 82 months of index crime data, as appropriate under the requirements of ARIMA models. In Part I, three possible ARIMA models were obtained from this experiment: ARIMA (1,0,0), ARIMA (1,1,0), and ARIMA (0,1,1). ARIMA (1,0,0) was found to be the best model because it has a P-value less than 0.05. It also has the smallest MSE of the candidate models, 29.71. The forecasting process was then carried out using this model. The forecasting result is shown in Table 3, while the forecasting values, with the lower and upper values of the confidence interval (CI), are plotted in Figure 2. In Part II, the forecasting values from Part I are transformed into TFNs: each forecasting value, together with its lower and upper values, is represented by the triplet (a, b, c) of a TFN. After that, ARIMA-FAC is applied with α-cut values of 0.3, 0.5 and 0.7 to find the new lower and upper bounds of the forecasting range values. The results of Part II for the 82-month data are presented in Table 4.
Table 4 presents the actual values and the lower and upper values of ARIMA-FAC with α=0.3, α=0.5, and α=0.7. The lower and upper values are used to find the interval width.

TELKOMNIKA Vol.17, No. 3, June 2019: 1193-1201 1198

The second experiment used 51 months of index crime data, as appropriate under the requirements of ARIMA models. In Part I, two possible ARIMA models were obtained from this experiment: ARIMA (1,1,0) and ARIMA (0,1,1). ARIMA (1,1,0) was found to be the best model. It has a P-value less than 0.05, and its MSE value is 38.90. The forecasting process was then carried out using this model. The forecasting result is shown in Table 5, while the forecasting values, with the lower and upper values of the confidence interval (CI), are plotted in Figure 3. The results of Part II for the 51-month data are presented in Table 6.
The third experiment used 32 months of index crime data, as appropriate under the requirements of ARIMA models. In Part I, two possible ARIMA models were obtained from this experiment: ARIMA (1,1,0) and ARIMA (0,1,1). ARIMA (0,1,1) was found to be the best model. It has a P-value less than 0.05 and the smallest MSE, 33.072. Thus, Part I was carried out using this model. The forecasting result is shown in Table 7, while the forecasting values, with the lower and upper values of the confidence interval (CI), are plotted in Figure 4. In Part II, the forecasting values from Part I are transformed into TFNs: each forecasting value, together with its lower and upper values, is represented by the triplet (a, b, c) of a TFN. After that, ARIMA-FAC is applied with α-cut values of 0.3, 0.5 and 0.7 to find the new lower and upper bounds of the forecasting range values. The results of Part II for the 32-month data are presented in Table 8.

Conclusion
ARIMA and ARIMA-FAC have been compared on the basis of graphs, error measurement values and interval width, and the conclusions are drawn from these comparisons. The comparison showed better results from ARIMA-FAC with α=0.7. However, the ARIMA-FAC graphs show that α=0.3 and α=0.7 are not appropriate. The lower and upper ARIMA-FAC graphs with α=0.3 are not much closer to the actual values. The lower and upper ARIMA-FAC graphs with α=0.5 are the closest to the actual values. The results obtained with α=0.7 do not come closer to the actual values; in some cases the upper bound is even less than the actual value, so the actual value is not in the interval range. According to Zhou et al. [16], the accuracy of forecasting is better if the actual value falls within the interval range. Therefore, α=0.3 and α=0.7 do not give results in accordance with the purpose of the proposed combining technique, and ARIMA-FAC with α=0.5 is appropriate. The error measurement comparisons agree with the graphical results: the error measurements of ARIMA-FAC are better than those of ARIMA. The interval width values are also better for ARIMA-FAC, since a narrower interval width is obtained by using FAC. Therefore, the results from ARIMA-FAC will be used for analyzing decision options in crime prevention.

Figure 1. Combination Technique of ARIMA model and FAC

Figure 3. Plot of forecasting values from ARIMA (1,1,0)

Table 1. Summary of ACF and PACF Patterns

The Comparison of Interval Forecasting
The results in the previous section have shown that different α values affect the lower and upper bound values. The proposed combining technique aims to find forecasting range values closer to the actual values. Two comparisons are made. The first compares the MSE of the ARIMA model CI range with the MSE of ARIMA-FAC. A small MSE value is expected, since a smaller value indicates better forecasting performance. The second compares the interval width of the ARIMA model CI with that of ARIMA-FAC. The MSE comparisons for the forecasting results and FAC results in neighborhood C, with different amounts of crime index data, are shown in Table 9, Table 10 and Table 11. Tables 9-11 compare the MSE values of interval forecasting from the ARIMA model CI with those of interval forecasting from ARIMA-FAC with α=0.3, α=0.5 and α=0.7. The MSE values of ARIMA-FAC are much lower than those of the ARIMA CI.
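The two comparisons described above can be sketched as follows. The text does not state exactly how the MSE of a forecasting range is computed, so this sketch assumes it is the mean squared error of a bound series against the actual values. All function names and numbers are hypothetical and are not taken from Tables 9-11. A coverage check is included because the accuracy criterion used in this study is whether the actual value falls within the interval.

```python
def mse(predicted, actual):
    """Mean squared error between a bound (or forecast) series and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mean_interval_width(lower, upper):
    """Average width (upper minus lower) of the forecast intervals."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def coverage(lower, upper, actual):
    """Fraction of actual values that fall inside their interval."""
    inside = sum(1 for l, u, a in zip(lower, upper, actual) if l <= a <= u)
    return inside / len(actual)


# Hypothetical values, for illustration only.
actual = [20, 22, 21]
ci_lower, ci_upper = [10, 12, 11], [30, 32, 31]    # 95% CI bounds
fac_lower, fac_upper = [15, 17, 16], [25, 27, 26]  # alpha-cut bounds

print(mse(ci_lower, actual), mse(fac_lower, actual))   # 100.0 25.0
print(mean_interval_width(ci_lower, ci_upper))         # 20.0
print(mean_interval_width(fac_lower, fac_upper))       # 10.0
print(coverage(fac_lower, fac_upper, actual))          # 1.0
```

A narrower interval is only an improvement while coverage stays high, which is the trade-off the choice of α has to balance.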

Table 9. MSE Forecasting Range Results (Neighborhood C, 82 Months of Data)

Table 10. MSE Forecasting Range Results (Neighborhood C, 51 Months of Data)

Table 11. MSE Forecasting Range Results (Neighborhood C, 32 Months of Data)