2018 Hybrid model for forecasting space-time data with calendar variation

Abstract
The aim of this research is to propose a new hybrid model, the Generalized Space-Time Autoregressive with Exogenous Variable and Neural Network (GSTARX-NN) model, for forecasting space-time data with a calendar variation effect. The GSTARX model represents the linear component, with exogenous variables capturing in particular the effect of calendar variation such as Eid Fitr, whereas the NN handles the nonlinear component. Two studies were conducted in this research: simulation studies and an application to monthly inflow and outflow currency data of Bank Indonesia in the East Java region. The simulation study showed that the hybrid GSTARX-NN model could capture the data patterns well, i.e. trend, seasonal, calendar variation, and both linear and nonlinear noise series. Moreover, based on RMSE on the testing dataset, the application to inflow and outflow data showed that the hybrid GSTARX-NN models tend to give more accurate forecasts than the VARX and GSTARX models. These results are in line with the third conclusion of the M3 forecasting competition, which stated that hybrid or combined models, on average, yield better forecasts than individual methods.


Introduction
Besides the time dimension, data can also have a space dimension; such data are known as space-time data. A space-time model combines dependencies across time and location in multivariate time series data. This model was first introduced by Cliff and Ord [1] and is known as the Space-Time Autoregressive (STAR) model. Ruchjana [2] then developed the Generalized Space-Time Autoregressive (GSTAR) model as an extension of the STAR model. In this research, GSTAR is applied to model the currency inflow and outflow of Bank Indonesia in the East Java region. Forecasting currency inflow and outflow is very important for Bank Indonesia, as the central bank, to maintain the stability of the money circulating in society. Therefore, these forecasts need to be accurate.
Many economic data in Indonesia (a Muslim-majority country) have seasonal patterns influenced by two types of calendar: the Christian calendar and the Islamic calendar. The Christian calendar causes high currency inflow and outflow volumes in certain months, particularly in December. The Islamic calendar, in turn, causes high currency inflow and outflow volumes in the months around the Eid Fitr celebration. Because the date of Eid Fitr shifts every year (it moves roughly 11 days earlier each year relative to the Christian calendar), currency inflow and outflow follow a calendar variation pattern. Hence, exogenous variables must be added to the GSTAR model to handle this calendar variation pattern. A space-time model with exogenous variables, known as GSTARX, has recently been proposed by Suhartono et al. [3].
According to Zhang [4], data may be generated by a linear and a nonlinear structure at the same time. A previous study also showed a nonlinear pattern in currency inflow and outflow data [5]. However, GSTAR is a linear model and cannot handle nonlinear correlation structures in the data. Hence, GSTAR with the calendar variation effect (GSTARX) is combined with a neural network (NN), giving the hybrid GSTARX-NN, to handle linear and nonlinear patterns simultaneously. The hybrid GSTARX-NN is built in two stages: the first stage removes the trend, seasonal, and calendar variation effects using time series regression, and in the second stage the residuals of the time series regression are modelled using GSTAR-NN.
Additionally, NN modeling requires a proper architecture to minimize the forecast error. Intuitively, deeper architectures might be expected to give better forecasts, where the depth of an NN model is measured by its number of hidden layers. In this research, two kinds of NN architectures are used to find out whether a deeper architecture provides better forecasts. The first is the hybrid of GSTARX and a Feedforward Neural Network (hybrid GSTARX-FFNN), whose NN part contains one hidden layer; the second is the hybrid of GSTARX and a Deep Learning Neural Network (hybrid GSTARX-DLNN), whose NN part consists of two hidden layers. The forecast accuracy of the hybrid GSTARX-FFNN and hybrid GSTARX-DLNN models is then compared with that of the VARX and GSTARX models using the Root Mean Squared Error (RMSE) criterion.

Research Method
2.1. Calendar Variation Model
The calendar variation model is a time series model used to forecast data with seasonal patterns whose period varies [6]. It can be modelled using time series regression: in general, the calendar variation model is a regression for data that contain a trend, seasonality, and a calendar variation effect (e.g. Eid Fitr). The calendar variation model for inflow data at location $i$ is written in (1):
$$Y_{i,t} = \beta_i t + \sum_{j=1}^{12}\gamma_{j,i}S_{j,t} + \delta_i D_t + \omega_i D_{t-1} + N_{i,t}, \qquad (1)$$
whereas the calendar variation model for outflow data is written in (2):
$$Y_{i,t} = \beta_i t + \sum_{j=1}^{12}\gamma_{j,i}S_{j,t} + \delta_i D_t + \theta_i D_{t+1} + N_{i,t}, \qquad (2)$$
where $\beta_i$ is the linear trend parameter for the trend variable $t$; $S_{1,t}, S_{2,t}, \ldots, S_{12,t}$ are seasonal dummy variables with parameters $\gamma_{1,i}, \ldots, \gamma_{12,i}$; $D_t$ is the calendar variation dummy variable, equal to 1 in the month of Eid Fitr, so that $D_{t-1}$ and $D_{t+1}$ mark the month after and the month before Eid Fitr; and $\delta_i$, $\omega_i$, and $\theta_i$ are the calendar variation parameters for the Eid Fitr effect in the month of Eid Fitr, one month after, and one month before Eid Fitr, respectively. $N_{i,t}$ is a noise series.
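A minimal sketch of the first-stage time series regression may help, assuming the calendar variation model consists of a linear trend, twelve monthly dummies, an Eid-month dummy, and a month-after (inflow) or month-before (outflow) Eid dummy. The helper names `calendar_design` and `fit_tsr` are hypothetical, the Eid month positions in the usage lines are made up, and ordinary least squares stands in for the GLS estimation the paper uses; the returned residual plays the role of $N_{i,t}$.

```python
import numpy as np

def calendar_design(n_months, eid_months, effect="after", start_month=1):
    """Regressors of a calendar variation model (hypothetical helper).

    Columns: trend t, monthly dummies S_1..S_12, Eid-month dummy D_t, and
    D_{t-1} (month after Eid, inflow) or D_{t+1} (month before Eid, outflow).
    No intercept is used, because the 12 seasonal dummies already sum to one.
    """
    t = np.arange(1, n_months + 1, dtype=float)
    month = (start_month - 1 + np.arange(n_months)) % 12   # 0 = January
    S = np.eye(12)[month]
    D = np.zeros(n_months)
    D[list(eid_months)] = 1.0
    shifted = np.zeros(n_months)
    if effect == "after":        # D_{t-1}: Eid Fitr fell in the previous month
        shifted[1:] = D[:-1]
    elif effect == "before":     # D_{t+1}: Eid Fitr falls in the next month
        shifted[:-1] = D[1:]
    else:
        raise ValueError("effect must be 'after' or 'before'")
    return np.column_stack([t, S, D, shifted])


def fit_tsr(y, X):
    """OLS fit; returns coefficients and the residual (noise) series N_t."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


# Usage with made-up numbers: 144 months (Jan 2003 to Dec 2014), hypothetical Eid months.
eid = [12 * k + 10 - k // 3 for k in range(12)]
X = calendar_design(144, eid, effect="after")
true_coef = np.r_[0.5, np.linspace(10, 21, 12), 30.0, 15.0]
y = X @ true_coef + np.random.default_rng(0).normal(0, 1, 144)
coef, N = fit_tsr(y, X)   # N is passed on to the second-stage model
```

Leaving out the intercept avoids the dummy-variable trap, since the twelve seasonal dummies already act as month-specific intercepts.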

VAR Model
The Vector Autoregressive (VAR) model is a forecasting model that captures the interrelationships between locations and time simultaneously. The VAR model is built following the Box-Jenkins procedure [7], which consists of four steps: identification, parameter estimation, diagnostic checking, and forecasting. Let $Z_i(t)$ denote the observation of variable $i$ at time $t$, with $t = 1, 2, \ldots, T$ and $i = 1, 2, \ldots, N$. The VAR model is written in (3) [8]:
$$\boldsymbol{\Phi}_p(B)\,\dot{\mathbf{Z}}(t) = \mathbf{a}(t), \qquad (3)$$
where $\dot{\mathbf{Z}}(t)$ is the (mean-corrected) multivariate time series vector, $\boldsymbol{\Phi}_p(B) = \mathbf{I} - \boldsymbol{\Phi}_1 B - \cdots - \boldsymbol{\Phi}_p B^p$ is the autoregressive matrix polynomial of order $p$, and $\mathbf{a}(t)$ is the residual vector. In this research, the series modelled by VAR are the residuals of the calendar variation models, i.e. $N_{i,t}$ in (1) and (2).
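To make the estimation step concrete, here is a minimal least-squares sketch for the special case $p = 1$, i.e. $\dot{\mathbf{Z}}(t) = \boldsymbol{\Phi}_1\dot{\mathbf{Z}}(t-1) + \mathbf{a}(t)$, on already mean-corrected series. The function names and the simulated coefficient matrix are illustrative assumptions; in practice the order $p$ comes from the identification step.

```python
import numpy as np

def fit_var1(Z):
    """Least-squares VAR(1) for a (T, N) array of mean-corrected series.

    Each row of Z is Z(t)'. Regressing Z(t) on Z(t-1) column-wise gives
    Z[1:] ~ Z[:-1] @ B, hence Phi_1 = B.T.
    """
    B, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)
    phi1 = B.T
    resid = Z[1:] - Z[:-1] @ B
    return phi1, resid


def forecast_var1(phi1, z_last, steps):
    """Iterated h-step forecasts Z_hat(T+h) = Phi_1^h Z(T)."""
    out, z = [], np.asarray(z_last, dtype=float)
    for _ in range(steps):
        z = phi1 @ z
        out.append(z)
    return np.array(out)


# Illustrative simulation with a hypothetical stationary Phi_1 (two series).
rng = np.random.default_rng(1)
phi_true = np.array([[0.5, 0.2], [0.1, 0.3]])
Z = np.zeros((20000, 2))
for t in range(1, len(Z)):
    Z[t] = phi_true @ Z[t - 1] + rng.normal(size=2)
phi_hat, resid = fit_var1(Z)
fc = forecast_var1(phi_hat, Z[-1], steps=3)
```

The off-diagonal entries of $\boldsymbol{\Phi}_1$ are what let one location's past residual inform another location's forecast.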

GSTAR Model
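GSTAR is a time series model that captures linear dependencies across space and time, with the spatial structure expressed by a spatial weight matrix $\mathbf{W}$. Let $\{\mathbf{Z}(t) : t = 0, \pm 1, \pm 2, \ldots, T\}$ be space-time data at $N$ locations. The GSTAR model with time order $p$ and spatial orders $\lambda_1, \lambda_2, \ldots, \lambda_p$, referred to as GSTAR($p; \lambda_1, \lambda_2, \ldots, \lambda_p$), is written in (4) [9]:
$$\mathbf{Z}(t) = \sum_{s=1}^{p}\left[\boldsymbol{\Phi}_{s0} + \sum_{k=1}^{\lambda_s}\boldsymbol{\Phi}_{sk}\mathbf{W}^{(k)}\right]\mathbf{Z}(t-s) + \mathbf{a}(t), \qquad (4)$$
where $\boldsymbol{\Phi}_{s0} = \mathrm{diag}(\phi_{s0}^{(1)}, \ldots, \phi_{s0}^{(N)})$, $\boldsymbol{\Phi}_{sk} = \mathrm{diag}(\phi_{sk}^{(1)}, \ldots, \phi_{sk}^{(N)})$, and $\mathbf{a}(t)$ is the residual vector, assumed independent and identically distributed with mean $\mathbf{0}$ and covariance matrix $\boldsymbol{\Sigma}$. For example, the GSTAR model with time and spatial order one, GSTAR(1;1), is written in (5):
$$\mathbf{Z}(t) = \boldsymbol{\Phi}_{10}\mathbf{Z}(t-1) + \boldsymbol{\Phi}_{11}\mathbf{W}\mathbf{Z}(t-1) + \mathbf{a}(t). \qquad (5)$$
Several spatial weight matrices can be used in the GSTAR model. This research uses three: uniform weights, weights based on the inverse of the distance between locations, and weights based on normalization of partial cross-correlation inference. The series modelled by GSTAR are the residuals of the calendar variation models in (1) and (2).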
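The sketch below illustrates two of the three weighting schemes (uniform and row-normalised inverse distance) and a per-location least-squares fit of GSTAR(1;1). It assumes rows of $\mathbf{W}$ sum to one with a zero diagonal; the helper names, the distances, and the simulated coefficients are hypothetical, and the partial cross-correlation weights are not shown.

```python
import numpy as np

def uniform_weights(n):
    """Uniform weights: w_ij = 1/(n-1) for j != i, zero diagonal."""
    return (np.ones((n, n)) - np.eye(n)) / (n - 1)


def inverse_distance_weights(dist):
    """Row-normalised inverse-distance weights from an (n, n) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    inv = np.zeros_like(dist)
    mask = dist > 0                      # the zero diagonal stays zero
    inv[mask] = 1.0 / dist[mask]
    return inv / inv.sum(axis=1, keepdims=True)


def fit_gstar11(Z, W):
    """Per-location least squares for Z(t) = Phi10 Z(t-1) + Phi11 W Z(t-1) + a(t).

    Z is (T, n) with rows Z(t)'. Because Phi10 and Phi11 are diagonal, each
    location i is a two-regressor regression of Z_i(t) on Z_i(t-1) and (W Z(t-1))_i.
    """
    lag = Z[:-1]
    wlag = lag @ W.T                     # row t-1 holds (W Z(t-1))'
    phi10, phi11 = [], []
    for i in range(Z.shape[1]):
        X = np.column_stack([lag[:, i], wlag[:, i]])
        (a, b), *_ = np.linalg.lstsq(X, Z[1:, i], rcond=None)
        phi10.append(a)
        phi11.append(b)
    return np.diag(phi10), np.diag(phi11)


# Illustrative use with three locations and hypothetical distances (km).
dist = np.array([[0, 90, 120], [90, 0, 60], [120, 60, 0]])
W = inverse_distance_weights(dist)
P10, P11 = np.diag([0.4, 0.3, 0.2]), np.diag([0.2, 0.3, 0.1])
rng = np.random.default_rng(2)
Z = np.zeros((20000, 3))
for t in range(1, len(Z)):
    Z[t] = (P10 + P11 @ W) @ Z[t - 1] + rng.normal(size=3)
P10_hat, P11_hat = fit_gstar11(Z, W)
```

Compared with VAR(1), GSTAR(1;1) needs only two parameters per location, because neighbours enter through the fixed weights rather than through free cross-location coefficients.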

Hybrid GSTARX-NN Model
The hybrid model was introduced by Zhang [4] to increase forecast accuracy by combining linear and nonlinear components. In this research, hybrid modeling is done in two stages. The first stage models the trend, seasonality, and calendar variation using time series regression (TSR). The second stage models the residuals of the first stage using GSTAR-NN, which is either GSTAR-FFNN with one hidden layer or GSTAR-DLNN with two hidden layers. The flowcharts of the hybrid GSTARX-FFNN and hybrid GSTARX-DLNN models are shown in Figures 1 and 2, respectively. The output is a vector of inflow and outflow currency data at the four Bank Indonesia regional offices in East Java. The notation in Figures 1 and 2 is as follows: $N_{i,t-1}$ is the first lag of the TSR residual at location $i$, and $W_{i,t-1}$ is the spatially weighted first lag of the TSR residuals at location $i$, i.e. the $i$-th element of $\mathbf{W}\mathbf{N}_{t-1}$.
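The second-stage architecture can be sketched as follows, assuming GSTAR(1;1)-type inputs (the lags $N_{i,t-1}$ and weighted lags $W_{i,t-1}$ of all locations) feeding tanh hidden layers and a linear output layer with one output per location. Only the forward pass is shown: the helper names, the random initialisation, and the stand-in residuals are hypothetical, training is omitted, and the layer sizes (2 hidden neurons, and 15 then 4) merely mirror the inflow architectures reported in the application study.

```python
import numpy as np

def gstar_nn_inputs(N, W):
    """Build (X, Y) from TSR residuals N of shape (T, n_loc).

    X row t-1 = [N_{1,t-1}, ..., N_{n,t-1}, (W N_{t-1})_1, ..., (W N_{t-1})_n],
    Y row t-1 = N_t (the residuals to be predicted).
    """
    lag = N[:-1]
    return np.hstack([lag, lag @ W.T]), N[1:]


def init_layers(sizes, rng):
    """Random (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]


def forward(X, layers):
    """tanh hidden layers followed by a linear output layer."""
    h = X
    for w, b in layers[:-1]:
        h = np.tanh(h @ w + b)
    w, b = layers[-1]
    return h @ w + b


rng = np.random.default_rng(3)
n_loc = 4
W = (np.ones((n_loc, n_loc)) - np.eye(n_loc)) / (n_loc - 1)   # uniform weights
N = rng.normal(size=(120, n_loc))                              # stand-in TSR residuals
X, Y = gstar_nn_inputs(N, W)
ffnn = init_layers([2 * n_loc, 2, n_loc], rng)        # one hidden layer (FFNN)
dlnn = init_layers([2 * n_loc, 15, 4, n_loc], rng)    # two hidden layers (DLNN)
residual_fc = forward(X, ffnn)
# hybrid forecast = TSR fitted values + residual_fc (TSR part omitted here)
```

The only difference between the FFNN and DLNN variants in this sketch is the `sizes` list, which matches the text's definition of depth as the number of hidden layers.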

Performance Evaluation
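To evaluate the performance of the proposed hybrid GSTARX-NN model, the root mean squared error (RMSE) is used as the evaluation criterion. The RMSE of the in-sample data is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n-p}\sum_{t=1}^{n}\left(Z_t - \hat{Z}_t\right)^2}, \qquad (6)$$
where $Z_t$ and $\hat{Z}_t$ are the actual and forecast values, $n$ is the number of forecasts, and $p$ is the number of parameters.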
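A small helper for this criterion, assuming the in-sample form is $\sqrt{\mathrm{SSE}/(n-p)}$ and that setting $p = 0$ gives the usual RMSE for testing-data forecasts, which the paper uses for model selection. The function name is hypothetical.

```python
import math

def rmse(actual, forecast, n_params=0):
    """RMSE with an optional degrees-of-freedom correction n - p."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    n = len(actual)
    if n - n_params <= 0:
        raise ValueError("need more observations than parameters")
    sse = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(sse / (n - n_params))


print(rmse([1, 2, 3], [1, 2, 5]))              # sqrt(4/3), about 1.1547
print(rmse([1, 2, 3], [1, 2, 5], n_params=1))  # sqrt(4/2), about 1.4142
```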

Experimental Design
In this research, two kinds of studies were conducted: a simulation study and an application study on monthly inflow and outflow currency data of Bank Indonesia in the East Java region (i.e. Surabaya, Malang, Kediri, and Jember). The simulation study generated data containing trend, seasonal, and calendar variation patterns, with a linear noise series in the 1st scenario and a nonlinear noise series in the 2nd scenario. The application study uses inflow and outflow data as secondary data obtained from Bank Indonesia. The data cover January 2003 to December 2014 and are divided into two parts: training data from January 2003 to December 2013, and testing data from January to December 2014. The research variables are shown in Table 1, and the dummy variables for trend, seasonal, and calendar variation used in this research are shown in Table 2.
The first step is to compute descriptive statistics; the data are then modelled using VARX, GSTARX, and hybrid GSTARX-NN. Modeling with VARX, GSTARX, and hybrid GSTARX-NN is done in two stages. The first stage removes the trend, seasonal, and calendar variation effects using time series regression. In the second stage, the residuals of the time series regression are modelled using VAR, GSTAR, and hybrid GSTAR-NN, respectively. After modeling, the best model is selected based on the smallest RMSE on the testing data.
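The split above is simple month arithmetic: twelve years of monthly data give 144 observations, of which the first 132 (2003 to 2013) are used for training and the last 12 (2014) for testing. The helper names below are hypothetical; the point is that the split is chronological, with no shuffling.

```python
def month_index(year, month, start_year=2003, start_month=1):
    """0-based position of (year, month) in a monthly series starting at start_year/start_month."""
    return (year - start_year) * 12 + (month - start_month)


n_total = month_index(2014, 12) + 1   # January 2003 to December 2014
n_train = month_index(2013, 12) + 1   # January 2003 to December 2013
n_test = n_total - n_train            # January to December 2014


def train_test_split(series, n_train):
    """Chronological split: the test block follows the training block."""
    return series[:n_train], series[n_train:]


train, test = train_test_split(list(range(n_total)), n_train)
print(n_total, n_train, n_test)   # 144 132 12
```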

Results and Analysis
3.1. Simulation Study
A simulation study was conducted to evaluate the performance of the methods in modeling data containing trend, seasonality, calendar variation, and linear or nonlinear noise series. The simulated data were generated following the model in (7).
The noise series components used in the simulation study consist of the linear noise series in (10) and the nonlinear noise series in (11). The time series plot of each component is shown in Figure 3. The trend component causes the data to increase over time, the seasonal component causes high values in certain months, and the calendar variation causes high values in the months around the Eid Fitr celebration.
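Because equations (7) to (11) are not reproduced in this text, the following is only an illustrative data-generating process of the same kind, with every coefficient hypothetical: a linear trend, a December peak standing in for the seasonal component, a jump in made-up Eid months for the calendar variation, and either linear GSTAR(1;1)-type noise (scenario 1) or bounded exponential-autoregressive noise (scenario 2). The function name `simulate` is also hypothetical.

```python
import numpy as np

def simulate(T, scenario, W, rng, eid_months=()):
    """Illustrative series = trend + seasonal + calendar + noise (hypothetical coefficients)."""
    n = W.shape[0]
    t = np.arange(1, T + 1, dtype=float)
    ones = np.ones(n)
    trend = np.outer(0.05 * t, ones)                    # grows over time
    month_effect = np.zeros(12)
    month_effect[11] = 3.0                              # December peak (series starts in January)
    seasonal = np.outer(month_effect[np.arange(T) % 12], ones)
    calendar = np.zeros((T, n))
    calendar[list(eid_months)] = 5.0                    # high values in Eid Fitr months
    a = rng.normal(0.0, 1.0, (T, n))
    noise = np.zeros((T, n))
    for s in range(1, T):
        prev = noise[s - 1]
        if scenario == 1:    # linear: GSTAR(1;1)-type recursion
            noise[s] = 0.4 * prev + 0.3 * (W @ prev) + a[s]
        else:                # nonlinear: exponential AR term plus spatial lag
            noise[s] = 2.0 * prev * np.exp(-0.25 * prev ** 2) + 0.3 * (W @ prev) + a[s]
    parts = {"trend": trend, "seasonal": seasonal, "calendar": calendar, "noise": noise}
    return trend + seasonal + calendar + noise, parts


W = (np.ones((3, 3)) - np.eye(3)) / 2                  # uniform weights, three locations
rng = np.random.default_rng(4)
Y1, parts1 = simulate(120, 1, W, rng, eid_months=[10, 22, 33, 45])
Y2, parts2 = simulate(120, 2, W, rng, eid_months=[10, 22, 33, 45])
```

Plotting `parts["noise"][1:]` against `parts["noise"][:-1]` for each scenario gives the kind of lag-1 matrix plot used to tell linear from nonlinear noise.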
Next, the noise series is identified from matrix plots of the noise at time $t$ ($N_t$) against the noise at time $t-1$ ($N_{t-1}$) at each location, shown in Figures 4 and 5. In the 1st scenario (Figure 4), the relation between $N_t$ and $N_{t-1}$ is linear, whereas in the 2nd scenario (Figure 5) the relation between $N_t$ and $N_{t-1}$ at each location is nonlinear.
Each scenario is replicated 10 times, and all simulated data are analyzed using VARX, GSTARX, and hybrid GSTARX-NN. The hybrid GSTARX-NN models differ in the number of hidden layers: the hybrid GSTARX-FFNN has one hidden layer and the hybrid GSTARX-DLNN has two.
Based on the results in Figure 6, modeling the 1st scenario with hybrid GSTARX-FFNN and hybrid GSTARX-DLNN provides more accurate forecasts than GSTARX and VARX, as the mean RMSE values of the hybrid models are smaller than those of the GSTARX and VARX models at all three locations. Overall, the hybrid GSTARX-NN model is the best model for forecasting data containing trend, seasonality, calendar variation, and a linear noise series, compared with GSTARX and VARX. Figure 7 illustrates that in the 2nd scenario the hybrid GSTARX-FFNN and GSTARX-DLNN models also provide more accurate predictions than the GSTARX and VARX models. Moreover, the greater architectural depth of the hybrid GSTARX-DLNN model does not have a significant effect on reducing the prediction error compared with the hybrid GSTARX-FFNN model. This is consistent with the first conclusion of the M3 competition [10]: statistically sophisticated or complex methods do not necessarily produce more accurate forecasts than simpler ones. These results also show that the neural network can accurately capture nonlinear patterns.

Forecasting Inflow and Outflow with VARX, GSTARX and Hybrid GSTARX-NN
Figure 8 shows the growth of inflow and outflow at each Bank Indonesia office in the East Java region. Inflow and outflow are high around Eid Fitr, and the size of the increase depends on the week in which Eid Fitr occurs.
The first step in forecasting with VARX, GSTARX, and hybrid GSTARX-NN is to model the data by time series regression estimated with GLS, and then to model its residuals using VAR, GSTAR, and hybrid GSTAR-NN. The identification, estimation, and diagnostic checking steps show that the best models for forecasting inflow and outflow data are VAR(1) and GSTAR(1;1) with the spatial weight matrix W based on the inverse of the distance between locations. The hybrid GSTARX-FFNN and GSTARX-DLNN architectures are therefore built on this GSTAR(1;1) model, so the inputs of the hybrid GSTAR-FFNN and GSTAR-DLNN models are the GSTAR(1;1) variables.
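The paper states only that GLS is used for the first-stage regression. One common reading when several locations are estimated jointly is a seemingly unrelated regression (SUR) with two-step feasible GLS, and the sketch below assumes that reading: step one fits OLS per location to estimate the cross-location error covariance $\boldsymbol{\Sigma}$, and step two solves the GLS normal equations with $\boldsymbol{\Omega}^{-1} = \boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_T$. The function name and stand-in data are hypothetical.

```python
import numpy as np

def sur_fgls(X_list, y_list):
    """Two-step feasible GLS (SUR) for m regressions observed over the same T periods.

    X_list[i] is the (T, k_i) design of location i, y_list[i] its (T,) response.
    Returns one coefficient vector per location.
    """
    # Step 1: OLS residuals per location -> estimated error covariance Sigma (m x m).
    E = np.column_stack([y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
                         for X, y in zip(X_list, y_list)])
    T, m = E.shape
    sigma_inv = np.linalg.inv(E.T @ E / T)
    # Step 2: stacked block-diagonal design; Omega^{-1} = Sigma^{-1} kron I_T.
    ks = [X.shape[1] for X in X_list]
    Xs = np.zeros((T * m, sum(ks)))
    col = 0
    for i, X in enumerate(X_list):
        Xs[i * T:(i + 1) * T, col:col + ks[i]] = X
        col += ks[i]
    ys = np.concatenate(y_list)
    omega_inv = np.kron(sigma_inv, np.eye(T))
    beta = np.linalg.solve(Xs.T @ omega_inv @ Xs, Xs.T @ omega_inv @ ys)
    return np.split(beta, np.cumsum(ks)[:-1])


# Usage with stand-in data: three locations sharing intercept-plus-trend regressors.
rng = np.random.default_rng(5)
T = 100
Xc = np.column_stack([np.ones(T), np.arange(T, dtype=float)])
ys = [Xc @ np.array([1.0 + i, 0.1 * i]) + rng.normal(size=T) for i in range(3)]
coefs = sur_fgls([Xc] * 3, ys)
```

If every location used exactly the same regressors, this estimator would coincide with equation-by-equation OLS, a standard SUR result that the usage lines can serve to check.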
The number of neurons is determined by cross-validation, trying 1, 2, 3, 4, 5, 10, and 15 neurons in the (first) hidden layer; for the second hidden layer of GSTAR-DLNN, 1, 2, 3, 4, and 5 neurons are tried. The activation function in the hidden layers is the hyperbolic tangent, while the activation function of the output layer is linear. The selected number of neurons is the one that produces the smallest error on the testing data. The results show that the best hybrid GSTARX-FFNN architecture uses 2 neurons in the hidden layer for inflow data and 15 neurons in the hidden layer for outflow data (Figure 9). For hybrid GSTARX-DLNN, the best architecture uses 15 neurons in the first hidden layer and 4 neurons in the second hidden layer for both inflow and outflow data (Figure 10).
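A minimal sketch of the neuron-count search, assuming a single tanh hidden layer with a linear output trained by plain full-batch gradient descent on mean squared error (the training algorithm is not stated here, so this is an assumption). The candidate counts follow the text and, as in the text, each candidate is scored on held-out testing data; the helper names, learning rate, epochs, and stand-in data are hypothetical.

```python
import numpy as np

def train_ffnn(X, Y, q, epochs=2000, lr=0.05, seed=0):
    """One tanh hidden layer (q neurons), linear output, full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], q))
    b1 = np.zeros(q)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(q), (q, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = len(X)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        E = H @ W2 + b2 - Y                 # output error of the linear output layer
        dH = (E @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh (before W2 changes)
        W2 -= lr * H.T @ E / n
        b2 -= lr * E.mean(axis=0)
        W1 -= lr * X.T @ dH / n
        b1 -= lr * dH.mean(axis=0)
    return lambda Xn: np.tanh(Xn @ W1 + b1) @ W2 + b2


def select_neurons(X_tr, Y_tr, X_te, Y_te, candidates=(1, 2, 3, 4, 5, 10, 15)):
    """Return the neuron count with the smallest testing RMSE, plus all scores."""
    scores = {}
    for q in candidates:
        predict = train_ffnn(X_tr, Y_tr, q)
        scores[q] = float(np.sqrt(np.mean((predict(X_te) - Y_te) ** 2)))
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(6)
X = rng.normal(size=(132, 8))                       # stand-in GSTAR(1;1) inputs
Y = np.tanh(X[:, :4] - 0.5 * X[:, 4:]) + 0.1 * rng.normal(size=(132, 4))
best_q, scores = select_neurons(X[:120], Y[:120], X[120:], Y[120:])
```

For the DLNN case the same loop would run over pairs of first and second hidden-layer sizes; only the single-layer search is shown to keep the sketch short.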

Comparison of VARX, GSTARX and Hybrid GSTARX-NN
The comparison of forecast and actual values for the testing data is shown in Table 3 for inflow data and Table 4 for outflow data. A ratio greater than one indicates that GSTARX-FFNN is better than the compared method under the RMSE criterion. Table 3 shows that the GSTARX-FFNN model is the best model for forecasting inflow data in Malang, Kediri, and Jember, while GSTARX is the best model for forecasting inflow data in Surabaya. Table 4 shows that GSTARX-FFNN is the best model for forecasting outflow data in Kediri, GSTARX is the best in Surabaya, and GSTARX-DLNN is the best in both Malang and Jember. Overall, the hybrid GSTARX-NN models generally give more accurate forecasts than the linear GSTARX and VARX models. These results are in line with previous studies that concluded that hybrid models tend to provide more accurate forecasts than individual methods [11][12][13][14][15].
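The ratio in Tables 3 and 4 is read here as the RMSE of a competing method divided by the RMSE of GSTARX-FFNN, the reading under which a value above one favours GSTARX-FFNN. That interpretation, the helper names, and the numbers in the usage lines are assumptions for illustration, not values from the tables.

```python
import math

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def rmse_ratios(actual, forecasts, reference="GSTARX-FFNN"):
    """RMSE(method) / RMSE(reference) for every other method; > 1 favours the reference."""
    base = rmse(actual, forecasts[reference])
    return {name: rmse(actual, fc) / base
            for name, fc in forecasts.items() if name != reference}


actual = [10.0, 12.0, 14.0]                      # made-up testing values
forecasts = {
    "GSTARX-FFNN": [10.0, 12.0, 15.0],
    "GSTARX": [11.0, 12.0, 15.0],
    "VARX": [10.0, 12.0, 16.0],
}
print(rmse_ratios(actual, forecasts))   # GSTARX about 1.414, VARX about 2.0
```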

Conclusion
The proposed hybrid GSTARX-NN model is a two-stage model for space-time data with a calendar variation effect: in the first stage, the GSTARX model eliminates the trend, seasonal, and calendar variation effects using a time series regression approach, and in the second stage the residuals of this model are modelled using GSTAR-NN. The simulation study showed that the hybrid GSTARX-NN model could capture the data patterns, i.e. trend, seasonality, calendar variation, and both linear and nonlinear noise series. In the application study, the hybrid GSTARX-FFNN was the best model for inflow data at three locations, whereas the hybrid GSTARX-DLNN was the best model for outflow data at two locations. Hence, the hybrid GSTARX-NN models, i.e. GSTARX-FFNN and GSTARX-DLNN, tend to give more accurate forecasts of the inflow and outflow data of Bank Indonesia in the East Java region. These results are in line with previous studies stating that hybrid models tend to provide more accurate forecasts than individual methods. Finally, further research on the hybrid GSTARX-DLNN could involve other deep learning methods, particularly Recurrent Neural Networks, which can handle nonlinear moving average components.