The prediction of mobile data traffic based on the ARIMA model and disruptive formula in industry 4.0: A case study in Jakarta, Indonesia

Disruptive technologies arising from the cellular evolution, including the Internet of Things (IoT), have added significant data traffic to mobile telecommunication networks in the era of Industry 4.0. These technologies cause erroneous predictions that prompt mobile operators to upgrade their networks, which leads to revenue loss. Inaccurate network prediction also creates a bottleneck that affects the performance of the telecommunication network, especially on the mobile backhaul. We propose a new technique to predict data traffic more accurately. This research used a univariate Autoregressive Integrated Moving Average (ARIMA) model combined with a new disruptive formula. The disruptive formula uses a judgmental approach based on four variables: Political, Economic, Social, and Technological (PEST) factors, cost, time to market, and market share. The disruptive formula amplifies the ARIMA calculation, giving a new formula that combines the judgmental and statistical approaches. The results show that the disruptive formula combined with the ARIMA model has a lower error in mobile data forecasting than the conventional ARIMA. The conventional ARIMA shows the average mobile data traffic to be 49.19 Mb/s and 156.93 Mb/s for 3G and 4G, respectively, whereas the ARIMA with disruptive formula shows more optimized traffic, reaching 56.72 Mb/s and 199.73 Mb/s. The higher values of the ARIMA with disruptive formula are closer to the prediction of the mobile data forecast. This result suggests that the combination of the statistical and computational approaches provides a more accurate prediction method for mobile backhaul networks.


INTRODUCTION
The total mobile data traffic generated by telecommunication technologies has added significantly to the load on the core network in recent years. This has led to a congestion problem, especially in mobile backhaul technologies, which play a significant role in bringing traffic to the core network. If the mobile backhaul is congested, the operator's network may suffer packet drops or higher latency, which indirectly affects the end user. Besides, as the upcoming Industry 4.0 has already been introduced in several countries, mobile

FORECASTING METHODS
Several techniques have been proposed to predict technological disruptiveness, especially in Industry 4.0. The first technique is called Forward-Citation Node Pair Algorithm (FCNA), which was introduced by Changwoo Choi and Yongtae Park [16], and uses a patent-citation matrix consisting of a set of nodes connected by arcs. This technique aims to identify the main development path of the complex patent-citation by understanding both present and past technologies [17], where the leading technology possesses the main patents that are linked to the selected arcs. The other technique, to improve FCNA, is K-core analysis, which concentrates on the sub-groups' nodes rather than on the main patents [17]. This technique aims to remove the central patents (which are assumed to be the most disruptive technologies) by distributing them into different subgroups that can help identify the essential data [18]. The last technique to identify disruptiveness in Industry 4.0 is called topic modelling and is similar to a search engine. Topic modelling is mostly used in the cluster that has been defined in the K-core analysis. The highest number of repetitive words in each cluster defined in K-core analysis leads to the most important aspects of the technology. Afterward, to validate the results raised by this technique, it is recommended that two experts review them [17]. All these techniques are mostly aimed at identifying the most important disruptive technologies in the market through clustering. Such techniques help market leaders identify which technologies are more disruptive, but they do not determine how much traffic each will contribute. Based on the analysis of these techniques, the major contribution technologies are IoT, artificial intelligence, financial technology (including blockchain), virtual reality, and autonomous vehicles.

Types of forecasts
Judgmental methods are based on intuition, personal interest, and user experiences [19]. One example of a judgmental method is the Delphi method, which employs a panel of experts to analyze research results to ensure validity. A judgmental method will also be used in this prediction to analyze the disruptive formula using a risk-factor technique. Univariate methods depend on past and present values that have been forecasted in a single series [19]. Univariate methods are used in this research to analyze the predicted traffic forecast. The ARIMA model consists of both univariate and multivariate models. Multivariate models use more than one independent variable (time series) simultaneously to predict the forecast. These variables might comprise interrelationships using a different time variable.

The ARIMA model
ARIMA stands for Autoregressive Integrated Moving Average [19]. The ARIMA model is based on the Box and Jenkins method and combines three concepts: Autoregression (AR), Moving Average (MA), and integration, together classified as ARIMA(p, d, q), where p defines the AR order, d defines the degree of differencing, and q defines the MA order. AR is a technique for analyzing the past and present values of a data set. AR is denoted by p in the ARIMA(p, d, q) terminology and expresses the current value as a weighted linear sum of the p previous values. The formula for AR is shown in (1):

$$y_t = \sum_{i=1}^{p} \Theta_i \, y_{t-i} + \varepsilon_t \qquad (1)$$

where p is the order, that is, the number of past values used; t is the time index; $\Theta_i$ is the slope coefficient of the weighted past values; y is the time-series function of the ARIMA model; and $\varepsilon_t$ is the error term, which is normally distributed with mean zero and variance $\sigma^2$. The MA process is denoted by order q in the ARIMA(p, d, q) classification and is built on the error term in (1). MA also uses a number of orders of past values, as denoted in (2):

$$y_t = \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} \qquad (2)$$

where t is the time index; $\theta_j$ is the slope coefficient of the weighted past error values; q is the number of orders needed to identify the past values; and y is the time-series function of the ARIMA model. The parameter q identifies how many orders are used in the MA calculation. MA has been used for stock trading. MA aims to eliminate the peaks caused by random noise fluctuations in the graph, which would otherwise lead to erroneous predictions. To calculate the average value in the chart, MA takes a certain period, such as seven values, and averages it, as shown in Figure 1. The results of A, B, and C are 1, 2.6, and 2.6, respectively. The average A is equal to each value in its sequence, which is one. Due to the inconsistent value at position six of the series, the averages B and C differ from A.
However, instead of reproducing the original sequence with its significant peak, the MA shows a smooth graph. The integrated (differenced) part is denoted as d in ARIMA(p, d, q) and is the parameter that addresses whether the series is stationary [20]. In practice, most time series are non-stationary, so MA and AR alone are not sufficient to determine the prediction, and non-stationarity can lead to prediction errors. Differencing (the integrated part of the model) is therefore one of the best techniques available to make a series stationary [19].
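As a minimal illustration of the smoothing and differencing ideas described in this subsection, the following Python sketch applies a simple trailing moving average and first-order differencing to a short series. The traffic samples, the window length, and the function names are hypothetical and are not taken from the data set or from Figure 1.

```python
def moving_average(series, window):
    """Trailing simple moving average; one output value per full window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def difference(series, d=1):
    """Apply first-order differencing d times (the 'I' part of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Hypothetical traffic samples in Mb/s: a slow upward trend with one peak.
traffic = [10, 11, 12, 30, 14, 15, 16, 17]

smoothed = moving_average(traffic, window=3)   # the peak is damped
trend_removed = difference(traffic, d=1)       # the upward trend is removed

print("smoothed:     ", [round(v, 2) for v in smoothed])
print("differenced:  ", trend_removed)
```

The smoothed series never reaches the original peak, and the differenced series fluctuates around a constant level apart from the two values surrounding the peak, which is the behaviour that the d parameter is meant to achieve.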

TRAFFIC FORECASTING FORMULATION
The ARIMA model is a statistical model that predicts the forecast based on past and present values. The statistical model might be inaccurate if new technologies or trends affect the graph and make it more disruptive. Since a judgmental approach can predict the future more accurately [13], this paper proposes a new approach that identifies traffic using a combination of judgmental and statistical approaches. To use both approaches, three procedures must be completed: analysis of the data set for 3G and LTE traffic using the ARIMA model; generation of a disruptive formula based on the technology environment in the particular country; and validation of the disruptive formula combined with the ARIMA model.

Determining the data set
The data set was obtained from one of the busiest traffic sites in Indonesia, where 2G, 3G, and LTE have been installed. We took an example from the capital city of Indonesia, Jakarta, where LTE was installed in the first month of 2017. This site is strategically located in the capital city, where internet penetration is relatively greater than in other locations. Besides, this site also represents a candidate location for implementing the IoT and the 5G network. LTE and 3G mobile data traffic were collected for one year. Because mobile operators no longer invest in 2G technology, 2G traffic was assumed to be stagnant and was excluded from the ARIMA and disruptive calculations. Based on the Cisco Visual Networking Index, the penetration of 2G devices has been decreasing gradually over the last five years [21]. Therefore, we assumed that 2G mobile traffic consumes a maximum of 2 Mb/s at each site based on maximum capacity, where traffic remains stable. IoT sensors were also excluded from the calculation because the Narrow-band IoT licence in Indonesia was only due to be launched in late 2018, with IoT officially starting in the following year. However, it is predicted that 400,000 IoT sensors might be installed by 2022 [22]. This trend is similar to that for 5G systems, which are expected to expand by 2020, especially in Indonesia [23]. Therefore, this research made several assumptions about these technologies (2G, 5G, and IoT), whereas 3G and LTE were analyzed from measured data.

A new disruptive formula
The peak data rates of LTE and 3G, which are 300 Mb/s and 63 Mb/s in the downlink respectively (the latter with three carriers), do not mean that the full throughput is actually used. Although 4G and 3G offer 300 Mb/s and 63 Mb/s in the downlink, only approximately 20% of the full throughput would be used [24]. As a result, traffic is unpredictable and might halt one day, depending on human behaviour and company profiles. This paper proposes a new modification to the ARIMA formula. The new disruptive formula is defined as a judgemental method that might involve experts to identify the results. The combination of the judgemental and analytical methods can improve the accuracy of the prediction [13]. We propose a new formulation, Equation (3), to predict disruptiveness in the future, where D depends on four variables: time to market (TTM), Cost, Political, Economic, Social, and Technological (PEST) factors, and Market Share. The disruptiveness value lies in the range 0 ≤ D ≤ 1. The value of two and the disruptiveness range were inspired by the Global mobile data traffic forecast in [28], in which data traffic never grew to more than two times that of the previous year; this motivated the creation of Equation (3) to define the D formula, which applies to both mobile backhaul and mobile backbone traffic. In this paper, we use the ARIMA model as the legacy formula to be modified. The results obtained from the ARIMA model are amplified using the disruptiveness formula based on the four variables. As a result, the time-series function returns Equation (3), which is based on the disruptiveness and the legacy formula.
The Global mobile data traffic forecast until 2021 increases gradually each year [25]. The average increase over five years is around 47.4%. We extract the incremental traffic for each year, as illustrated in Table 1. The incremental traffic motivated us to analyze more deeply how to determine the disruptive formula. Table 1 shows that mobile traffic never reached two times that of the previous year, so a doubling was defined as the maximum disruptiveness. However, due to unpredicted technologies, mobile data traffic might not increase at all. Therefore, the disruptiveness value ranges from 0 to 1, where 1 defines the maximum disruptiveness, which doubles the forecast, and 0 defines the minimum disruptiveness, which leaves the forecast unchanged. The disruptiveness value is a judgemental measure based on current behaviour, represented by the TTM, cost, PEST, and market share variables. These variables affect the disruptiveness value in the formula. TTM is the period from the point at which a product has been agreed upon and resources have been committed to a project. The TTM is divided into two variables: impact and probability. The length of the TTM can decrease and increase significantly, depending on time-related processes [26]. The simpler the products or services provided, the shorter the TTM will be. The impact of the implemented products or services and the probability that they will be developed affect the value of the TTM. Table 2 shows how the TTM is scored for the disruptiveness formula. Cost defines the total cost, including the variable and fixed costs as well as operational and capital costs. The main general cost discussed here is the amount needed to create a new project [27]. The cost is divided into two variables: impact and probability. The value of the cost variable ranges from 0 to 1; its definition is provided in Table 2.
PEST is considered in this disruptive technique to determine the performance and activities of businesses, especially in the long term [28]. PEST can affect technology, especially with regard to whether the technology is legally permitted. For example, a new technology might not be implemented in a new project if the relevant authorities do not allow it. Table 2 shows the PEST variable, which is divided into two: PEST Impact and PEST Probability. As with TTM and cost, the PEST value ranges from 0 to 1. The Market Share variable represents a company's percentage of an industry's or market's total sales over a certain period. The market share is very important in determining the level of competitiveness among competitors. Table 2 shows that the market share is divided into two variables: market share impact and market share probability. The market share ranges from 0 to 1; each definition is shown in Table 2.
The formula used to identify disruptiveness incorporates the four variables shown in Table 2. These four variables were used to identify the value of the disruptiveness, as shown in (3). Each variable has its own weight, which affects the disruptiveness formula in (4).
The range of each variable and its weight, or priority, are defined in Table 2. Market Share, PEST, TTM, and Cost have different priorities, which affect the formula of disruptiveness in (4). It is assumed that Cost has the first priority in affecting disruptiveness, whereas market share has the lowest priority. The priority determines the weight of each variable, which leads to the final formula of disruptiveness, as expressed in (4).
[Table 2: Scores, definitions, and weights of the impact and probability of TTM, cost, PEST, and market share]
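The amplification described in this subsection can be sketched in Python as follows. This is only an illustration under a stated assumption: Equations (3) and (4) are not reproduced here, so the sketch assumes a linear scaling factor of (1 + D), which is one simple form that satisfies the two end points given in the text (D = 0 leaves the ARIMA forecast unchanged, D = 1 doubles it). The function name, the sample forecast, and the sample D value are hypothetical.

```python
def amplify(arima_forecast, d):
    """Scale an ARIMA forecast by a disruptiveness value d in [0, 1].

    ASSUMPTION: a linear factor (1 + d) is used. Only the end points
    (d = 0 keeps the forecast, d = 1 doubles it) come from the text;
    the exact form of the paper's Equation (3) may differ.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("disruptiveness must lie in [0, 1]")
    return [value * (1.0 + d) for value in arima_forecast]


# Hypothetical ARIMA output in Mb/s and a hypothetical disruptiveness value.
arima_forecast = [50.0, 49.5, 49.0]

print(amplify(arima_forecast, 0.0))   # minimum disruptiveness: unchanged
print(amplify(arima_forecast, 0.15))  # moderate disruptiveness
print(amplify(arima_forecast, 1.0))   # maximum disruptiveness: doubled
```

Keeping the judgemental score D separate from the statistical forecast makes it possible to revise the expert scoring without refitting the ARIMA model.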

Forecasting error management
The analysis of this formula compares the global mobile data forecast prediction with the model calculation used in this paper. The disruptive formula was analyzed using the percentage error of the conventional ARIMA and of the ARIMA with disruptive formula, which is defined as follows:

$$\text{Percentage error} = \frac{|x - y|}{x} \times 100\% \qquad (5)$$

where x refers to the Global mobile data traffic forecast, and y refers to the type of model used, in this case the conventional ARIMA or the ARIMA with disruptive formula. The percentage error expresses the size of an error relative to the mobile data traffic forecast, where a lower error rate indicates better performance in deciding the forecast value.

3G AND 4G FORECASTING
The formula of disruptiveness combined with ARIMA has been defined. This section explains two main analyses: 3G and 4G forecasting. For 3G, based on the mobile data traffic data set in Figure 2 and the ARIMA calculation shown as the black line in Figure 2, the predicted traffic shows a small decrease in the first month, after which it remains stable until the end of March 2019. The ARIMA calculation used ARIMA(2,1,3), that is, order two for AR, one for differencing, and three for MA. The orders were selected with the data analysis tool RStudio, which supports predictive analysis using the Akaike Information Criterion (AIC).
The ARIMA combined with the disruptiveness formula, shown as the dotted line in Figure 2, gives noticeably higher values than the conventional ARIMA. The disruptiveness value was based on the cost, TTM, PEST, and market share variables, which are defined in Table 3. Based on the results in Figure 2, there was a difference in traffic of around 8 Mb/s over the year. Based on Table 3, from 2018 to 2019 the PEST and cost variables, especially with respect to probability, showed a significant value, reaching 0.8 out of 1.0. This might have been caused by 3G mobile network penetration, since 3G is still planned for implementation across the Indonesian islands. In fact, base stations (BSs) are still unavailable on several islands, especially in rural areas. Therefore, 3G might be preferred over other technologies with regard to PEST and cost probability. However, PEST might not affect the overall disruptiveness value much, since it is the third priority, after cost and TTM.
The TTM variable illustrated in Table 3 shows a small value of 0.2 out of 1.0, which was caused by the 3G network trend in 2019. Since mobile network trends are moving towards LTE and 5G networks in 2019 to support low-power devices, 3G might not be preferable for implementation in mobile traffic.
For 4G, the ARIMA prediction shown as the black line in Figure 3 decreases slightly at first and then remains stable over a year. The model used was ARIMA(1,1,1), based on the AIC calculation in RStudio. For the ARIMA with disruptive formula, shown as the dotted line in Figure 3, the effects of the four variables introduced in Section IV.B produce a significant contrast with the conventional ARIMA. The difference in value between the ARIMA with disruptive formula and the ARIMA model is approximately 40 Mb/s over the time series. During this year, the four variables mostly had the same average for Impact and Probability.
In the 4G network, based on Table 3, market share has a relatively higher value than the other variables, which is 0.7 for both probability and impact. This is mainly caused by 4G network penetration, which increased from 2018 to 2019. The TTM Probability and cost Impact had a value of 0.7 out of 1.0, since 4G is an affordable technology to support higher traffic. A higher cost for 4G might still be acceptable when compared to the impact of this technology on users, as most people and some devices are using more data than in previous years. This might cause the cost impact to be higher than the others.
The TTM Probability of 0.7 out of 1.0 in Table 3 was caused by the impact of this technology as well. The behaviour of people in 2019 is expected to support digitalization technology that consumes more data traffic in the network. This will lead to a shorter TTM to support mobile devices in the network.

RESULT AND DISCUSSION
The simulation results using the ARIMA model and the ARIMA combined with the disruptive formula have been described in Figure 2 for 3G and Figure 3 for 4G. The disruptiveness formula raised the base value of the ARIMA model by 8 Mb/s for 3G and 40 Mb/s for 4G. The four variables were deemed more promising for accurate prediction than the ARIMA model alone. The ARIMA calculation, which is based on past and present values and MA, might generate inaccurate predictions if disruptive technologies are not considered. To assess this issue, this study used a percentage error that compares the conventional ARIMA and the ARIMA with disruptive formula against the global mobile data traffic forecast.

Error performance
This subsection compares the percentage error of the two models for 3G and 4G traffic, based on the results obtained in Section 4. In (5), the global data traffic variable uses the global mobile data forecast [28], obtained from the data set multiplied by the incremental value in Table 1 for 2018-2019. The model calculation in (5) uses the conventional ARIMA and the ARIMA with disruptive formula.
The global mobile data traffic value was calculated from the average mobile data traffic in the 3G and 4G data sets, in this case from 2016 to 2018. The data set was averaged over 2016 to 2018 and multiplied by the incremental value from Table 1, which was 41.17% for 2018-2019. Combining the data set traffic and the incremental value gives the optimized predicted traffic shown in Table 4. Table 4 shows the optimized traffic based on the Mobile Data Forecast, which increases by 41.17% from 2018 to 2019. The 2019 data traffic in Table 4 is compared with the predictions of the conventional ARIMA and the ARIMA with disruptive formula. It is assumed that the global mobile data traffic forecast is the more accurate prediction, since it is based on several types of research. In 3G, the ARIMA with disruptive formula reaches 56.67 Mb/s, which is closer to the optimized value of 72.17 Mb/s in Table 4. The error rate is therefore lower than that of the conventional ARIMA. In 4G, the ARIMA with disruptive formula reaches 199.6 Mb/s, whereas the conventional ARIMA reaches 156.93 Mb/s, which is far lower than the global mobile data traffic value of 293.09 Mb/s in Table 4. The error rate in both values is caused by several factors, namely backhaul traffic and developing-country factors. The calculation of the ARIMA model used backhaul traffic, which appears to be more pessimistic than the global data forecast.
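The comparison above can be reproduced with a short Python sketch. It assumes that the percentage error in (5) takes the usual form |x − y| / x × 100, with x the forecast-based value from Table 4 and y the model output; the function and variable names are illustrative. The traffic values are those quoted in the abstract and in this subsection.

```python
def percentage_error(reference, predicted):
    """Absolute percentage error of `predicted` relative to `reference`.

    ASSUMPTION: the usual definition |x - y| / x * 100 is used for (5).
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(reference - predicted) / abs(reference) * 100.0


# Average traffic in Mb/s as quoted in the text.
reference = {"3G": 72.17, "4G": 293.09}        # Table 4, forecast-based
arima = {"3G": 49.19, "4G": 156.93}            # conventional ARIMA
arima_disruptive = {"3G": 56.67, "4G": 199.6}  # ARIMA with disruptive formula

errors = {}
for tech in reference:
    errors[tech] = (
        percentage_error(reference[tech], arima[tech]),
        percentage_error(reference[tech], arima_disruptive[tech]),
    )
    print(f"{tech}: ARIMA {errors[tech][0]:.2f}%, "
          f"ARIMA + disruptive {errors[tech][1]:.2f}%")
```

Under this definition, the ARIMA with disruptive formula has the lower percentage error for both 3G and 4G, which is consistent with the discussion in this subsection.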

TELKOMNIKA Telecommun Comput El Control
However, the lower error rate of the ARIMA with disruptive formula is more promising than that of the conventional ARIMA, as it gives more optimized values from 2018 to 2019. This demonstrates that the ARIMA with disruptive formula predicted more accurately than the conventional ARIMA. Internet penetration in developing countries has increased relatively slowly compared to developed countries, which might affect mobile data traffic. In this case, compared to global mobile data traffic, the conventional ARIMA shows only a small increase over a year for 3G and 4G, as shown in Figures 2 and 3. The ARIMA with disruptive formula was more positive, which is closely related to the mobile data traffic trend and the internet implementation program across Indonesia. This led to more accurate prediction.

Variable analysis
The four factors that have been defined (TTM, Cost, PEST, and Market Share) significantly affected the conventional ARIMA. As explained earlier, the cost variable had the highest priority, while market share was the least important. To validate the variables, we identified the different levels of each variable using the maximum incremental traffic of Impact and Probability. The maximum order of Impact and Probability, M, can be expressed as

$$M = N \times \max(\text{Impact} \times \text{Probability}) \qquad (6)$$
where N is equal to the weight of the variable, as identified in Table 2. For example, the cost variable has a maximum weight of four, whereas the market share variable has a maximum weight of one. Besides, Impact and Probability are assumed to take their maximum value, which is equal to 1. It is also assumed that the other variables are zero when a particular variable is calculated. Based on (6), Table 5 shows the maximum order of each variable. Moreover, Figure 4 takes the 3G network as an example, showing the incremental traffic obtained with each of the four variables identified in Table 5 relative to the prediction of the ARIMA model. Based on Figure 4 and Table 5, the cost variable contributes a maximum incremental value of 16% compared to the conventional ARIMA. Cost is closely related to the revenues of companies. This is reasonable: companies that have significant revenues to cover the initial and maintenance costs, that is, Capital Expenditure (CAPEX) and Operational Expenditure (OPEX), will prioritize customer demands, including data traffic speed and latency, which are essential for users. Through this variable, in the Indonesian case, the operators have an opportunity either to spread the mobile base stations to different locations or to make regular networks denser to increase their capacity. The cost might correlate with other variables, such as PEST, TTM, and Market Share. If mobile operators have more revenues, this will affect disruptiveness and the other variables. For example, revenues influence the frequency allocation determined by the regulators in each country, since operators with more revenues can probably acquire more frequency allocations.
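Equation (6) can be illustrated with a minimal Python sketch. The weights of four for cost and one for market share are stated in the text; the weights of three for TTM and two for PEST are assumptions inferred from the stated priority order (cost, TTM, PEST, market share), since Table 2 is not reproduced here. The function name and the sample scores are hypothetical.

```python
# Weight N of each variable. Cost = 4 and market share = 1 are stated in the
# text; TTM = 3 and PEST = 2 are ASSUMED from the stated priority order.
WEIGHTS = {"cost": 4, "ttm": 3, "pest": 2, "market_share": 1}


def maximum_order(weight, scores):
    """Return M = N * max(Impact * Probability) over (impact, probability) pairs."""
    for impact, probability in scores:
        if not (0.0 <= impact <= 1.0 and 0.0 <= probability <= 1.0):
            raise ValueError("impact and probability must lie in [0, 1]")
    return weight * max(impact * probability for impact, probability in scores)


# With Impact = Probability = 1, the maximum order equals the weight itself.
max_orders = {
    name: maximum_order(weight, [(1.0, 1.0)]) for name, weight in WEIGHTS.items()
}
print(max_orders)

# Hypothetical scores for cost over two periods: only the larger product counts.
print(maximum_order(WEIGHTS["cost"], [(0.8, 0.5), (0.5, 0.5)]))
```

Evaluating one variable at a time, with the others set to zero, isolates the largest contribution that each variable can make to the disruptiveness value.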

ISSN: 1693-6930

TTM and PEST are the second and third priorities among the disruptiveness variables. Their maximum total incremental traffic is 16% and 9%, respectively, as shown in Figure 4. A real-world example of these variables is the licence readiness to implement new frequencies and technologies in the country. For example, the main licence-readiness challenge is the millimetre wave in 5G technology, where several steps are needed to assess the frequency spectrum in each country. Using new millimetre-wave spectrum requires permission to open a new spectrum licence. This licence also depends on the cost profile of the mobile operators, since acquiring new spectrum carries a higher risk of considerable spending.
Besides, technology readiness also depends on the cost profile of mobile companies, and it might be delayed if coverage in the country has not reached 100%. As an example, in Indonesia, LTE mobile stations will be implemented later, since 3G stations have not yet been implemented across the whole of Indonesia. Therefore, for greater efficiency, 3G is still preferable to LTE, as it reduces the CAPEX and OPEX of new stations. Both cases might delay technology implementation in the country, and both variables still depend mostly on cost. As a result, cost is still the highest priority affecting mobile data traffic.
Market share, the lowest priority, determines only 1% of the incremental traffic. Market share does not affect mobile data traffic very much when traffic and revenues move in opposite directions, which has occurred in the telecommunication environment. This makes market share the lowest priority in the disruptiveness value. To conclude, the four variables are the main factors of disruptive traffic, with cost and revenues being the most dominant.

Backhaul analysis
The arrival of disruptive technologies will affect the total capacities in the mobile backhaul. As shown in Section 5, 4G and 3G traffic are amplified relative to the conventional ARIMA by around 35 Mb/s and 10 Mb/s, respectively. Other technologies, such as 2G systems, which are assumed to be stagnant, consume around 2 Mb/s each year. Besides, the 5G network has not been implemented yet, and while IoT systems are increasing this year, we still assumed no IoT traffic because of its low bit rate and small total number of devices. Taking all these factors into account, Table 6 presents an overview of the predicted average mobile data traffic between 2017 and 2018 at each site, using ARIMA with disruptive formula. From the ARIMA with disruptive formula model, we conclude that the mobile backhaul can still rely on the old microwave technologies, since the future backhaul will need an average of at least 258.2 Mb/s at each site based on Table 6. This capacity can be supported by the existing microwave technologies. However, to anticipate unexpected traffic in the future, this paper recommends a list of features for the mobile backhaul, which are (from the most efficient): (i) using higher-order modulation (HOM) that supports traffic greater than 300 Mb/s, for instance with 1024 or 2048 QAM; (ii) implementing more antennas in the mobile backhaul systems to increase capacity, such as MIMO or Massive MIMO; and (iii) migrating old microwave technologies to fiber optics. This paper found an effective and accurate way to predict the traffic forecast based on statistical and judgemental approaches. The combination of the ARIMA model and the disruptive formula proposed in this paper shows that ARIMA is more accurate if it is associated with a judgemental approach to correct the errors.
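The capacity reasoning above amounts to a simple sum, sketched below in Python. The 3G and 4G values are the ARIMA with disruptive formula averages quoted in the error analysis, and 2G is the assumed 2 Mb/s. Treating 300 Mb/s as the capacity of the existing microwave links is an assumption made only for this illustration, taken from the threshold mentioned for HOM; the variable names are hypothetical, and Table 6 is not reproduced here.

```python
# Average per-site traffic in Mb/s, using values quoted in the text.
site_traffic = {"2G": 2.0, "3G": 56.67, "4G": 199.6}

# ASSUMPTION: 300 Mb/s is used as the capacity of the existing microwave
# backhaul, based on the threshold mentioned for higher-order modulation.
MICROWAVE_CAPACITY = 300.0

total = sum(site_traffic.values())
headroom = MICROWAVE_CAPACITY - total

print(f"required backhaul capacity: {total:.2f} Mb/s")
print(f"headroom on existing link:  {headroom:.2f} Mb/s")
print("existing microwave sufficient:", headroom >= 0)
```

The total is roughly in line with the 258.2 Mb/s per site stated above, and the remaining headroom indicates how much unexpected traffic the link could absorb before the recommended upgrades become necessary.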

CONCLUSION
The major contribution of the study is the development of a new formula for the ARIMA model to predict forecast traffic based on four variables: TTM, cost, PEST, and market share. Our research confirms that disruptive technology affects mobile data traffic if the telecommunication companies are profitable, the time to market for implementing a new project is acceptable, the PEST environment supports new technologies and mobilities, and market share has grown into a new market segment. This combination also leads to the conclusion that the mobile backhaul in Indonesia could continue to use the existing microwave technologies in 2018. Future work will extend short forecasts to longer periods using methods such as the Long Short-Term Memory (LSTM) network, so that mobile traffic is predicted more accurately over a longer period.