Developing a support vector regression model to forecast stock prices of mining companies in Indonesia


I. Introduction
The capital market is an organized financial system in which commercial banks and financial institutions act as intermediaries for securities such as stocks, bonds, and outstanding debt securities. In essence, the capital market connects parties with excess funds to those who need funds. As one of the national economy's potentials, capital market activity plays an increasingly important role in developing the national economy [1].
Stocks are the most popular investment market instruments. Issuing shares is one strategy by which a company or business entity raises funding [2]. Because stock prices move dynamically and fluctuate, stock investment can also cause losses for investors. Stock price prediction is an analysis technique that estimates future stock prices from historical stock prices. Predictions can be made using several methods, but a time-series model is expected to produce accurate predictions, because stock data are time-series data that move continuously with time [3].
The mining sector in Indonesia supports the national economy and national energy security, both in employment and foreign exchange earnings through exports. The mining sector is further divided into five sub-sectors, one of which is the coal mining sub-sector. The coal mining sub-sector currently accounts for 75 to 80 percent of the total Non-Tax State Revenue (PNBP) in the mineral sector.
The Support Vector Machine (SVM) is a classification method that learns a hyperplane separating the classes. Support Vector Regression (SVR) is a modification of SVM for regression problems. The concept of SVR is to optimize the hyperplane so as to obtain the support vectors [4]. SVM has been widely used for forecasting stock prices and has shown better performance than other algorithms, including artificial neural networks (ANN). ANN has also been widely used for forecasting and is a promising alternative for predicting stock prices, but it finds solutions that are only locally optimal. In contrast, SVM finds a global optimum [5]. One advantage of SVM is that it offers a globally optimal solution; it can also be analyzed theoretically using concepts from computational learning theory while achieving good performance [6].

ABSTRACT
Stock investment, both long-term and short-term, is currently in great demand among investors. Stock investment provides many benefits for investors. Investors need to analyze stocks and predict the price of the shares they intend to purchase in order to earn large profits. Highly volatile price movements make stock prices difficult to predict. Investors' main hope is to benefit from prices that change over time, which makes stock prices time-series data. Data mining extracts useful information from large data sets by exploiting historical patterns and relationships in the data. Support vector regression has advantages in making accurate stock price predictions and can overcome overfitting. PTBA and ITMG are leading coal mining companies in Indonesia, so many people want to invest in them. Stock price prediction for ADRO, PTBA, and ITMG using the support vector regression algorithm achieves good predictive accuracy. The PTBA stock price model has an R-square value of 97.9% for both the RBF and linear kernels, with MAPE of 2.465 and 2.480, respectively. The ITMG stock price model has an R-square value of 94.3% for both the RBF and linear kernels, with MAPE of 5.874 and 5.875, respectively. These results indicate that the SVR method is well suited to forecasting stock prices.

A. Support Vector Regression
Support Vector Regression is adapted from the Support Vector Machine, a machine learning method used to solve classification problems. SVR is the application of the SVM algorithm to regression [7]. The SVR algorithm can produce good forecasts because it can handle overfitting problems [8].
The SVM concept can be explained simply as a search for the best hyperplane (Fig. 1). The hyperplane functions as a separator between two data classes in the input space. In Fig. 2, several data points are circled. These are potential support vectors, that is, candidate data points chosen so that all data points fall into one zone while the epsilon value (ε) is minimized.

Fig. 2. Illustration of Support Vector Regression
In SVR, the low-dimensional input is mapped into a high-dimensional feature space, where a linear regression is fitted. The general form of support vector regression is in (1).
Here φ(x) is a function that maps x into a higher dimension, b is a constant bias term, and w is the weight vector (w^T denotes its transpose). The coefficients w and b are estimated by minimizing the risk function.
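Equation (1) is not reproduced in this text. A standard textbook form that is consistent with the symbols defined above (w, φ(x), b, C, ε) is sketched below; the notation L_ε for the ε-insensitive loss and the index n for the number of training points are assumptions, and the original paper may scale the loss term differently.

```latex
% Regression function: linear in the mapped feature space
f(x) = w^{T}\varphi(x) + b

% Regularized risk minimized over w and b (assumed standard form)
R(w, b) = \frac{1}{2}\lVert w \rVert^{2}
        + C \sum_{i=1}^{n} L_{\varepsilon}\bigl(y_i, f(x_i)\bigr)
```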
The loss function describes how the prediction error is penalized; different loss functions lead to different SVR formulations [9].
The ε-insensitive loss function is a simple loss function that approximates Huber's loss function and allows support vectors to be obtained [10]. The ε-insensitive loss function formula is in (2), and the condition on L can be defined in (3).
Here L is called the ε-insensitive loss function, and C and ε are prescribed parameters. The quadratic programming problem in (2) can be restated as a constrained minimization problem.
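As a minimal illustration of the ε-insensitive loss described above, the sketch below uses the common definition max(0, |y - f(x)| - ε), which is assumed here because equations (2) and (3) are not reproduced in this text. The function names and the sample prices are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Loss for one observation: zero inside the eps tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def penalized_error(y_true, y_pred, eps, C):
    """Sum of eps-insensitive losses scaled by the penalty constant C."""
    return C * sum(eps_insensitive_loss(t, p, eps) for t, p in zip(y_true, y_pred))


# Hypothetical prices, chosen only for illustration.
actual = [100.0, 102.0, 105.0]
predicted = [100.5, 104.0, 105.0]

# Only the second point deviates by more than eps = 1.0, so only it is penalized.
losses = [eps_insensitive_loss(t, p, eps=1.0) for t, p in zip(actual, predicted)]
total = penalized_error(actual, predicted, eps=1.0, C=10)
print(losses, total)
```

Errors smaller than ε contribute nothing, which is why only points on or outside the ε tube end up as support vectors.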
The constant C > 0 determines the trade-off between the flatness of the function f(x) and the extent to which deviations larger than ε are tolerated. All deviations greater than ε are penalized by C [11]. The optimal solution can be found using the Lagrange function in (4). To obtain the optimal solution, take the partial derivatives of Q with respect to w and b. From these, w can be written as in (5).
Then the optimal hyperplane function is written as in (6).
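The minimization problem and equations (4) to (6) are not reproduced in this text. The derivation below is the standard ε-SVR formulation, given as a sketch under that assumption; the slack variables ξ_i, ξ_i*, the multipliers α_i, α_i*, η_i, η_i*, and the kernel K are notation introduced here, and the original paper's equations may differ in detail.

```latex
% Primal problem with slack variables
\min_{w,\,b,\,\xi,\,\xi^{*}} \;
  \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
\quad \text{subject to} \quad
\begin{cases}
  y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i \\
  w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\
  \xi_i,\ \xi_i^{*} \ge 0
\end{cases}

% Lagrange function
Q = \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
  - \sum_{i=1}^{n} (\eta_i \xi_i + \eta_i^{*} \xi_i^{*})
  - \sum_{i=1}^{n} \alpha_i \bigl(\varepsilon + \xi_i - y_i + w^{T}\varphi(x_i) + b\bigr)
  - \sum_{i=1}^{n} \alpha_i^{*} \bigl(\varepsilon + \xi_i^{*} + y_i - w^{T}\varphi(x_i) - b\bigr)

% Stationarity with respect to w and b
\frac{\partial Q}{\partial w} = 0 \;\Rightarrow\;
  w = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\,\varphi(x_i),
\qquad
\frac{\partial Q}{\partial b} = 0 \;\Rightarrow\;
  \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0

% Resulting regression function, with K(x_i, x) = \varphi(x_i)^{T}\varphi(x)
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\,K(x_i, x) + b
```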

C. Grid Search Algorithm
According to [12], cross-validation is a standard test carried out to estimate error rates. The training data is randomly divided into several parts of equal size. The error rate is calculated part by part, and the average of these error rates gives the overall cross-validation error rate. One variant is known as leave-one-out validation (LOO). In LOO, the data is divided into two subsets: one contains N-1 observations for training, and the one remaining observation is used for testing [9].
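The leave-one-out procedure described above can be sketched in a few lines. This is an illustrative sketch only: the helper names are hypothetical, the sample prices are made up, and a placeholder model that predicts the training mean stands in for SVR.

```python
def leave_one_out_splits(n):
    """Yield (train_indices, test_index) pairs, one pair per observation."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, i


def loo_mean_absolute_error(values):
    """Average LOO error of a placeholder model that predicts the training mean."""
    errors = []
    for train, test in leave_one_out_splits(len(values)):
        prediction = sum(values[j] for j in train) / len(train)
        errors.append(abs(values[test] - prediction))
    return sum(errors) / len(errors)


prices = [10.0, 12.0, 14.0]  # hypothetical values
splits = list(leave_one_out_splits(len(prices)))
loo_error = loo_mean_absolute_error(prices)
print(splits, loo_error)
```

With N observations the model is fitted N times, each time on N-1 points, so LOO is usually practical only for small data sets or cheap models.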

A. Model Evaluation of Support Vector Regression
The models for the PTBA and ITMG stock data are built with two kernels: the linear kernel and the radial basis function (RBF) kernel. For both kernels, the parameter C, which controls the tolerance for support vectors deviating from the hyperplane, takes the values 10, 100, and 1000. The gamma parameter of the RBF kernel takes the values 0.1, 0.01, 0.001, and 0.0001. Model performance is measured using the R-square accuracy value and MAPE; the closer the R-square value is to 1 (one), the better the model. However, the model must not overfit, that is, produce predictions identical to the actual values. MAPE measures the model's forecasting error; the smaller the error value, the better the model.
Based on the analysis using the support vector regression method with 80% of the data used for training and 20% for testing, the accuracy values obtained are excellent (Table 1), with an average of more than 90 percent. At the same time, the MAPE (error) values are still relatively high, so the initial model is not yet satisfactory. Hyperparameter tuning is therefore performed using the Grid Search algorithm to obtain optimal model performance for forecasting stock prices in the testing data. The tuning results (Table 2) show that the model performance is optimal. For PTBA shares, the optimal model uses the RBF kernel, with an accuracy value of 97.8 percent and a MAPE (error) value of 2.16 percent at the optimal parameters C = 1000 and gamma = 0.01. For ITMG shares, the optimal models use the linear and RBF kernels, with an accuracy value of 90.6 percent and a MAPE (error) value of 7.09 percent; the optimal parameters are C = 1000 and gamma = 0.001 for the RBF kernel and C = 10 for the linear kernel. The forecasts of PTBA (Fig. 4) and ITMG (Fig. 5) stock prices cover the next ten periods. ITMG shares tend to increase, while PTBA shares decrease.
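The parameter grid and the two evaluation metrics described above can be sketched as follows. The C and gamma values are those listed in the text. The function names are hypothetical, the sample numbers are made up, and the R-square and MAPE formulas are the usual textbook definitions, which are assumed here to match the ones used in the study.

```python
from itertools import product

C_VALUES = [10, 100, 1000]
GAMMA_VALUES = [0.1, 0.01, 0.001, 0.0001]


def build_grid():
    """List every candidate setting; gamma applies only to the RBF kernel."""
    grid = [{"kernel": "linear", "C": c} for c in C_VALUES]
    grid += [{"kernel": "rbf", "C": c, "gamma": g}
             for c, g in product(C_VALUES, GAMMA_VALUES)]
    return grid


def r_squared(actual, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


grid = build_grid()
actual = [100.0, 200.0, 400.0]       # hypothetical prices
predicted = [110.0, 190.0, 400.0]    # hypothetical forecasts
print(len(grid), r_squared(actual, predicted), mape(actual, predicted))
```

In a full grid search, each of the candidate settings would be fitted and scored with these metrics, and the setting with the best score kept.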
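The 80/20 division of training and testing data mentioned above can be sketched as below. The text does not say how the split was made, so the chronological split here (earliest 80% for training, latest 20% for testing) is an assumption, chosen because it is common for time-series data; the function name and the sample series are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling: earliest part trains, latest tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


prices = list(range(1, 11))  # hypothetical time-ordered observations
train, test = chronological_split(prices)
print(train, test)
```

Keeping the order intact avoids training on observations that occur after the ones being predicted.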

V. Conclusion
The SVR method can be applied to forecast PTBA and ITMG stock prices. The optimal SVR model for PTBA stock data uses the radial basis function kernel with parameters C = 1000 and gamma = 0.01, giving an accuracy value of 97.8% and a MAPE value of 2.16. For ITMG stock data, the optimal SVR models use the radial basis function kernel with parameters C = 1000 and gamma = 0.001 and the linear kernel with parameter C = 10, giving a model accuracy of 90.6% on the testing data and a MAPE value of 7.09. The forecasts for the next ten periods show that ITMG shares tend to increase, while PTBA shares decrease.