Crude Oil Price Forecasting Using Long Short-Term Memory

Crude oil plays an important role in the financial indicators of global markets and economies. The price of crude oil influences a country's income both directly and indirectly, affecting the prices of basic needs, transportation, commodities, and more. Understanding the future price of crude oil is therefore essential for budgeting and planning for a better economy. The contribution of this research is finding the best hyperparameters and using the early stopping method in an LSTM model to predict oil prices. This research implements Long Short-Term Memory (LSTM), an artificial neural network that can handle long-term dependencies in time series data. The LSTM method is used to predict Brent oil prices on daily and weekly time frames. The experiments were conducted by tuning several hyperparameters to obtain the best result. In the daily time frame experiment, the model obtained an RMSE and MAE of 1.27055 and 0.92827, respectively, while in the weekly time frame it obtained an RMSE and MAE of 3.37817 and 2.60603, respectively. The results show that the LSTM model can adapt to the trends that occur in the original data.


INTRODUCTION
Crude oil is the most widely used energy source in both industry and the economy, providing about 33% of the world's total energy consumption in 2019, according to the International Energy Agency (IEA) [1] [2]. Although renewable energy is currently the focus in industry and technology, the costs involved are still quite high because the technology that supports it has limited availability. Moreover, machines and equipment that run on renewable energy are still deployed on a small scale, which limits its use, and renewable energy remains inefficient because the transportation, storage, and supporting technology it requires are not yet mature. Crude oil therefore still has the largest share of the world's primary energy consumption [3] [4].
As the main energy source, crude oil is very important because almost all industrial sectors in every country depend on it for production and transportation [5] [6]. A sudden increase in the crude oil price can have direct or indirect impacts on economic growth: when the price of energy increases, demand falls, which reduces employment and Gross Domestic Product (GDP) growth and raises the inflation rate [7] [8] [9]. Various factors contribute to the instability of crude oil prices. The long-term trend is influenced by supply and demand, while the short-term trend is influenced by economic factors [10]. Accurate prediction of crude oil prices is therefore an important challenge, because it supports better budgeting and economic planning, especially during the COVID-19 pandemic. The price of crude oil declined significantly during the COVID-19 outbreak, which made the oil market very volatile. This makes oil price prediction very important, as it can help monitor market movements and avoid heavy losses [11] [12].
In recent years, deep learning has become a popular approach to classification and forecasting [13]. Several papers have proposed deep learning methods for predicting time series data. The study in [14] forecast the Brent crude oil price using the LSTM method. The results show that LSTM performs well, with RMSE values of 1.91 and 2.82 for the training and testing sets, respectively. The study in [15] proposed LSTM to predict stock prices using different amounts of data over various time frames. It found that LSTM gives a better RMSE value with more data; with three years of data, the average RMSE value is around 0.9. The study in [16] compared the performance of the Autoregressive Integrated Moving Average (ARIMA) method with LSTM in predicting time series data and found an error reduction of 85% using LSTM compared to ARIMA. The study in [17] compared LSTM with several other methods, namely Autoregressive Moving Average (ARMA), Artificial Neural Network (ANN), RNN, Decline Analysis, and ARIMA, in predicting time series data. Its results indicate that LSTM produces the best performance, with an RMSE value of 1.74.
Based on the performance of deep learning in these previous studies, this research uses the LSTM method to make predictions on daily and weekly time frames. Both datasets help to analyze the experimental results for an oil price that fluctuates in an unstable market. We chose daily data because it makes the pressure on prices easier to analyze, and weekly data to give a bigger picture over a longer time frame. LSTM was created to handle the vanishing gradient problem of Recurrent Neural Networks (RNN) [18]. LSTM was chosen because it can capture and extract historical information and predict future values in problems with long-term dependencies [17].
The contribution of our research is the implementation of LSTM using the best hyperparameters and the early stopping method to further improve the performance of the LSTM model in predicting crude oil prices. The model is built based on observations from previous research, but it adds the early stopping method and then tunes the hyperparameters. The early stopping method helps train the model without capturing excessive noise and fluctuations in the data, and the hyperparameter tuning allows the model to perform better than in the previous research.

RESEARCH METHOD
Dataset
Data collection and preparation are among the most important steps in conducting research. Deep learning methods require a large volume of data, and in general a larger volume of data leads to better results. The dataset used is historical data on the price of Brent oil from 1987 to 2021. Brent is the world's most widely used oil benchmark because Brent oil is easily processed into products such as gasoline, so the demand for it remains consistent. The dataset was obtained from the U.S. Energy Information Administration (EIA) and can be downloaded at www.eia.gov.
The dataset consists of two attributes, date and Brent oil price, as shown in Table 1. The Brent oil price is expressed in US dollars, and each record represents the price of Brent oil per barrel on a specific date. In this research, the dataset is taken from the daily and weekly time frames of Brent oil prices. A total of 8601 records are used from the daily time frame dataset, and 1762 records are used from the weekly time frame dataset.

Long Short-Term Memory (LSTM)
LSTM is an improved recurrent neural network (RNN) model designed to avoid the vanishing gradient problem and able to learn both long-term and short-term dependencies [19] [20]. LSTM was developed by Sepp Hochreiter and Jürgen Schmidhuber in 1997 [21]. It improves on the RNN, which suffers from vanishing gradients that make it ineffective for tasks that require long-term context.
The RNN uses the Backpropagation Through Time algorithm, in which the gradient vanishes as it is propagated over many time steps, making the RNN unsuitable for problems with long-term dependencies [18] [22]. LSTM is able to capture and learn both short-term and long-term information in time series data, so it is suitable for processing and predicting time series. It replaces the hidden nodes of the RNN model with memory units so that it can avoid the vanishing gradient. The key components of LSTM in handling vanishing gradients are the cell state and the gates.
Its architecture, shown in Fig. 1, is composed of a forget gate, an input gate, an output gate, pointwise multiplication, sigmoid layers, tanh layers, and cell state operations. These components control the flow of information, determining what enters and is kept in memory and what is discarded [20] [23] [24]. In the forget gate, shown in (1), the current input and the previous hidden state pass through the sigmoid function, which decides what fraction of the information from the previous cell state is kept and what is discarded as no longer useful.

Fig 1. Long Short-Term Memory Unit Structure
Equations (2) and (3) are the formulas for the input and output gates. The input gate is used to update the cell state. The output gate determines the information that will be passed from the current cell state to the next hidden state.
Equation (4) is the formula for the "candidate" state, which is computed from the previous hidden state and the current input.
Equations (5) and (6) are the formulas for the cell state and the hidden state, where * denotes element-wise multiplication, while equations (7) and (8) are the formulas for the sigmoid and tanh functions [25] [26]. In equations (1) to (8), the symbols denote, in order, the LSTM unit, time, weight, bias, hidden state ($h$), and cell state.
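For reference, the gate computations described above can be written in a standard LSTM formulation as follows. This is a sketch under assumed notation: $x_t$ for the current input, $h_t$ and $c_t$ for the hidden and cell states, $W$ and $b$ for weights and biases, and $[h_{t-1}, x_t]$ for concatenation are conventional choices and may differ from the symbols used in equations (1) to (8).

```latex
\begin{align}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{candidate state} \\
c_t &= f_t * c_{t-1} + i_t * \tilde{c}_t && \text{cell state} \\
h_t &= o_t * \tanh(c_t) && \text{hidden state} \\
\sigma(z) &= \frac{1}{1 + e^{-z}} \\
\tanh(z) &= \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}
\end{align}
```

Because $f_t$ lies between 0 and 1, the product $f_t * c_{t-1}$ scales each component of the previous cell state, which is how the forget gate keeps or discards a fraction of the stored information.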

Adaptive Moment Estimation (Adam)
Adam is a method that computes individual adaptive learning rates for each parameter. It was proposed by Diederik P. Kingma and Jimmy Lei Ba in 2015. It is an optimization algorithm used to minimize the loss function by determining the weight and bias values [27]. Adam keeps an exponentially decaying average of past squared gradients, shown in (9), and, similar to momentum, an exponentially decaying average of past gradients, shown in (10) [28].
In equation (11), the default values are 0.9 for $\beta_1$, 0.999 for $\beta_2$, $10^{-8}$ for $\epsilon$, and 0.001 for $\eta$. Adam combines the advantages of two popular optimization methods, AdaGrad and RMSProp. Overall, Adam has proved robust and well suited to a variety of non-convex optimization problems in machine learning [29].
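The update described above can be illustrated with a minimal single-parameter sketch. It follows the commonly published form of Adam, including the bias-correction step, which is not spelled out in the text. The function names `adam_step` and `minimize_quadratic` and the toy quadratic objective are hypothetical and chosen only for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a single scalar parameter.

    m is the decaying average of past gradients, v is the decaying
    average of past squared gradients, and t is the step count from 1.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v being initialised at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps, lr=0.1, target=3.0):
    """Minimise the toy loss (theta - target)^2 with Adam (illustration only)."""
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - target)  # derivative of the toy loss
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.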

Data Normalization
Data normalization produces data of a quality that can be fed into any learning algorithm. Time series data can have a wide range of values, so it needs to be scaled to a common range to speed up the learning process [30].
This research uses min-max scaling as the data normalization technique. This method converts each value in the dataset into a value in the range 0 to 1. The purpose of normalizing the data is to equalize the range of values in each attribute so that no data becomes less influential because of large differences in range, as shown in (12), where $x$ is the normalized value, $x_i$ is the value to be normalized, and $x_{min}$ and $x_{max}$ are the minimum and maximum values of the data [31].
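A minimal sketch of min-max scaling and its inverse is given below. The helper names `min_max_scale` and `min_max_inverse` are hypothetical, and the sample prices are illustrative values rather than values from the dataset.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly to the range [0, 1].

    Returns the scaled values together with the minimum and maximum,
    which are needed later to undo the scaling.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for constant data")
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi


def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original range."""
    return [s * (hi - lo) + lo for s in scaled]


prices = [10.0, 20.0, 30.0]  # illustrative values, not from the dataset
scaled, lo, hi = min_max_scale(prices)
restored = min_max_inverse(scaled, lo, hi)
```

Keeping the minimum and maximum alongside the scaled data is what allows predictions made on the normalized scale to be converted back to prices in US dollars.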

Evaluation
The performance of the LSTM model is measured using two metrics, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Both are standard statistical metrics for quantifying the error of a model. They are widely used in climatological modeling, forecasting, and regression analysis to verify modeling results. The closer the value is to 0, the better the modeling result [31] [32].
RMSE measures the average error of the model and gives relatively large weight to large errors, since the errors are squared before they are averaged. MAE, by contrast, weights every individual error equally in the average because it is a linear score. Equations (13) and (14) are the formulas for MAE and RMSE, where $n$, $\hat{y}_i$, and $y_i$ are the number of data points, the predicted data, and the actual data, respectively [31] [32].
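Both metrics can be computed in a few lines. The sketch below uses their standard definitions; the function names and the sample values are illustrative assumptions and are not results from the experiments.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: the average of |actual - predicted|."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: the square root of the mean squared error."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


actual = [1.0, 2.0, 3.0]      # illustrative values only
predicted = [1.0, 2.0, 5.0]   # a single error of 2.0 on the last point
```

With a single large error, RMSE exceeds MAE (about 1.155 against about 0.667 here), which reflects the heavier weight that squaring places on large errors.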

Model Development
This experiment uses an LSTM model to predict the Brent oil price. The full system design is shown in Fig. 2. First, the date attribute is converted into a date object so that the data can be sorted. The dataset is then transformed using min-max normalization, which uses the minimum and maximum values of each attribute to map the data to the range 0 to 1. After normalization, the data is split into training data and test data with a ratio of 75:25. A smaller share of training data increases the variance of the parameter estimates, while a smaller share of test data increases the variance of the performance estimate, and the ideal ratio differs from case to case. We tried ratios of 80:20, 75:25, and 70:30, and the 75:25 ratio gave the best performance.
In the training process, the LSTM model is fitted to the training data with the different hyperparameters shown in Table 2, using the Adam optimizer with its default parameter values. We use 30 percent of the training data as validation data to check whether the model is a good fit. The LSTM model uses one hidden layer, with different hyperparameters in each experiment to find the best result.
The model is built with the early stopping method, which stops training when the performance of the model on the validation data no longer improves, so the number of training epochs does not need to be specified manually. Early stopping is set to monitor the validation loss, which is to be minimized. However, the first epoch without improvement is not necessarily the best time to stop training: the model may get worse before it improves again, possibly beyond its previous best. To handle this, a trigger is added through the "patience" argument, which controls how many epochs the validation loss may go without improvement before training stops. The value varies between datasets; in this research, patience is set to 85 for the daily time frame and to 0 for the weekly time frame.
After training, the model is used to predict the price of Brent oil on the test data from the daily and weekly time frames. Before evaluation, the predictions are transformed back from the normalized scale to the original price scale. The results are then evaluated with Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
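The patience mechanism described above can be sketched without any deep learning framework. The function `early_stopping_epoch` below is a hypothetical helper that replays a list of per-epoch validation losses; the loss values are made up for illustration, and the stopping rule (stop once the number of consecutive epochs without improvement reaches the patience value) is an assumption about how the trigger behaves rather than a description of the exact implementation used in the experiments.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_loss) for a sequence of validation losses.

    Epochs are counted from 1. Training stops at the first epoch where
    the count of consecutive non-improving epochs reaches `patience`
    (with patience=0 this is the first non-improving epoch). If the
    rule never triggers, the last epoch is returned.
    """
    best_loss = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            wait = 0
            continue
        wait += 1
        if wait >= patience:
            return epoch, best_loss
    return len(val_losses), best_loss


# Made-up validation losses: a temporary rise followed by a new minimum.
losses = [1.0, 0.8, 0.9, 0.85, 0.7, 0.75, 0.76, 0.77]
```

With these losses, a patience of 0 stops at epoch 3 and never reaches the lower loss of 0.7 at epoch 5, whereas a patience of 3 tolerates the temporary rise and stops at epoch 8 with a best loss of 0.7. This is the situation the patience argument is meant to handle.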

RESULTS AND DISCUSSION
In this experiment, two measurements are carried out: the first uses a daily time frame and the second uses a weekly time frame. The experiment evaluates the performance of the LSTM model that was built based on observations from experimental results. During model development, the Brent oil price is first normalized using a min-max scaler. For the prediction results, the value is transformed back to the original scale so that it matches the original values of the dataset and is easier to analyze. RMSE and MAE are used to measure the error of the model in each experiment, on both the training data and the test data. The lower the error value, the closer the model's predictions are to the original data.

Daily Time Frame
The model was trained with different hyperparameters to find the best result. The resulting RMSE and MAE values vary with each setting, as shown in Table 3. The best results in the daily time frame were found using 50 LSTM units, a look back of 2, a batch size of 104, and a dropout of 0.05.
The model used early stopping with the patience parameter to stop training when the validation loss no longer improved. The patience value was derived from the validation loss graph, in which the validation loss increases and then decreases again; based on several experiments, we chose 85 as the best patience value, and the best run stopped training at 432 epochs. Based on Table 3, the RMSE and MAE values for each experiment indicate fairly accurate predictions. The best prediction result has an RMSE of 0.59377 on the training data and 1.27055 on the test data, and an MAE of 0.39722 on the training data and 0.92827 on the test data. Using several different hyperparameter settings helps in finding and analyzing the experiments that give accurate results. It also provides insight into how different hyperparameters interact with the dataset.
Increasing the number of LSTM units from 10 to 50 improves the result for the same look back and batch size. This increases the width of the model so that it can capture more information. Increasing the look back from 2 to 4 did not give any improvement, while decreasing it gave a better result. The same holds for the batch size: a smaller batch size gives better results than a larger one. Fig. 3 plots the actual values against the best prediction results. The model adjusts to the trend, and the predictions are not far from the actual data. With the right hyperparameters, the model is able to capture the changes in trends and patterns that occur during training without capturing unnecessary noise and random fluctuations, which leads to good predictive results.
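The look back hyperparameter discussed above is commonly implemented as a sliding window over the series, where the previous `look_back` values form the input and the next value is the target. The sketch below assumes this interpretation and a chronological 75:25 split; the helper names and the toy series are hypothetical and not taken from the experiments.

```python
def chronological_split(series, train_ratio=0.75):
    """Split a series into train and test parts without shuffling."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


def make_windows(series, look_back):
    """Build (inputs, targets) pairs from a series.

    Each input holds `look_back` consecutive values and the matching
    target is the value that immediately follows them.
    """
    inputs, targets = [], []
    for i in range(len(series) - look_back):
        inputs.append(series[i:i + look_back])
        targets.append(series[i + look_back])
    return inputs, targets


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy series
train, test = chronological_split(series)
X_train, y_train = make_windows(train, look_back=2)
```

A larger look back gives the model more history per sample but produces fewer samples from the same series, which may be one reason a small value such as 2 works well on a short dataset.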

Weekly Time Frame
In the weekly time frame, the model uses the same hyperparameter settings as the daily model, but the "patience" parameter is set to 0 instead of 85 to stop training when the validation loss no longer improves. We found that the validation loss does not improve much with higher patience and that the model then captures more noise and random fluctuation. The results of the weekly time frame experiments are shown in Table 4. The best results in the weekly time frame were found using 10 LSTM units, a look back of 2, a batch size of 104, and a dropout of 0.05, with an RMSE of 1.36514 on the training data and 3.37817 on the test data, and an MAE of 0.97079 on the training data and 2.60603 on the test data. Based on Table 4, the RMSE and MAE values on the training data are fairly accurate for each experiment, while the results on the test data are still quite far from accurate. Increasing the number of LSTM units from 10 to 50 did not improve the result, while decreasing it improved the performance. The same holds for the look back and batch size: decreasing them gives a better result for the LSTM model.
It can be inferred that the model is memorizing the data it has seen but is unable to generalize to unseen examples. The graph in Fig. 4 plots the actual values and the predicted results over the weekly time frame. This result may be caused by an insufficient amount of training data or by noise and random fluctuations in the training data, which the model learns as if they were patterns and which hurt its ability to generalize.
Our LSTM model with the early stopping method gives better prediction results in this experiment than the previous study, which determined the number of epochs manually. Based on the experiments conducted, the model learns the data well and provides fairly accurate predictions compared to the original data. The early stopping method helps the model automatically determine how many epochs to train for by monitoring the validation loss. The model stops the training process when the validation loss increases, which gives the best model with respect to MAE. Our model has an error reduction of 45% and 52% in RMSE and MAE, respectively, compared to other models. In our best result, the early stopping method stopped the model at 432 epochs, while the previous study manually set the number of epochs to 50. Our model also consists of only one layer, compared to the two layers used by other studies. For the weekly time frame, our model has an error reduction of 36% and 38% in RMSE and MAE, respectively. The complete comparison is shown in Table 5.
Based on Table 5, three conclusions can be drawn. More layers do not necessarily give better results. Adding more units can help the model capture and extract more information from high-frequency data. Using the early stopping method to determine the number of epochs has a positive impact on our prediction results compared to a manually selected number of epochs.

CONCLUSION
Predicting the future price of crude oil is important because of its role as a financial indicator in the world economy today. Accurate predictions can give a country an advantage in economic planning. In this research, we used an LSTM model with the early stopping method to predict Brent oil prices, testing various hyperparameters to get the best result. We used daily and weekly time frames of Brent oil prices as the dataset. Based on the experiments, our model with the early stopping method gives better results than the model without it. Early stopping helps the model get the best training results without capturing unnecessary noise and fluctuation.
The best result in the daily time frame experiment has an RMSE and MAE of 1.27055 and 0.92827, respectively, using a single LSTM layer, 50 LSTM units, a look back of 2, a batch size of 104, and a dropout of 0.05 as hyperparameters. The best result in the weekly time frame experiment has an RMSE and MAE of 3.37817 and 2.60603, respectively, using a single LSTM layer, 10 LSTM units, a look back of 2, a batch size of 104, and a dropout of 0.05. With the early stopping method, the model stopped training at 432 epochs for the daily time frame and 75 epochs for the weekly time frame. In the daily time frame, increasing the number of LSTM units gives a better result, while increasing the look back and batch size did not give much improvement. In the weekly time frame, a smaller number of LSTM units, look back, and batch size give a better result.
The results show that LSTM predicts the trend of future data more accurately for the daily time frame than for the weekly time frame. In the daily time frame, the predictions are not far from the actual data, and the LSTM model adjusts to the trend of the actual values. In the weekly time frame, the model is less accurate. This may be because the weekly dataset is smaller than the daily dataset, although this is not a definitive explanation. We hope that the results of this research can serve as one of the tools for monitoring crude oil price movements, which often fluctuate. This is especially relevant during the COVID-19 pandemic, which can create an unprecedented level of risk and in which predictions are very important for avoiding large losses in a short time due to oil price shocks. In the future, adding more factors that influence the crude oil price could help to better predict the value and improve the accuracy of the model.