The step construction of penalized spline in electrical power load data

Electricity is one of the most pressing needs of human life. It is required not only for lighting but also for the daily social and economic activities of the community. The current problem is a limited supply of electricity, which results in an energy crisis. Electrical power cannot be stored, so a good forecast of electricity demand is vital. For this reason, we conducted an analysis based on power load. As a baseline for this research, we applied penalized splines (P-splines), a powerful and widely applicable smoothing technique. In this paper, we found that the penalized spline of degree 1 (linear) with 8 knots is the best model because it has the lowest GCV (Generalized Cross Validation). This model is a compelling choice for predicting electric power load, as evidenced by a Mean Absolute Percentage Error (MAPE = 0.013) below 10%.


Introduction
Electricity is a basic need that cannot be separated from human life. It is required not only for lighting but also as a means of carrying out the daily socioeconomic activities of society. With electricity, teaching and learning activities, communication, transportation, health services and the development process can run smoothly. The Government is therefore obliged to provide electricity of sufficient quantity and quality for its people. The availability of power grids in a region can determine the progress and development of that region. In reality, however, the provision of electrical energy by PT PLN (Persero) [1], the official institution appointed by the government to manage electricity in Indonesia, has not been able to meet the people's need for electrical energy as a whole. The geography of Indonesia, which consists of thousands of scattered islands with unevenly distributed power centers, weak electricity demand in some areas, the high marginal cost of developing the electricity supply system, and limited financial capacity are the factors that inhibit the supply of electrical energy on a national scale [2].
Given the importance of the availability of electric energy in an area, a precise method is needed to assess its total use in the future [3]. Daily total electricity usage is time series data, but the data do not meet the stationarity and white noise assumptions, so parametric regression is less appropriate [4]. To overcome this, we use nonparametric regression, which does not rely on these assumptions [5]. One of the nonparametric regressions that can be used to model the data is spline regression [6]. Spline regression is a commonly used nonparametric approach because it adapts flexibly [7] to the characteristics of a function or data set and is capable of handling data or functions that are smooth. In spline regression, the best model is determined by choosing the right number and location of the knots [8]. This process usually takes a long time and, when done in software, also requires a large amount of memory [9]. To overcome this, we can use penalized spline regression (P-spline) [10], which places the knots [11] at quantile points of the unique values of the predictor variable and thereby gains flexibility [12]. Using penalized spline regression, we predict the total daily electricity usage in Sumatra. The predictions are expected to serve as a reference for the parties concerned in setting policy so that the supply of electrical energy is appropriate and follows demand.

Research Method
Nonparametric Regression
Smoothing is one of the methods used in nonparametric data analysis [13]. The purpose of smoothing is to reduce variability and to estimate the underlying behavior of the data, suppressing fluctuations that carry no effect so that the characteristics of the data appear more clearly [14]. Splines are popular in ecology [4] and biodiversity [15]. One of the regression models with a nonparametric approach that can be used to estimate the regression curve is spline regression [16]. Spline regression is an approach to fitting data while still taking the smoothness of the curve into account. The spline approach has its own virtue because a spline is a piecewise polynomial of order m with a continuous segmented property that adequately describes the local characteristics of data functions [6]. Spline regression of order $m$ with knot points $\tau_1, \tau_2, \ldots, \tau_K$ can be written as (1):

$$ y_i = \sum_{j=0}^{m} \beta_j x_i^{j} + \sum_{k=1}^{K} \beta_{m+k} (x_i - \tau_k)_+^{m} + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1} $$

where $(x_i - \tau_k)_+^{m}$ equals $(x_i - \tau_k)^{m}$ for $x_i \ge \tau_k$ and 0 otherwise, $m$ is the polynomial order, $\tau_k$ is the $k$-th knot point with $k = 1, 2, \ldots, K$, and $\varepsilon_i$ is an independent random error assumed to be normal with mean zero and variance $\sigma^2$ [17]. With the help of knots, a spline can follow data patterns that show sharp ups and downs while the resulting curve remains relatively smooth. Knots are the joint points that mark where the behavior of the spline function changes between intervals [18]. One form of spline regression is the penalized spline, obtained by minimizing the Penalized Least Square (PLS) function, an estimation criterion that combines the least squares function with the smoothness of the curve. Penalized spline regression [19] is a popular smoothing approach [12]. The penalized spline consists of piecewise polynomials [20] that have a continuous segmented property [21]. This property provides better flexibility than ordinary polynomials, making it possible to adapt efficiently to the characteristics of a function or data set.
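The truncated power form of spline regression can be illustrated with a short sketch. The code below builds the design matrix with columns $1, x, \ldots, x^m$ and $(x - \tau_k)_+^m$ and fits it by ordinary least squares. The function name `truncated_power_basis` and the toy data are hypothetical and serve only as an illustration, assuming $m \ge 1$; they are not the electricity data.

```python
import numpy as np

def truncated_power_basis(x, knots, m):
    """Design matrix with columns 1, x, ..., x^m, (x - tau_1)_+^m, ..., (x - tau_K)_+^m.

    Assumes m >= 1 (for m = 0 the power 0**0 would not give a proper step function).
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    poly = np.vander(x, m + 1, increasing=True)                 # n x (m + 1)
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** m   # n x K
    return np.hstack([poly, trunc])

# Hypothetical toy data: a "tent" that changes slope at x = 5.
x = np.linspace(0.0, 10.0, 50)
y = np.where(x < 5.0, x, 10.0 - x)

# Linear spline (m = 1) with a single knot at 5, fitted by least squares.
X = truncated_power_basis(x, knots=[5.0], m=1)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # coefficients of 1, x and (x - 5)_+
```

Because the tent function equals $x - 2(x - 5)_+$, a single knot lets the linear spline reproduce the sharp change in slope that a single straight line cannot follow.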
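ISSN: 1693-6930. The step construction of penalized spline in electrical… (Rezzy Eko Caraka)

The polynomial basis in the penalized spline estimator handles arbitrary data poorly and becomes numerically unstable when the number of knots is large and the smoothing parameter $\lambda$ is small or zero. To treat arbitrary data and this instability [22], the polynomial basis in the penalized spline estimator is replaced with a radial basis function [23]. A radial basis function is a function that depends on the distance between a data point and a data center. In regression, the penalized spline with a radial basis and $K$ knots $\tau_1, \tau_2, \ldots, \tau_K$ uses the basis functions in (2):

$$ 1, x, \ldots, x^{m-1}, |x - \tau_1|^{2m-1}, \ldots, |x - \tau_K|^{2m-1} \tag{2} $$

where $m$ is the polynomial order and $\tau_k$ is the $k$-th knot, $k = 1, 2, \ldots, K$, which acts as the data center. Consider $n$ paired observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ that follow the nonparametric regression model in (3):

$$ y_i = f(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{3} $$

where $\varepsilon_i$ is a random error assumed to be independent with mean 0 and variance $\sigma^2$, and $f(x_i)$ is a regression function of unknown form. The function $f(x_i)$ using the radial-basis penalized spline is given in (4):

$$ f(x_i) = \sum_{j=0}^{m-1} \beta_j x_i^{j} + \sum_{k=1}^{K} \beta_{m-1+k} |x_i - \tau_k|^{2m-1} \tag{4} $$

Writing (4) out for each observation gives the following system:

$$ \begin{aligned} f(x_1) &= \beta_0 + \beta_1 x_1 + \cdots + \beta_{m-1} x_1^{m-1} + \beta_m |x_1 - \tau_1|^{2m-1} + \cdots + \beta_{m-1+K} |x_1 - \tau_K|^{2m-1} \\ &\;\;\vdots \\ f(x_n) &= \beta_0 + \beta_1 x_n + \cdots + \beta_{m-1} x_n^{m-1} + \beta_m |x_n - \tau_1|^{2m-1} + \cdots + \beta_{m-1+K} |x_n - \tau_K|^{2m-1} \end{aligned} \tag{5} $$

The system in (5) can be written in matrix form:

$$ \mathbf{f} = X \boldsymbol{\beta}, \quad X = \begin{bmatrix} 1 & x_1 & \cdots & x_1^{m-1} & |x_1 - \tau_1|^{2m-1} & \cdots & |x_1 - \tau_K|^{2m-1} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{m-1} & |x_n - \tau_1|^{2m-1} & \cdots & |x_n - \tau_K|^{2m-1} \end{bmatrix}, \quad \boldsymbol{\beta} = (\beta_0, \ldots, \beta_{m-1+K})^{T} \tag{6} $$

The estimate of the regression function $f(x)$ in (6) is:

$$ \hat{\mathbf{f}} = X \hat{\boldsymbol{\beta}} \tag{7} $$

The radial-basis penalized spline regression estimate is obtained by minimizing the Penalized Least Square (PLS) function, an estimation criterion that combines the least squares function with a measure of smoothness. The PLS function is given in (8):

$$ Q(\boldsymbol{\beta}) = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \boldsymbol{\beta}^{T} R \boldsymbol{\beta} \tag{8} $$

where $\lambda$ is the smoothing parameter and $R$ is the penalty matrix. In matrix form, (8) becomes $Q(\boldsymbol{\beta}) = (\mathbf{y} - X\boldsymbol{\beta})^{T} (\mathbf{y} - X\boldsymbol{\beta}) + \lambda \boldsymbol{\beta}^{T} R \boldsymbol{\beta}$. The parameter estimate $\hat{\boldsymbol{\beta}}$ in (7) is obtained by minimizing the PLS function.
The PLS function attains its minimum where $\partial Q(\boldsymbol{\beta}) / \partial \boldsymbol{\beta} = \mathbf{0}$, which gives:

$$ \hat{\boldsymbol{\beta}} = (X^{T} X + \lambda R)^{-1} X^{T} \mathbf{y} \tag{9} $$

Substituting (9) into (7) gives the estimate of $f(x)$:

$$ \hat{\mathbf{f}} = X (X^{T} X + \lambda R)^{-1} X^{T} \mathbf{y} \tag{10} $$

The estimate in (10) can be expressed in the form:

$$ \hat{\mathbf{f}} = S(\lambda)\, \mathbf{y}, \quad \text{with } S(\lambda) = X (X^{T} X + \lambda R)^{-1} X^{T} \tag{11} $$

Therefore, the estimate of the nonparametric regression function $\hat{\mathbf{f}}$ in (11) depends on the smoothing parameter $\lambda$.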
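A minimal sketch of the radial-basis penalized least squares fit is given below. It assumes a basis of the form $1, x, \ldots, x^{m-1}, |x - \tau_k|^{2m-1}$ and, because the exact penalty matrix is not recoverable here, an illustrative penalty matrix that is zero on the polynomial terms and the identity on the radial terms. The function names and the toy data are hypothetical.

```python
import numpy as np

def radial_design(x, knots, m):
    """Columns 1, x, ..., x^(m-1), |x - tau_1|^(2m-1), ..., |x - tau_K|^(2m-1)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    poly = np.vander(x, m, increasing=True)                       # n x m
    radial = np.abs(x[:, None] - knots[None, :]) ** (2 * m - 1)   # n x K
    return np.hstack([poly, radial])

def fit_radial_pspline(x, y, knots, m, lam):
    """Solve (X'X + lam * R) beta = X'y and return (beta_hat, fitted values)."""
    X = radial_design(x, knots, m)
    # Assumed penalty: no penalty on polynomial terms, ridge penalty on radial terms.
    R = np.diag(np.r_[np.zeros(m), np.ones(len(knots))])
    beta_hat = np.linalg.solve(X.T @ X + lam * R, X.T @ np.asarray(y, dtype=float))
    return beta_hat, X @ beta_hat

# Hypothetical toy data with a kink at x = 0.5.
x = np.linspace(0.0, 1.0, 41)
y = np.abs(x - 0.5)
knots = [0.25, 0.5, 0.75]
X = radial_design(x, knots, m=1)

beta0, fit0 = fit_radial_pspline(x, y, knots, m=1, lam=0.0)   # unpenalized fit
beta1, fit1 = fit_radial_pspline(x, y, knots, m=1, lam=10.0)  # heavily smoothed fit
rss0 = float(np.sum((y - fit0) ** 2))
rss1 = float(np.sum((y - fit1) ** 2))
```

With $\lambda = 0$ the fit interpolates the kink, while a larger $\lambda$ shrinks the radial coefficients and trades fidelity for smoothness, which is the balance the PLS criterion is meant to control.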

Smoothing Parameter and Optimum Knots
The smoothing parameter $\lambda$ controls the balance between the fit of the curve to the data and the smoothness of the curve. A very small or very large smoothing parameter yields a very rough or very smooth estimated function, respectively [6]. On the other hand, it is desirable for the estimator to be smooth while still conforming to the data [17]. Therefore, it is essential to choose an optimal smoothing parameter. In principle, selecting the smoothing parameter goes together with selecting the optimal number of knots [24], namely the combination that results in the minimum GCV value [25]. The GCV (Generalized Cross Validation) function can be expressed as in (12):

$$ GCV(\lambda) = \frac{n^{-1} RSS(\lambda)}{\bigl(1 - n^{-1}\, df\bigr)^2} \tag{12} $$

with RSS the residual sum of squares, $RSS(\lambda) = \sum_{i=1}^{n} \bigl(y_i - \hat{f}(x_i)\bigr)^2$. According to [7], the degrees of freedom are equal to the trace of the hat matrix $S(\lambda) = X (X^{T} X + \lambda R)^{-1} X^{T}$, where $X$ is the design matrix and $R$ is the penalty matrix of the PLS function, that is, $df = \mathrm{tr}(S(\lambda))$. The GCV function can therefore be written as in (13):

$$ GCV(\lambda) = \frac{n^{-1} RSS(\lambda)}{\bigl(1 - n^{-1}\, \mathrm{tr}(S(\lambda))\bigr)^2} \tag{13} $$
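The selection of the smoothing parameter by GCV can be sketched as a grid search. The code assumes the common form GCV = (RSS/n) / (1 − tr(S)/n)² for a linear smoother with hat matrix S, an illustrative ridge-type penalty matrix `R`, and a hypothetical grid of $\lambda$ values and toy data. None of these are taken from the study.

```python
import numpy as np

def hat_matrix(X, R, lam):
    """S(lam) = X (X'X + lam * R)^{-1} X' for a penalized least squares smoother."""
    return X @ np.linalg.solve(X.T @ X + lam * R, X.T)

def gcv(y, S):
    """GCV = (RSS / n) / (1 - tr(S) / n)^2 for fitted values S @ y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    resid = y - S @ y
    rss = float(resid @ resid)
    df = float(np.trace(S))          # effective degrees of freedom
    return (rss / n) / (1.0 - df / n) ** 2

# Hypothetical noisy toy data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(60)

# Linear radial basis (m = 1) with 4 knots at sample quantiles of x.
knots = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
X = np.hstack([np.ones((60, 1)), np.abs(x[:, None] - knots[None, :])])
R = np.diag([0.0, 1.0, 1.0, 1.0, 1.0])   # assumed penalty on radial terms only

grid = [0.001, 0.01, 0.1, 1.0, 10.0]      # hypothetical candidate values
scores = {lam: gcv(y, hat_matrix(X, R, lam)) for lam in grid}
best_lam = min(scores, key=scores.get)
```

The trace of the hat matrix falls as $\lambda$ grows, so the denominator penalizes flexible fits less and less while the RSS in the numerator rises. GCV picks the value where this trade-off is smallest.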

Results and Analysis
First, the time series data must be recast as two variables so that it can be estimated with a regression approach. The two variables are the independent variable (the previous day's data) and the dependent variable (the value to be predicted for the current day). The characteristics of smoothing regression are usually analyzed within the framework of independent and identically distributed observations, that is, the pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ are assumed to be independent. In reality, the situation is inconsistent with this assumption. If an object is observed from time to time, it is very likely that the current observation is affected by the previous observations. These effects can be modeled in three forms. In this paper, we provide the step construction for electrical load data using the radial-basis P-spline as follows:
1. Recast the time series data as two variables, a dependent variable and an independent variable. $Z_t$, the current data, is the dependent variable ($Y_i$), whereas $Z_{t-1}$, the previous day's data, is the independent variable ($X_i$).
2. Define a new independent variable X containing the unique values of the independent variable, sorted from the smallest to the largest value.
3. Determine the order, the number of knots and the value of the smoothing parameter.
4. Determine the knot points using the sample quantiles of the new independent variable X, where the sample quantiles are based on the number of knots (K) used.
5. Calculate the parameter estimate $\hat{\boldsymbol{\beta}}$ by minimizing the PLS function.
6. Estimate the model according to (18).
7. Calculate GCV as in (12) and (13) and select the optimal smoothing parameter for each number of knots based on the minimum GCV value.
8. Select the optimal number of knots for each order using the full-search algorithm.
9. Estimate the model of each order based on the optimal number of knots and the optimal smoothing parameter.
10. Choose the optimal model based on the minimum GCV value.
11. Calculate the value of $R^2$ to assess how well the model fits, and calculate the MAPE value (see [26] for more details) to determine the percentage error between the actual data and the predicted data.
12. Compare the predicted results with the actual data available.

This study uses secondary data obtained from BP3Sumatera. The data are the total daily electricity usage in Sumatra from 1 August 2015 to 31 December 2015. The data are divided into two parts: in-sample data, used to construct the model (1 August 2015 to 30 November 2015), and out-of-sample data, used to determine the accuracy of the model (1 December 2015 to 31 December 2015). The variable used in this study is the total daily usage of electricity in Sumatra, where the predictor variable ($X_i$) is the total electricity usage at times 1, 2, ..., i-1, while the response variable ($Y_i$) is the total electricity usage shifted by one lag, i.e. the data at times 2, 3, ..., i. The average daily usage of electricity in Sumatra is 109162W with a standard deviation of 2758W. The lowest electricity usage in Sumatra from August to November 2015 occurred on Sunday, 2 August 2015, at 101313W. This is thought to be due to a magnitude 4.7 (Richter scale) earthquake felt in Mentawai and from Padang to Painan, which caused a power outage for some time in Mentawai and the surrounding areas. Meanwhile, the largest use of electricity occurred on Tuesday, 27 October 2015.
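Steps 1, 2 and 4 of the construction (lag-1 pairing, sorting the unique predictor values, and placing knots at sample quantiles) can be sketched in plain Python. The series `z` below is a hypothetical stand-in for the daily load, and the quantile convention (linear interpolation at positions k/(K+1)) is an assumption, since the text does not fix one.

```python
def lag1_pairs(z):
    """Step 1: x_i = Z_{t-1} (previous day), y_i = Z_t (current day)."""
    return list(z[:-1]), list(z[1:])

def quantile_knots(x, n_knots):
    """Steps 2 and 4: knots at the k/(K+1) sample quantiles of the sorted unique x.

    Uses linear interpolation between order statistics (an assumed convention).
    """
    u = sorted(set(x))                       # unique values, smallest to largest
    knots = []
    for k in range(1, n_knots + 1):
        pos = k / (n_knots + 1) * (len(u) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(u) - 1)
        knots.append(u[lo] + (pos - lo) * (u[hi] - u[lo]))
    return knots

# Hypothetical daily totals, not the Sumatra data.
z = [105.0, 101.0, 108.0, 110.0, 108.0, 112.0, 109.0]
x, y = lag1_pairs(z)
knots = quantile_knots(x, n_knots=3)
print(x)      # [105.0, 101.0, 108.0, 110.0, 108.0, 112.0]
print(y)      # [101.0, 108.0, 110.0, 108.0, 112.0, 109.0]
print(knots)  # [105.0, 108.0, 110.0]
```

Using the unique sorted values before taking quantiles keeps repeated load values from pulling several knots onto the same point.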

Modelling of Penalized Spline Regression with Degree 1
With a maximum of 20 knots, the optimum smoothing parameter is λ = 681, with a minimum GCV of 4635405. The optimal number of knots is 18, so the fitted model is:

Modelling of Penalized Spline Regression with Degree 2
With a maximum of 20 knots, the optimum smoothing parameter is λ = 661, with a minimum GCV of 4474538. The optimal number of knots is 8, so the fitted model is:

Modelling of Penalized Spline Regression with Degree 3
With a maximum of 20 knots, the optimum smoothing parameter is λ = 7290, with a minimum GCV of 13150140. The optimal number of knots is 3, so the fitted model is:

Selection of the Best Penalized Spline Regression Model
The best penalized spline regression model is selected by choosing the smallest optimal GCV value. Based on Table 1, the minimum GCV value is attained at degree 1. Based on the out-of-sample data, the MAPE value is 0.0131523. A value below 10% means that the model's performance is very accurate [27]. The prediction of total daily electricity usage for the period of 10-31 December 2015 is made using the penalized spline regression model with degree 1. Since the actual total daily usage in Sumatra is already known, the actual and predicted values can be compared according to Table 1 and
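The two accuracy measures used in the evaluation, MAPE and $R^2$, can be computed as in the sketch below. The function names and the toy numbers are hypothetical, and `mape` returns a fraction (multiply by 100 for a percentage), since the text does not state which scale its reported value uses.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction of the actual values."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted loads, not the study's data.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
error = mape(actual, predicted)          # absolute relative errors: 0.10, 0.05, 0.00
fit_quality = r_squared(actual, predicted)
```

Computing MAPE only on the out-of-sample period keeps the accuracy estimate separate from the data used to choose λ and the number of knots.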

Conclusion
Nonparametric approaches to the estimation of regression curves are flexible. This flexibility is advantageous in research that seeks a particular relation in cases where no information on the shape of the regression curve is available beforehand. In fact, the regression curves obtained with nonparametric techniques may provide suggestions for constructing appropriate parametric models in future studies. The optimum parameter λ is obtained with the minimum Generalized Cross-Validation (GCV) criterion, while the optimal number of knots is selected with the full-search algorithm. Generally, the knots of a P-spline are at fixed quantiles of the independent variable and the only tuning parameter to choose is the number of knots. In this article, the effects of the number of knots on the performance of P-splines are studied. The myopic algorithm stops when no improvement in the generalized cross validation statistic (GCV) is noticed with the last increase in the number of knots. The full search examines all candidates in a fixed sequence of possible numbers of knots and chooses the candidate that minimizes GCV. The myopic algorithm works well in many cases but can stop prematurely. The full search algorithm worked well in all examples examined. Based on this research, we have successfully carried out a soft computing simulation using the penalized spline and conclude that this model can be used for electrical data, which are known to fluctuate sharply.
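The difference between the two knot-selection strategies can be shown with a simplified sketch. Here `gcv_of` is any function that returns the GCV for a given number of knots. The table of GCV values is hypothetical, and the stopping rule is a plain "stop at the first non-improvement", which is a simplification rather than the exact published rule.

```python
def full_search(candidates, gcv_of):
    """Evaluate every candidate number of knots and return the GCV minimizer."""
    return min(candidates, key=gcv_of)

def myopic_search(candidates, gcv_of):
    """Walk the increasing sequence and stop when the last increase does not lower GCV."""
    best = candidates[0]
    best_score = gcv_of(best)
    for k in candidates[1:]:
        score = gcv_of(k)
        if score >= best_score:      # no improvement: stop early
            break
        best, best_score = k, score
    return best

# Hypothetical GCV values per number of knots, with a local minimum at 2 knots.
toy_gcv = {1: 9.0, 2: 7.0, 3: 7.5, 4: 6.0, 5: 6.4}
candidates = sorted(toy_gcv)

print(full_search(candidates, toy_gcv.get))    # 4
print(myopic_search(candidates, toy_gcv.get))  # 2
```

In this toy case the myopic rule stops at the local minimum and misses the better candidate further along the sequence, which illustrates the premature stopping that the full search avoids at the cost of evaluating every candidate.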