Application of big data for distribution and consumption of power

ABSTRACT


INTRODUCTION
About 2.5 quintillion bytes of data are produced daily, and almost 90 percent of the data in the world today has been created in the last two years alone. This data comes from various sources: climate information, social media, digital pictures and videos posted on the internet, cell phone GPS signals, and purchase transaction records [1]. Such a colossal, continuously produced amount of data is what is termed big data. As technologies and services grow, so does the amount of data. This data can be structured or unstructured, comes from many different sources, and may contain billions of records of personal information drawn from social media, web sales, audio and images [2]. Demand for data storage and processing has grown considerably in the twenty-first century. To support the processing of such large volumes of data, cloud computing was developed and has been used successfully for data storage and processing [3].
Big data is a concept that can be applied in any industry, and vast amounts of data can be put to one's own profit, but the focus here is the Nigerian Electricity Distribution Companies (Discos), which provide supply value chain services in electricity distribution. Discos act as the intermediaries between customers and the electricity grid. Navigant Research reported that the number of smart meters installed worldwide was expected to surpass 1.1 billion by 2022 [4].
Big data analytics is the process of extracting and analyzing useful information to uncover hidden patterns in data, mostly through advanced techniques such as data mining and statistical methods [5]. Big data has grown too large to be handled with traditional database management systems [6], [7]. Business analytics is one of the new technologies characterized by low risk and quick payback; it can help organizations understand and improve their business and leverage the opportunities presented by abundant data through domain-specific analytics [8]. Big data that is not analyzed has little value. Its potential lies in supporting effective decision-making, and to activate this decision-making process, organizations need an efficient method to turn high volumes of diverse data into meaningful insight. There is widespread awareness of the value of enhancing building energy efficiency to save energy and improve building sustainability, and one successful way to achieve this aim is to discover and derive useful information from building operational data.
As technologies advance, smart grid deployment also increases, and this serves as a driving force for the adoption of big data analytics. These new technologies have improved the ability to capture, from the meter, the electricity used by customers at any time of day [9]. Campaigns for the adoption of smart grid technologies are still ongoing in many countries. European Union member states have initiated the installation of smart meters as a way to improve the efficiency of the energy system, with a target of 80 percent rollout by 2020. The development of smart grid and smart meter infrastructure has brought a new level of data to electric power information technology and business leaders, extending big data analytics into many areas of technology and business intelligence [8].
Analyzing smart grid data makes it possible to identify clusters with excessive electrical load, clusters with high power outage frequencies, and lines with a high probability of failure. As a result, it is possible, for example, to plan grid upgrades, transformations and maintenance, and to forecast energy management needs effectively [10], [11]. An ideal power grid continually balances power generation and consumption to maintain grid stability. The traditional power grid follows a one-directional approach, from centralized power generation through the transmission lines to distribution, and does not allow reverse flow. Currently, the boom in renewable energy, local energy production (by the former consumer, now called a "pro-consumer"), electric mobility with rechargeable battery systems, energy storage and many other applications have forced the decentralization of the power system. In detail, the power grid is evolving from centralized power plants to micro-generation, in which each local consumer may also become a producer through, for example, photovoltaic panels installed at home, storage, and fuel cell applications. In this new scenario, the electronic meter is the gateway that connects the consumer or pro-consumer to the power grid. It is a key device for big data: data are frequently and rapidly acquired and subsequently analyzed by the power utilities [11], [12].
Since 2000, the Enel company has replaced 38 million analog meters with electronic meters, becoming the first power utility in the world ready for smart grid applications. The increasing number of distributed power stations, such as wind farms, combined heat and power units, mini hydro plants and photovoltaic systems, creates a virtual power plant (VPP). Such a system can replace a conventional power plant while providing more flexibility and higher efficiency. However, a VPP is a complex system that requires difficult control optimization and secure communication, and big data analysis may help resolve these problems and increase the reliability of the system.
Wind and solar energy resources can now be connected to power grids; however, the generation capacity of these new energy resources is closely tied to the randomness and intermittency of climate conditions. These intermittent renewable sources can be managed efficiently only if the big data of power grids is analyzed effectively, which helps allocate the newly generated energy to regions with electricity shortages [13]-[15].
Several authors have studied daily and seasonal trends in energy consumption, but most of these studies focused on consumption patterns within a specific country or region. The authors of [16] studied electricity demand in India using aggregate macro data at both national and state level. Their work uses econometric analysis to determine the effect of income on electricity consumption and the relationship between consumption, gross domestic product per capita and the price of electricity over a given period.
Yigzaw and Yohanis [17] studied residential billing data in Hong Kong and the United Kingdom, respectively, taking energy consumption patterns as the basic unit of analysis. The author of [18] used MapReduce and Apache Hadoop big data techniques to analyze collected energy data and generate insights for improving energy efficiency, and also used the data to evaluate different prediction models and forecast future consumption from previous consumption. Whaley and Saman [19] used smart meters to gather daily energy consumption data from various households, segmenting household consumption by electric appliances and new heat generation technologies; this makes it easy to identify consumers with similar needs and behaviors. Big data analysis plays a key role in the evolution of smart grids: it helps extract and analyze energy consumption data from smart grids, enabling efficient energy management and the discovery of hidden consumption patterns [20]-[23].
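To make the MapReduce model mentioned for [18] concrete, the single-process Python sketch below walks through its map, shuffle and reduce phases, summing consumption per meter. It is an illustration only, not Hadoop's API: the "meter_id,date,kwh" record layout, the meter IDs and the readings are hypothetical assumptions.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(records: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Map: parse each hypothetical 'meter_id,date,kwh' line and emit (meter_id, kwh)."""
    for line in records:
        meter_id, _date, kwh = line.strip().split(",")
        yield meter_id, float(kwh)


def shuffle_phase(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[float]]) -> dict[str, float]:
    """Reduce: sum the readings of each meter to get its total consumption."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    readings = [
        "M001,2016-01-01,2.5",
        "M002,2016-01-01,4.0",
        "M001,2016-01-02,3.0",
        "M002,2016-01-02,1.5",
    ]
    totals = reduce_phase(shuffle_phase(map_phase(readings)))
    print(totals)  # {'M001': 5.5, 'M002': 5.5}
```

In a real Hadoop job the shuffle is carried out by the framework across many nodes; here it is an ordinary dictionary so that all three phases can be read in one place.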

RESEARCH METHOD
The material used in this research is data from about 196,000 homes and businesses that are customers of Ikeja Electric, with one prepaid meter per entity. The smart meters generated information every day for a month, and one year of data is available. All data was available in Excel and was later converted to CSV format. This primary data was obtained from the monitoring department of IKEDC's database, and secondary data was gathered from recent whitepapers, research materials, and prepared texts about big data analytics in the cloud and Nigerian electric power distribution. Hive, as a data warehousing tool on top of the Hadoop Distributed File System (HDFS), can ingest various structured, unstructured and semi-structured datasets, but this study uses the structured CSV data described above, given its purpose and the peculiarities of the data collection device (the smart meter). The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive is a valuable tool for ELT, for data warehousing on Hadoop, and as a database for Hadoop. Compared with conventional databases, however, it is relatively slow, and it does not provide all of the structured query language (SQL) functionality or database features that standard databases do. Nevertheless, it supports an SQL-like language, functions as a data warehouse, and gives more users, including non-programmers, access to Hadoop technologies. It also provides a way to convert unstructured and semi-structured data into usable, structured tables. Hive can be used to build a master data processing system or a data warehouse, although making it an effective ELT tool requires learning the relevant techniques [24]-[27]. As shown in Figure 1, the transformation process essentially puts the datasets into the format most suitable for analysis. This entails creating a database and a temporary table through which the data is transferred from HDFS into Hive (whose metastore records the table definitions), and then creating an appropriate schema that represents the fields and records needed for querying.
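The staging pattern of Figure 1 (a temporary table that receives the raw rows, then a permanent table with a query-friendly schema filled by matching columns) can be sketched as below. This is an analogy under stated assumptions: it uses Python's built-in sqlite3 as a local stand-in for Hive, and the column names (meter_id, transformer_id, feeder_id, month, kwh) and sample rows are hypothetical, since the actual fields of Table 1 are not listed here. In Hive itself the corresponding steps would typically be CREATE TABLE, LOAD DATA INPATH (moving the files from HDFS into the table) and INSERT INTO TABLE ... SELECT.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; column names and values are illustrative only.
RAW_CSV = """meter_id,transformer_id,feeder_id,month,kwh
M001,T01,F1,2016-01,95.5
M002,T01,F1,2016-01,70.0
M003,T02,F2,2016-01,88.25
"""


def load_via_staging(conn: sqlite3.Connection, raw_csv: str) -> None:
    cur = conn.cursor()
    # 1. Temporary (staging) table: every column kept as raw text, as received.
    cur.execute(
        "CREATE TEMP TABLE staging (meter_id TEXT, transformer_id TEXT, "
        "feeder_id TEXT, month TEXT, kwh TEXT)"
    )
    reader = csv.reader(io.StringIO(raw_csv))
    next(reader)  # skip the header row
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?, ?)", reader)
    # 2. Permanent table with the schema chosen for querying (typed columns).
    cur.execute(
        "CREATE TABLE consumption (meter_id TEXT, transformer_id TEXT, "
        "feeder_id TEXT, month TEXT, kwh REAL)"
    )
    # 3. Move the rows across by matching columns and casting the reading to a number.
    cur.execute(
        "INSERT INTO consumption SELECT meter_id, transformer_id, feeder_id, "
        "month, CAST(kwh AS REAL) FROM staging"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_via_staging(conn, RAW_CSV)
    print(conn.execute("SELECT COUNT(*), SUM(kwh) FROM consumption").fetchone())
```

Keeping the staging columns as text means malformed readings can still be loaded and inspected before the typed permanent table is populated.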

RESULTS AND ANALYSIS
The Hive tool used in this research is Apache Hive, an open-source, Apache-licensed data warehousing framework that can either be configured manually or used pre-configured. This study used a pre-configured machine from Hortonworks running the Hortonworks Data Platform (HDP), the vendor's Hadoop distribution, hosted on the Microsoft Azure platform; it was chosen for its low system requirements, low cost, and ease of use. The analysis was written in HiveQL, the query language of the Hive data warehousing environment, which is similar to SQL and MySQL. Figure 2 shows the creation of a table for running queries on the data.

Figure 2. Creating a table for running queries on data
This is the view of the pre-process panel after the dataset has been imported. Once the data is loaded, Hive recognizes the attributes and lists them in the 'attributes box' at the left corner of the pre-process panel. The 'Table Name' window above the 'attributes box' displays the field names and types. Clicking the rightmost icon on the table name displays the data in the table in the results section, together with the minimum, maximum, mean and standard deviation of the selected attribute.
The most significant power distribution equipment is the distribution transformers and feeders. This analysis explores how heavily this equipment is loaded, to support effective decision-making. The graphs below show the transformers whose load is above or below the monthly average (84 kWh), which identifies the distribution transformers that need load reduction in the different localities. Figure 4 shows the average monthly load of the transformers exceeding the average, while Figure 5 shows that of the transformers below it. Figure 6 shows the feeders (electric lines) that carry optimum loads and those that are overloaded and may need prompt replacement or additional lines in the locality.
- Effects of climatic conditions on consumption
Figure 7 shows rainfall statistics for the year 2016 in Lagos and is used to highlight the months with the most and least rainfall. A query was then run for the average rainfall in those months, with the results shown in Figure 8.
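The transformer comparison described above amounts to splitting equipment into those above and below a reference monthly load. A minimal sketch follows, assuming hypothetical transformer IDs and loads; the 84 kWh threshold is the study's reported average, while the fallback to the mean of the supplied loads is an illustrative choice. In HiveQL the same split would be a WHERE filter over an aggregated load table.

```python
from statistics import mean
from typing import Optional

# Hypothetical average monthly loads (kWh) per distribution transformer.
transformer_loads = {"T01": 120.0, "T02": 60.0, "T03": 84.0, "T04": 95.0}


def split_by_average(
    loads: dict[str, float], threshold: Optional[float] = None
) -> tuple[list[str], list[str]]:
    """Split transformer IDs into those above and below a reference load.

    If no threshold is given, the mean of the supplied loads is used.
    A transformer exactly at the threshold is placed in neither list.
    """
    limit = mean(loads.values()) if threshold is None else threshold
    above = sorted(t for t, load in loads.items() if load > limit)
    below = sorted(t for t, load in loads.items() if load < limit)
    return above, below


if __name__ == "__main__":
    above, below = split_by_average(transformer_loads, threshold=84.0)
    share_above = len(above) / len(transformer_loads)
    print("Above:", above)  # Above: ['T01', 'T04']
    print("Below:", below)  # Below: ['T02']
    print(f"Share above threshold: {share_above:.0%}")  # Share above threshold: 50%
```

Using strict inequalities keeps a transformer sitting exactly at the reference value out of both lists; whether such borderline units count as overloaded is a policy decision the sketch leaves open.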
Figure 9 shows the distribution of months and the corresponding sun intensity, while Figure 10 shows the average power consumption in these months. Figure 11 shows the extent to which consumption fell short of the average quota, together with the standard deviation of consumption for the year.
a. During the analysis, it was found that the total power consumed in the location under consideration is 7.6 GW (i.e., 7.6 billion watts) across about 196,000 consumers; this insight could be useful for load requests and estimates from the transmission companies. The average monthly consumption of a household is 84 kWh, a value that can support cost analysis and projected annual revenue figures.
b. The equipment analysis provides load profiles for the distribution equipment; it clearly shows that many electric lines carry more than the optimum electric traffic (throughput). The graphs also show that the transformers that are typically overloaded (about 30% of them) need their loads reduced.
c. The analysis of the effects of climatic conditions contradicts the common assumption that more power is available or consumed during periods of high rainfall: twice as much power was consumed in January (a month of low rainfall) as in June (a typical month of high rainfall).
d. The standard deviation of consumption is 0.64, and 60% of households did not consume up to the average of 84 kWh, which indicates the level of non-availability of power across households in the 11 business divisions.
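Item d reports both a standard deviation and the share of households below the 84 kWh average. The sketch below, using hypothetical per-household monthly totals, computes the two statistics separately: the standard deviation alone does not determine how many households fall below the mean, so the share has to be counted directly from the data. The choice of the population standard deviation (pstdev) is an assumption, appropriate when all households are included rather than a sample.

```python
from statistics import mean, pstdev


def consumption_summary(monthly_kwh: list[float]) -> dict[str, float]:
    """Summarize per-household monthly consumption (kWh).

    The population standard deviation and the share of households below the
    mean are computed independently; the first does not determine the second.
    """
    avg = mean(monthly_kwh)
    below = sum(1 for kwh in monthly_kwh if kwh < avg)
    return {
        "mean": avg,
        "pstdev": pstdev(monthly_kwh),
        "share_below_mean": below / len(monthly_kwh),
    }


if __name__ == "__main__":
    # Two hypothetical groups with the same spread but opposite skew.
    group_a = [50.0, 50.0, 50.0, 90.0]
    group_b = [90.0, 90.0, 90.0, 50.0]
    for name, group in (("A", group_a), ("B", group_b)):
        s = consumption_summary(group)
        print(name, f"mean={s['mean']:.1f}", f"pstdev={s['pstdev']:.2f}",
              f"below mean={s['share_below_mean']:.0%}")
```

Both groups have a population standard deviation of about 17.32 kWh, yet 75% of group A and 25% of group B fall below their respective means.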

CONCLUSION
There are several other big data analytics methods for deriving insights from large data pools; Hive clearly provides an easy and convenient way of doing this. This study shows a successful deployment of big data technologies in the cloud, illustrating the interoperation of two disruptive technologies (cloud meets big data). The Hive query language provides a convenient way of querying both structured and unstructured data, while offering programmers or analysts with prior SQL knowledge an easy way to run MapReduce jobs. Finally, this study has shown that enormous useful insights, or business intelligence, can be obtained from large datasets using Hadoop tools, most conveniently hosted in the cloud. Because the technology is open source, it is inexpensive to deploy; the only bottleneck is the expertise required to carry out such tasks. Further research could apply AMI (advanced metering infrastructure) data analytics to obtain hourly power consumption insights, investigate how big data analytics can be used to curb energy and electricity data theft, and develop analytics for energy and utilities management.

Figure 1. Phases involved in generating results of analysis


ISSN: 1693-6930 TELKOMNIKA Telecommun Comput El Control, Vol. 19, No. 4, August 2021: 1090-1099
Figure 3 shows the view in which the datasets are loaded into their permanent location by matching the column values of the datasets loaded into the temporary table.

Figure 3. Putting extracted data into the table

Figure 4. Transformers exceeding the average monthly load

Figure 5. Transformers below the average monthly load
Figure 6.

Table 1. Data fields description