Cognitive artificial intelligence for Doernenburg dissolved gas analysis interpretation

This paper proposes a Cognitive Artificial Intelligence (CAI) method for Dissolved Gas Analysis (DGA) interpretation that adopts the Doernenburg Ratio method. CAI works on the Knowledge Growing System (KGS) principle and is capable of growing its own knowledge. Data are collected from sensors, but data are not information in themselves and must be processed to extract information. Information from multiple sources is then fused to obtain new information with a Degree of Certainty (DoC). The new information is used to identify the faults that occurred at a single observation. The proposed method is tested on a previously published dataset and compared with a Fuzzy Inference System (FIS) and an Artificial Neural Network (ANN). Experiments show that the CAI implementation of the Doernenburg Ratio method correctly identifies 115 out of 117 samples (98.3%), followed by FIS with 94.02% and ANN with 78.6%. CAI works well even with a small amount of data and does not require training.


Introduction
Dissolved Gas Analysis (DGA) interpretation is the most reliable method in transformer fault diagnosis, as it is capable of detecting incipient faults before they become catastrophic [1][2][3]. Transformer oil degrades over time during operation [4][5][6]. In the presence of stresses, radicals are released from the oil, as shown in Table 1 [7]. The radicals then form combustible gases, as shown in Table 2 [5], [8].

[Table 1, fragment only: Arcing, CH*, 720, Thermal-Low, C*, > 960, Thermal-High, C*]

Table 2. Gases Formed by Radicals [5], [8]

The more faults occur, the more combustible gases are produced in the oil. These gases accelerate the degradation process. There are two kinds of transformer paper degradation processes, hydrolysis and pyrolysis [8], [9]. Pyrolysis is related to heat, while hydrolysis is related to water. Pyrolysis produces oxygen, which then oxidizes the oil and the paper insulator. Hydrolysis causes depolymerization of the oil insulator and later produces carbon monoxide and carbon dioxide, which are acidic oxides that accelerate hydrolysis even more.
Several conventional methods are commonly used in DGA interpretation, one of which is the Doernenburg Ratio Method (DRM). Conventional methods are limited by low accuracy [10][11][12] and a high failure rate [1]. Several studies have addressed DGA interpretation, starting from labeled datasets to test their proposed methods [10]. All of them used Artificial Intelligence (AI) to imitate human experts: some used FIS [13], [14], some used ANN [1], [12], and others used modifications of standard AI methods [11]. Ratio-based DGA interpretation methods have been developed to identify faults [1], [10][11][12][13][14], as seen in Table 3.

Standard AI methods have some disadvantages. ANN requires a large number of training samples [15], while FIS requires complex computing resources and lacks design techniques [16]. The Adaptive Neuro-Fuzzy Inference System (ANFIS) improves accuracy but increases the complexity of the system.
The most recent development in AI is Cognitive Artificial Intelligence (CAI) [17]. CAI uses a method known as the Knowledge Growing System (KGS), which is able to solve multi-input multi-output problems that occur in the environment. This method is developed from a combination of the Bayesian Inference Method (BIM), the Maximum a Posteriori (MAP) theorem, and the Linear Opinion Pool (LOP) to obtain decision options [17][18][19]. DGA interpretation requires processing multiple inputs in order to make a decision among multiple outputs; hence, this method is suitable for it.

Proposed Method
The CAI method solves problems based on the Knowledge Growing System (KGS) principle, which is illustrated in Figure 1.

Figure 1. Diagram of KGS [17]

The diagram consists of two parts, the Information Part and the Knowledge Part. In the Information Part, information from multiple sources is fused to extract new information with a Degree of Certainty (DoC). If the DoC satisfies a certain value, the information is sent to the Knowledge Part as the Current Knowledge for further processing. In the Knowledge Part, the Current Knowledge is fused with the Existing Knowledge to obtain the New Knowledge with its DoC. If the DoC satisfies a certain value, the knowledge becomes the Ultimate Knowledge.
The main component of CAI is Information Fusion [17]. Information Fusion processes information by imitating the way the human brain processes information: multi-source information is processed in order to obtain new information.
ASSA2010 is the new name given by the method's inventors to the method originally named A3S (Arwin-Adang-Aciek-Sembiring) [17]. It is the algorithm used to fuse information and is derived from BIM, MAP, and LOP (BIM + MAP + LOP) [17][18][19], as shown in (1). The probability of Hypothesis B given Indication A is the probability of Indication A given B times the probability of B, divided by the sum over all possible events:

$$P(B_j \mid A_i) = \frac{P(A_i \mid B_j)\,P(B_j)}{\sum_{k} P(A_i \mid B_k)\,P(B_k)} \qquad (1)$$
where $P(B_j \mid A_i)$ is the probability of Hypothesis $B_j$ given Indication $A_i$, $P(A_i \mid B_j)$ is the probability of Indication $A_i$ given $B_j$, $P(B_j)$ is the probability of Hypothesis $B_j$ itself, and $\sum_k P(A_i \mid B_k)\,P(B_k)$ is the sum over all possible events. These values are arranged in the Indication-Hypothesis matrix as fused information $P(\upsilon_i^j)$. BIM + MAP alone is unable to make decisions in the multi-input multi-output situation faced in DGA. ASSA2010 improves on BIM + MAP by incorporating the concept of LOP, so that it can make decisions in multi-input multi-output situations based on the New Knowledge Probability Distribution (NKPD) it produces after performing computation on the Indication-Hypothesis data.
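As a minimal sketch of the Bayesian step in (1), the following Python function computes the posterior of every hypothesis for a single indication. The function name `bayes_posterior` and the numeric likelihoods and priors are illustrative assumptions, not values taken from the paper.

```python
def bayes_posterior(likelihoods, priors):
    """Return P(B_j | A_i) for every hypothesis j, given one indication A_i.

    likelihoods[j] is P(A_i | B_j); priors[j] is P(B_j).
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # denominator of (1): sum over k of P(A_i|B_k) P(B_k)
    if evidence == 0:
        raise ValueError("indication has zero probability under every hypothesis")
    return [x / evidence for x in joint]


# Hypothetical likelihoods for three hypotheses with uniform priors.
posterior = bayes_posterior([0.2, 0.6, 0.2], [1 / 3, 1 / 3, 1 / 3])
print(posterior)
```

With uniform priors the posterior is simply the normalized likelihood, so the second hypothesis receives the largest share here.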
The NKPD comprises the DoC of each hypothesis, which indicates how far a hypothesis can be believed given the combination of indications present. The NKPD is calculated using (2), where δ is the number of sensors and $P(\psi_i^j)$ is the inferred fused information. The system keeps receiving information from sensors in the form of $P(\psi_i^j)$ according to (3), where λ is the number of fused information items. The new information is fused with the previous information to produce the NKPD over Time (NKPDT), as shown in (4), where Γ is the number of observations and $P(\phi_\gamma^j)$ is the NKPD of each observation. The decision is made based on the highest DoC in the NKPDT matrix, as shown in (5), where j = 1, 2, 3, …, λ.

The proposed method can be described in pseudocode [listing not recovered].

Localization of the ratios is shown in Table 4. The input of Table 4 is the ratio $R_i$ and the output is the score $A_i$. If $R_1$ exceeds 1, $A_1$ is given the score '2'; if $R_1$ is between 0.1 and 1, $A_1$ is given '1'; and if $R_1$ is below 0.1, $A_1$ is given '0'. If $R_2$ exceeds 0.75, $A_2$ is given '1'; otherwise $A_2$ is given '0'. If $R_3$ exceeds 0.3, $A_3$ is given '1'; otherwise $A_3$ is given '0'. If $R_4$ exceeds 0.4, $A_4$ is given '1'; otherwise $A_4$ is given '0'.

The observation matrix is shown in Table 5, while the NKPD is calculated using (2) and is shown in Table 6. Since the test data are not time series, the NKPDT does not need to be calculated; for time-series data, the NKPDT can be calculated using (4).
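The localization rules described for Table 4 can be sketched as a small Python function. The name `localize_ratios` is hypothetical, and the treatment of values that fall exactly on a threshold (0.1, 1, 0.75, 0.3, 0.4) is an assumption, since the text only says "exceeds", "between", and "below".

```python
import math


def localize_ratios(r1, r2, r3, r4):
    """Map Doernenburg ratios R1..R4 to scores A1..A4 using the stated thresholds."""
    if r1 > 1.0:
        a1 = 2
    elif r1 >= 0.1:  # assumption: the boundary values 0.1 and 1 score '1'
        a1 = 1
    else:
        a1 = 0
    a2 = 1 if r2 > 0.75 else 0
    a3 = 1 if r3 > 0.3 else 0
    a4 = 1 if r4 > 0.4 else 0  # an infinite ratio (zero denominator) scores '1'
    return (a1, a2, a3, a4)


# Ratios of the first PD sample reported in the Results section.
scores = localize_ratios(0.07, 0.00, 0.0, math.inf)
print(scores)  # (0, 0, 0, 1)
```

An undefined ratio (0/0) is not covered by the text and is not handled in this sketch.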

Results and Analysis
The proposed method is verified using previously published data consisting of 117 samples grouped by fault type: 9 Partial Discharge (PD), 26 Low-Energy Discharge (LE), 48 High-Energy Discharge (HE), 16 Thermal-Low (TL), and 18 Thermal-High (TH). For example, sample #1 of PD has $R_1$ = 0.07, $R_2$ = 0.00, $R_3$ = 0, and $R_4$ = inf. Using Table 3, the ratios are localized and given the scores $A_1$ = 0, $A_2$ = 0, $A_3$ = 0, $A_4$ = 1. These scores are arranged in the observation matrix in Table 5. Table 3 is also used to give scores to each hypothesis; for example, when the score of $R_1$ is '0', the columns H_TD, H_PD, and H_A of row $R_1$ are given '0', '1', and '0' respectively, as arranged in Table 7. Other types of faults can be analyzed using the same method. The NKPD is calculated using (2) and is shown in Table 8.

Samples 1, 2, 7, and 8 in dataset PD indicate that the major fault is PD, some with TD as the minor fault and others with A. In samples 3, 4, 5, 6, and 9, no major fault occurred; however, PD is still identified because its DoC is among the highest. All samples in dataset LE indicate that the major fault is Arcing, while the minor faults are mostly PD. This is due to a limitation of the Doernenburg method, which identifies only very-low-energy discharge (PD) and high-energy discharge (A). All samples in dataset HE indicate that the major fault is Arcing, with TD identified as the minor fault in some samples and PD in others. Datasets TL and TH give TD, except for two samples in dataset TL that are identified as PD.
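The worked example for PD sample #1 can be sketched in Python under several loudly stated assumptions: the support table below follows the standard Doernenburg ratio limits (as in IEEE C57.104) rather than the paper's Table 3, which is not reproduced in the text; TD is taken to mean thermal decomposition; the "not significant" R2 entry for PD is treated as always supporting PD; and (2) is read as a plain average over the δ = 4 ratio rows. The names `SUPPORT`, `nkpd`, and `decide` are hypothetical, and the resulting values are not normalized, so they may differ from the paper's Table 8.

```python
HYPOTHESES = ("TD", "PD", "A")

# For each ratio row, map a localized score to the hypotheses it supports
# (1 = supports, 0 = does not), in the order TD, PD, A. Assumed, see lead-in.
SUPPORT = [
    {2: (1, 0, 0), 1: (0, 0, 1), 0: (0, 1, 0)},  # R1
    {0: (1, 1, 0), 1: (0, 1, 1)},                # R2 (PD: not significant)
    {0: (1, 1, 0), 1: (0, 0, 1)},                # R3
    {1: (1, 1, 0), 0: (0, 0, 1)},                # R4
]


def nkpd(scores):
    """Average the per-ratio support rows into one DoC per hypothesis."""
    rows = [SUPPORT[i][a] for i, a in enumerate(scores)]
    delta = len(rows)  # number of sensors (here, ratios)
    return tuple(sum(col) / delta for col in zip(*rows))


def decide(doc):
    """Pick the hypothesis with the highest DoC (first one wins on ties)."""
    best = max(range(len(doc)), key=lambda j: doc[j])
    return HYPOTHESES[best]


doc = nkpd((0, 0, 0, 1))  # localized scores of PD sample #1
print(dict(zip(HYPOTHESES, doc)))  # {'TD': 0.75, 'PD': 1.0, 'A': 0.0}
print(decide(doc))                 # PD
```

Under these assumptions, PD receives the highest DoC with TD second, which agrees with the description of sample 1 as a PD fault with a minor TD indication. For time-series data, the same averaging could be applied across observations to obtain the NKPDT.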
The dataset is also tested using a Fuzzy Inference System (FIS) and an Artificial Neural Network (ANN). Both are designed to implement the same conventional DRM on the same dataset. The FIS has four input membership functions ($R_1$, $R_2$, $R_3$, and $R_4$) and three output membership functions (TD, PD, and A). The results are shown in Table 9. For dataset PD, FIS achieves 88.9% accuracy and ANN 55.6%. For dataset LE, FIS achieves 92.3% and ANN 53.9%. For dataset HE, FIS achieves 100% and ANN 95.8%. For dataset TL, FIS achieves 75% and ANN 81.3%. For dataset TH, FIS achieves 100% and ANN 77.8%. The FIS accuracy obtained in this test is consistent with that reported by other authors, which lies between 93% and 96% for various datasets and conditions [13], [14], while the accuracy of ANN varies because of the small number of samples [1], [11], [12].
The proposed method achieves 87.5% accuracy for dataset TL and 100% accuracy for the other datasets. The overall accuracy of CAI is 98.3%, with 115 correct identifications out of 117 samples. The accuracy of ANN, in contrast, tends to increase with the number of samples.
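The overall accuracies can be cross-checked against the per-dataset figures with a few lines of Python. The correct-identification counts below are not stated in the text; they are inferred from the reported percentages and dataset sizes (for example, 88.9% of 9 samples is taken as 8), so they should be read as an assumption.

```python
SAMPLES = {"PD": 9, "LE": 26, "HE": 48, "TL": 16, "TH": 18}

# Correct identifications per dataset, inferred from the reported percentages.
CORRECT = {
    "CAI": {"PD": 9, "LE": 26, "HE": 48, "TL": 14, "TH": 18},
    "FIS": {"PD": 8, "LE": 24, "HE": 48, "TL": 12, "TH": 18},
    "ANN": {"PD": 5, "LE": 14, "HE": 46, "TL": 13, "TH": 14},
}


def overall_accuracy(correct):
    """Overall accuracy in percent over all 117 samples."""
    return 100.0 * sum(correct.values()) / sum(SAMPLES.values())


for method, counts in CORRECT.items():
    total = sum(counts.values())
    print(f"{method}: {total}/117 = {overall_accuracy(counts):.2f}%")
# CAI: 115/117 = 98.29%
# FIS: 110/117 = 94.02%
# ANN: 92/117 = 78.63%
```

The inferred counts reproduce the reported overall figures of 98.3%, 94.02%, and 78.6%.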

Conclusion and Future Works
In this research, a novel method of DGA interpretation is proposed. The proposed method is based on CAI with KGS as its core and, in this paper, adopts the Doernenburg Ratio method. Several important points can be noted from the experiments.

Conclusion
The proposed method has successfully identified faults based on DGA with an overall accuracy of 98.3% across five fault types: 87.5% on Thermal-Low and 100% on Partial Discharge, Low-Energy Discharge, High-Energy Discharge, and Thermal-High. This is higher than the accuracy of FIS (94.02%) and ANN (78.6%). The proposed CAI-based method does not require a large amount of data and training, as ANN does, and does not require complex computation, as FIS does.
ANN tends to become more accurate as the number of samples increases. The accuracy of FIS depends on its membership functions and rule base: the more detailed the membership functions, the more accurate the FIS, at the cost of greater complexity.
The accuracy of the proposed method is independent of the number of samples. This is shown in Table 8, where datasets with various numbers of samples result in 100% accuracy. The complexity of the proposed method is proportional only to the number of hypotheses and indications. Another advantage is that CAI works well even with a small amount of data and does not require training. This means that the proposed method is able to capture the ability of the human brain to learn quickly in order to obtain decision options for multi-input multi-output problems.

Future Works
The proposed method is a novel method in computing and AI, developed to solve multi-input multi-output problems such as those in biomedicine [20], DPA countermeasures [21], intrusion detection systems [22], emotion modeling [23], and many other areas requiring multi-input multi-output processing, such as big data [24]. The proposed method can be implemented in software [22] and hardware [21], [25]. A cognitive processor is currently being developed [26] and will be used in an embedded system to perform such tasks with greater power efficiency and higher speed.