Asthma Identification Using Gas Sensors and Support Vector Machine

Exhaled breath analysis is a procedure that measures several types of gases with the aim of identifying various diseases in the human body. The purpose of this study is to analyze the gases contained in exhaled breath in order to recognize healthy subjects and asthma subjects with varying severity. An electronic nose consisting of seven gas sensors, combined with the Support Vector Machine classification method, is used to analyze the gases and determine the patient's condition. Non-linear binary classification is used to identify healthy and asthma subjects, whereas multiclass classification is applied to recognize asthma subjects with different severity. The results show that the system provided low accuracy in distinguishing asthma subjects with varying severity; among the severity classes, it could differentiate only partially controlled from uncontrolled asthma subjects with good accuracy. However, the system provided high sensitivity, specificity, and accuracy in distinguishing healthy from asthma subjects. Using five gas sensors in the electronic nose system gave the best classification accuracy, 89.5%. Carbon monoxide, nitric oxide, volatile organic compounds, hydrogen, and carbon dioxide in the exhaled breath are the dominant indications as biomarkers of asthma. The performance of the electronic nose was highly dependent on the ability of the sensor array to analyze the gas types in the sample. Therefore, in further study we will employ sensors with higher sensitivity to detect lower concentrations of the marker gases.


Introduction
Exhaled breath contains many gases, mostly oxygen (O2), carbon dioxide (CO2), water vapor, nitric oxide (NO), and various volatile organic compounds (VOCs) [1]. These gases can be measured accurately using Gas Chromatography (GC) [2]. However, this technique is expensive because the testing process takes a long time and requires interpretation by an expert [3]. An electronic nose can be used as an alternative to analyze exhaled breath, offering low production cost, non-invasive operation, faster sample measurement, and portability [4,5]. The electronic nose has been used to diagnose kidney disease [6,7], diabetes [8], and lung cancer [9]. Biomarkers are physical signs or laboratory measurements that can serve as indicators of biological or pathophysiological processes or of the response to therapeutic interventions [10].
Several studies have analyzed exhaled breath for asthma using the Cyranose 320, which contains 32 different polymer nanocomposite sensors. The analytical method used to analyze the sensor response was Principal Component Analysis (PCA). The results show that the electronic nose can distinguish the exhaled breath of healthy subjects from that of asthma subjects; however, it is not good enough to distinguish asthma subjects with different severity [11]. The performance of the electronic nose depends on the characteristics of the classification algorithm applied to the exhaled breath. One classification method that has received a lot of attention as the state of the art in pattern classification is the Support Vector Machine (SVM) [12-18]. The SVM is an effective technique for quantitatively analyzing gas mixtures because it can solve the cross-sensitivity problem of a gas sensor array [19].
The purpose of this study is to analyze the gases contained in exhaled breath in order to recognize healthy subjects and asthma subjects with varying severity. The exhaled breath samples were taken from patients diagnosed with asthma and then analyzed in the laboratory. An electronic nose consisting of seven gas sensors, combined with the SVM classification method, is used to analyze the gases contained in the exhaled breath sample and determine the patient's condition. Non-linear binary classification is used to identify healthy and asthma subjects, whereas multiclass classification is used to identify asthma subjects with different severity.

Research Method

Subjects and research design
The subjects are 30 patients diagnosed with asthma and 30 healthy subjects who volunteered for the study. All subjects are adult non-smokers, aged 30-60 years, who do not suffer from acute or chronic illness. In current clinical practice, asthma is diagnosed and monitored from symptoms and physiological measurements using the Global Initiative for Asthma (GINA) guidelines and the Asthma Control Test (ACT). One of the standard criteria of GINA is lung function examination, which measures Forced Expiratory Volume in the first second (FEV1) or Peak Expiratory Flow (PEF) through forced expiratory maneuvers performed according to standard procedures. From this measurement, the degree of asthma is divided into three classes, namely controlled, partly controlled, and uncontrolled asthma [20,21].
The classification of asthma subjects is based on these standards, with the diagnostic results using ACT shown in Table 1. After breathing clean room air in and out for 5 minutes, each subject is asked to exhale into a 1 L Tedlar bag. The exhaled breath collection is carried out at the asthma clinic of Dr. Soetomo Hospital, Surabaya, after approval from Hospital Research and Development. Figure 1 shows the research design for the classification of asthma.
Figure 1. The research design for asthma identification using gas sensor

Electronic nose system
The main part of the electronic nose system is a 240 ml chamber containing a gas sensor array. Each sensor has sensitivity and selectivity to certain gases flowing through the chamber. The sensing resistance changes when the sensor goes from fresh air to the gas sample. The sensor output voltage is determined by equation (1).
$$V_{out} = \frac{R_L}{R_S + R_L}\, V_C \qquad (1)$$
where $R_L$ is the load resistance, $R_S$ is the sensing resistance, $V_C$ is the constant voltage applied to the sensor, and $V_{out}$ is the transient output voltage. Each sensor response is an analog signal, which is filtered, amplified, converted to digital form, and sent to the computer every second. Pumps are activated to flow either fresh air from the silica gel or gas from the exhaled breath bag into the sensor chamber. The flow rate is maintained at 100 ml/min. Before the gas sample is introduced, the sensor chamber is dried with fresh air to clean the sensors of residual gas. The humidity of the fresh air is reduced by passing the air through a tube containing silica gel.
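As a rough illustration of equation (1), the sketch below assumes the common voltage-divider arrangement in which the output is read across the load resistor in series with the sensor. The function name and all numeric values (5 V supply, 10 kΩ load, the two sensing resistances) are hypothetical and are not taken from the study.

```python
def sensor_output_voltage(r_s: float, r_l: float, v_c: float) -> float:
    """Voltage across the load resistor R_L in series with the sensor R_S.

    Assumes the voltage-divider form V_out = V_C * R_L / (R_S + R_L).
    """
    if r_s < 0 or r_l <= 0:
        raise ValueError("r_s must be non-negative and r_l must be positive")
    return v_c * r_l / (r_s + r_l)


# Hypothetical values: the sensing resistance drops when the gas is present,
# so the output voltage rises.
v_air = sensor_output_voltage(r_s=40_000.0, r_l=10_000.0, v_c=5.0)
v_gas = sensor_output_voltage(r_s=10_000.0, r_l=10_000.0, v_c=5.0)
print(v_air, v_gas)
```

Under this assumed form, a lower sensing resistance gives a higher output voltage, which is why the transient response rises while the sample flows through the chamber.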
The sampling of the exhaled breath follows a fixed time sequence. From the 1st to the 15th second, the sensor response is in the baseline stage, t_b. During this time, the valve is OFF so that fresh air flows from B to C. From the 16th to the 55th second, the exhaled breath is injected into the sensor chamber; this is the reaction stage, t_r. During this stage, the valve is ON so that the gas in the Tedlar bag is drawn from A to C. From the 56th to the 150th second, the injection is stopped and the sensor array is cleaned; this is the purge stage, t_p. At this stage, the valve is OFF again. After the 150th second, the valve remains OFF until the next sampling takes place. The block diagram of the electronic nose system is shown in Figure 2, while a photo of the apparatus setup and its computer interface is shown in Figure 3. The sensor response during the sampling process is shown in Figure 4. The portion of the sensor response in the reaction stage from the 30th to the 49th second is extracted as the feature, which is considered to represent the overall sensor response in the classification process.
The sensor array of the electronic nose consists of seven gas sensors, whose selectivity and sensitivity are shown in Table 2. Each sensor is intended to detect gases in exhaled breath that indicate the presence of asthma in the subject. Some studies have shown that asthma patients have higher concentrations of NO [22] and carbon monoxide (CO) [23]. There is also a relationship between the VOCs in a subject's exhaled breath and lung disease [24].
Several other factors are associated with the exhaled breath of asthma patients, including an increase in hydrogen (H2) due to the digestive system, CO2 affected by exposure to air pollution, ammonia (NH3) due to the acid-base status in the airway of asthmatics [25], and hydrogen sulfide (H2S) as a harmful compound in the human body [26].
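The sampling sequence described in this section (baseline, reaction, purge, and the feature window inside the reaction stage) can be written down as a small lookup. This is only a sketch of the timing stated in the text; the names `stage`, `valve_on`, and the constants are illustrative, and "idle" is a label introduced here for the time after the 150th second.

```python
# Time windows in seconds (1-based, inclusive), as stated in the text.
BASELINE = range(1, 16)         # seconds 1-15, valve OFF, fresh air B -> C
REACTION = range(16, 56)        # seconds 16-55, valve ON, sample A -> C
PURGE = range(56, 151)          # seconds 56-150, valve OFF, chamber cleaned
FEATURE_WINDOW = range(30, 50)  # seconds 30-49, used for feature extraction


def stage(second: int) -> str:
    """Return the sampling stage for a given second of the 150 s cycle."""
    if second in BASELINE:
        return "baseline"
    if second in REACTION:
        return "reaction"
    if second in PURGE:
        return "purge"
    return "idle"  # before the first second or after the 150th second


def valve_on(second: int) -> bool:
    """The valve is ON only while the breath sample is injected."""
    return second in REACTION


print(stage(15), stage(16), stage(56), len(FEATURE_WINDOW))
```

The feature window lies entirely inside the reaction stage and spans 20 seconds, which matches the 20 rows of the feature dataset described in the data analysis.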

Data Analysis
In the sampling process, the sensor array measures the level of gases in the exhaled breath and sends the data to the computer for analysis. The data package from sampling one subject is a [150x7] matrix consisting of the responses of seven sensors over 150 seconds. Only the signal response at the reaction stage is used for analysis. Data analysis consists of three stages, namely pre-processing, feature extraction, and classification. Pre-processing includes baseline correction and normalization. The baseline correction is conducted by subtracting from each sensor signal response the average signal response at the baseline stage. The result is the pre-processed signal response defined in equation (2).
$$\hat{x}_{i,s}(t) = x_{i,s}(t) - \frac{1}{N_b}\sum_{t_b=1}^{N_b} x_{i,s}(t_b) \qquad (2)$$
where $x_{i,s}(t)$ is the response of sensor s for subject i at time t, i = 1,2,...,N_i (N_i is the number of sample subjects), s = 1,2,...,N_s (N_s is the number of sensors), and t_b = 1,2,...,N_b (N_b is the maximum time period of the baseline stage). Normalization of the data is intended to reduce pattern variation due to variations in the vapor concentration [27]. The normalized data is expressed by equation (3).
The output signal response of each sensor at the reaction stage, after being processed by equations (2) and (3), is the feature extraction signal used in the classification process. In this process, a [20x7] dataset is derived from the signal response over the sampling period from the 30th to the 49th second.
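The baseline correction and the extraction of the [20x7] feature window can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names and the synthetic response are hypothetical, and the normalization step of equation (3) is deliberately left out because its exact form is not given here.

```python
def baseline_correct(response, n_baseline=15):
    """Subtract each sensor's mean over the baseline stage from its signal.

    response: list of rows, one per second, each a list of sensor readings.
    """
    n_sensors = len(response[0])
    means = [
        sum(row[s] for row in response[:n_baseline]) / n_baseline
        for s in range(n_sensors)
    ]
    return [[row[s] - means[s] for s in range(n_sensors)] for row in response]


def feature_window(corrected, start=30, stop=49):
    """Rows for seconds start..stop (1-based, inclusive)."""
    return corrected[start - 1:stop]


# Synthetic 150 x 7 response (hypothetical numbers): a flat baseline of 1.0
# and a step of +0.5 while the sample flows during seconds 16-55.
response = [
    [1.0 + (0.5 if 16 <= t <= 55 else 0.0) for _ in range(7)]
    for t in range(1, 151)
]
corrected = baseline_correct(response)
features = feature_window(corrected)
print(len(features), len(features[0]))
```

With this synthetic input the baseline rows become zero and the feature window holds 20 rows of 7 baseline-corrected values, one column per sensor.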
SVM is a machine learning method that works on the principle of Structural Risk Minimization (SRM) with the aim of finding the best hyperplane that separates two classes in the input space. The method, proposed in 1992 by Vladimir N. Vapnik, Bernhard E. Boser and Isabelle M. Guyon, has been widely applied, especially in the field of bioinformatics [28]. Two types of SVM classification are conducted in this study, namely non-linear binary classification and non-linear multi-class classification. The multi-class classification employs the one-against-one approach with the Gaussian radial basis function (RBF) kernel.
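To make the one-against-one approach concrete, the sketch below enumerates the binary classifiers needed for the three severity classes (controlled, partly controlled, and uncontrolled asthma, abbreviated CA, PA, and UA later in the text) and combines their outputs by majority vote. The pairwise winners passed to `vote` are hypothetical placeholders, not outputs of trained classifiers.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """One binary classifier per unordered pair: k * (k - 1) / 2 in total."""
    return list(combinations(classes, 2))


def vote(pairwise_winners):
    """Return the class that wins the most pairwise comparisons."""
    return Counter(pairwise_winners).most_common(1)[0][0]


pairs = one_vs_one_pairs(["CA", "PA", "UA"])
print(pairs)

# Hypothetical winners of the CA/PA, CA/UA, and PA/UA classifiers.
predicted = vote(["PA", "CA", "PA"])
print(predicted)
```

For three classes this gives three binary SVMs, and a sample is assigned to the class that collects the most votes.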
The SVM concept can be explained simply as a search for the best hyperplane that serves as a separator of two classes in the input space. Figure 5 shows some patterns of members of two classes: +1 and -1. The pattern incorporated in class -1 is symbolized by green (circle), while the pattern in class +1 is symbolized by red (box). The classification problem can be translated by finding the line or hyperplane that separates the two groups. Various alternative discrimination boundaries are shown in Figure 5(a). The best separator hyperplane between the two classes can be found by measuring the hyperplane's margins and searching for the maximum point.
Margin is the distance between the hyperplane and the closest pattern of each class. The closest pattern is called a support vector. The solid line in Figure 5(b) shows the best hyperplane, which is located at the center of the two classes, while the green and red dots that are in the black circle are the support vectors. The effort to locate the hyperplane is at the heart of the learning process in SVM.
The available data is denoted as $\bar{x}_i \in \Re^d$, whereas each label is denoted $y_i \in \{-1, +1\}$ for i = 1,2,...,l. It is assumed that both classes -1 and +1 can be perfectly separated by a hyperplane of dimension d, defined as $\bar{w} \cdot \bar{x} + b = 0$. A pattern $\bar{x}_i$ that belongs to class -1 (negative sample) satisfies the inequality $\bar{w} \cdot \bar{x}_i + b \le -1$, while a pattern $\bar{x}_i$ that belongs to class +1 (positive sample) satisfies the inequality $\bar{w} \cdot \bar{x}_i + b \ge +1$. The greatest margin can be found by maximizing the distance between the hyperplane and its nearest point, $1/\|\bar{w}\|$. This can be formulated as a Quadratic Programming (QP) problem, which is to find the minimum of equation (4) subject to the constraint in equation (5).
$$\min_{\bar{w}} \tau(\bar{w}) = \frac{1}{2}\|\bar{w}\|^2 \qquad (4)$$
$$y_i(\bar{x}_i \cdot \bar{w} + b) - 1 \ge 0, \quad \forall i \qquad (5)$$
This problem can be solved by various computational techniques, including the Lagrange multiplier method.
$$L(\bar{w}, b, \alpha) = \frac{1}{2}\|\bar{w}\|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i(\bar{x}_i \cdot \bar{w} + b) - 1 \right) \qquad (6)$$
where $\alpha_i$ is a Lagrange multiplier that is either zero or positive ($\alpha_i \ge 0$). The optimal value of equation (6) can be calculated by minimizing L with respect to $\bar{w}$ and b and maximizing L with respect to $\alpha_i$. Considering that the gradient of L is zero at the optimum, equation (6) can be rewritten as a maximization problem containing only $\alpha_i$, as indicated by equation (7).
Maximize:
$$\sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, \bar{x}_i \cdot \bar{x}_j \qquad (7)$$
Subject to:
$$\alpha_i \ge 0 \ (i = 1,2,...,l), \qquad \sum_{i=1}^{l} \alpha_i y_i = 0$$
This calculation yields the values of $\alpha_i$; the data associated with a positive $\alpha_i$ are called support vectors.
In general, problems in real-world domains are rarely linearly separable; most are non-linear. To solve non-linear problems, the SVM is modified by introducing a kernel function. In non-linear SVM, the data $\bar{x}$ is mapped by the function $\Phi(\bar{x})$ to a higher-dimensional vector space, in which the hyperplane that separates the two classes can be constructed. To classify non-linear data, the SVM formulation must also be modified: the two bounding constraints of (5) are relaxed so that they are more flexible under certain conditions, by adding the slack variable $\xi_i$ ($\xi_i \ge 0$), as shown in equation (8).
$$y_i(\bar{x}_i \cdot \bar{w} + b) \ge 1 - \xi_i, \quad \forall i \qquad (8)$$
Thus equation (4) is changed to:
$$\min_{\bar{w}} \tau(\bar{w}, \xi) = \frac{1}{2}\|\bar{w}\|^2 + C \sum_{i=1}^{l} \xi_i \qquad (9)$$
C is chosen to control the tradeoff between margin and misclassification. A large value of C gives a larger penalty for misclassification. Another method of solving non-linear problems in SVM is to map the data to a higher-dimensional space (feature space) [29], in which the data can be separated linearly, using the transformation $\Phi: \Re^d \rightarrow H$. The training algorithm then depends on the data only through dot products in H, i.e. $\Phi(\bar{x}_i) \cdot \Phi(\bar{x}_j)$. If there is a kernel function K such that $K(\bar{x}_i, \bar{x}_j) = \Phi(\bar{x}_i) \cdot \Phi(\bar{x}_j)$, the training algorithm requires only the kernel function K, without having to know the exact transformation $\Phi$. By transforming $\bar{x} \rightarrow \Phi(\bar{x})$, the value of $\bar{w}$ becomes $\bar{w} = \sum_{i=1}^{l} \alpha_i y_i \Phi(\bar{x}_i)$ and the learning function becomes:
$$f(\bar{x}_d) = \sum_{i=1}^{l} \alpha_i y_i \, \Phi(\bar{x}_i) \cdot \Phi(\bar{x}_d) + b \qquad (10)$$
The feature space usually has a higher dimension, so it may be very large. To solve this problem, the "kernel trick" $K(\bar{x}_i, \bar{x}_d) = \Phi(\bar{x}_i) \cdot \Phi(\bar{x}_d)$ is used, for which equation (10) becomes:
$$f(\bar{x}_d) = \sum_{i=1}^{l} \alpha_i y_i \, K(\bar{x}_i, \bar{x}_d) + b \qquad (11)$$
where only the support vectors, i.e. the $\bar{x}_i$ with $\alpha_i > 0$, contribute to the sum. Several kernel functions are commonly used in SVM; the one used in this study is the RBF kernel, which is commonly used in support vector machine classification [30]. The RBF kernel on two samples $\bar{x}$ and $\bar{x}'$, represented as feature vectors in some input space, is defined in equation (12).
$$K(\bar{x}, \bar{x}') = \exp\left(-\frac{\|\bar{x} - \bar{x}'\|^2}{2\sigma^2}\right) \qquad (12)$$
The classification reliability is indicated by the sensitivity and specificity, where each result is either positive (diseased) or negative (healthy). In the classification, the number of diseased subjects identified as diseased is denoted by $t_p$ (true positive), the number of diseased subjects identified as healthy by $f_n$ (false negative), the number of healthy subjects identified as diseased by $f_p$ (false positive), and the number of healthy subjects identified as healthy by $t_n$ (true negative). The sensitivity, specificity, and accuracy for two and three classes are defined in Tables 3 and 4, respectively.
Table 4. Definition of Accuracy for three classes
Figure 6. The SVM Algorithm for classification
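The RBF kernel of equation (12) and the kernelized learning function can be sketched in a few lines of plain Python. This is an illustration only: the support vectors, multipliers, bias, and sigma below are hypothetical values chosen by hand, not parameters obtained by training on the study's data.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_function(x, support_vectors, alphas, labels, b, sigma=1.0):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    total = sum(
        alpha * y * rbf_kernel(sv, x, sigma)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    )
    return total + b


def classify(x, support_vectors, alphas, labels, b, sigma=1.0):
    """Assign class +1 or -1 from the sign of the decision function."""
    f = decision_function(x, support_vectors, alphas, labels, b, sigma)
    return 1 if f >= 0 else -1


# Hypothetical model with one support vector per class.
svs = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
bias = 0.0

near_negative = classify([0.1, 0.1], svs, alphas, labels, bias)
near_positive = classify([1.9, 2.0], svs, alphas, labels, bias)
print(near_negative, near_positive)
```

A point close to the negative support vector receives a larger kernel weight from it and is assigned to class -1, and the converse holds near the positive support vector. A smaller sigma makes this influence more local.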

Results and Analysis
The electronic nose has been tested to detect and identify exhaled breath samples from healthy and asthma subjects, and from asthma subjects with different severity, using the procedure described in the method. Figure 7 shows the response of the seven sensors to the four exhaled breath samples during a period of 150 seconds. Figure 8 (a) shows the average response for the two categories. It is possible to obtain a combination of sensors that clearly indicates each of the two categories. The responses of the CO, NO, H2S, and CO2 sensors are smaller for healthy subjects, but those of the H2, NH3, and VOC sensors are not. Figure 8 (b) shows the sensor responses for the four categories. We have conducted five types of classifications.

Classification of Healthy and Asthma using seven sensors
In this study, we used the electronic nose to distinguish between exhaled breath samples of healthy and asthma subjects using non-linear binary SVM classification. The composition of the subject database is shown in Table 5. The exhaled breath samples were collected from patients who had been diagnosed according to the early diagnosis standard using ACT. We randomly selected 20 samples from each category for the training set. The remaining 10 samples from each category were used for the test set. The aim is to test the ability of the electronic nose to distinguish healthy from asthma subjects using binary classification.
Table 6 shows the classification results for the two categories. For the 10 asthma subjects in the test set, the system correctly identified 10.0 samples as asthma and incorrectly identified 0.0 samples as healthy. For the 10 healthy subjects in the test set, the system correctly identified 7.8 samples as healthy and incorrectly identified 2.2 samples as asthma. For these identification results, the sensitivity, specificity, and accuracy are 100%, 78.0%, and 89%, respectively.
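The reported figures can be reproduced from the counts above with the standard definitions of sensitivity, specificity, and accuracy, taking asthma as the positive class. The sketch below uses the fractional counts exactly as given in the text; the function name is illustrative.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy in percent.

    tp: asthma identified as asthma     fn: asthma identified as healthy
    tn: healthy identified as healthy   fp: healthy identified as asthma
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts for the seven-sensor test set as stated in the text.
sens, spec, acc = binary_metrics(tp=10.0, fn=0.0, tn=7.8, fp=2.2)
print(round(sens, 1), round(spec, 1), round(acc, 1))
```

With 10 test samples per class, the accuracy is simply the mean of the sensitivity and the specificity.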

Classification of Healthy and Asthma with Different Severity
We collected three categories of asthma subjects separated by severity, each consisting of ten samples. Ten samples for healthy subjects were taken randomly from the 30 healthy samples. The composition of the subject database is shown in Table 7. We randomly selected six samples from each category for the training set. The remaining four samples from each category were used for the test set. Table 8 shows the classification results for the four categories. CA subjects could be distinguished from healthy subjects with 82% accuracy, 90% sensitivity, and 75% specificity. PA subjects could be differentiated from healthy subjects with 87.5% accuracy, 75% sensitivity, and 100% specificity. UA subjects could be distinguished from healthy subjects with 66% accuracy, 65% sensitivity, and 67.5% specificity.

Classification of Three Categories of Asthma with Different Severity
In this study, we tested the ability of the electronic nose to distinguish the three categories of asthma severity, CA, PA, and UA, using non-linear multi-class SVM classification. Table 9 shows the results of this classification. For the four CA subjects in the test set, the system correctly identified 0.9 samples as CA and incorrectly identified 1.1 samples as PA and 2.0 samples as UA. For the four PA subjects in the test set, the system correctly identified 2.2 samples as PA and incorrectly identified 1.8 samples as CA and 0.0 samples as UA. For the four UA subjects in the test set, the system correctly identified 1.6 samples as UA and incorrectly identified 1.0 sample as CA and 1.4 samples as PA. The accuracy of this identification is 39%, which shows that the system cannot reliably distinguish among the three categories. Therefore, we employed binary classification for these three categories. Table 10 shows the non-linear binary SVM classification for the three categories of asthma subjects. For the four CA samples in the test set, the system correctly identified 0.4 samples as CA and incorrectly identified 3.6 samples as PA. This indicates that the sensitivity is poor (i.e. 10%). For the four PA samples in the test set, the system correctly identified 3.3 samples as PA and incorrectly identified 0.7 samples as CA. This indicates that the specificity is good (i.e. 82.5%). Similarly, in distinguishing between CA and UA subjects, the system shows poor sensitivity (i.e. 40%) and fairly good specificity (i.e. 75%). Finally, in differentiating between PA and UA subjects, the system shows good accuracy (i.e. 78.75%).
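The 39% multi-class accuracy follows directly from the three-class results quoted above. The sketch below arranges those counts as a confusion matrix (rows are true classes, columns are predicted classes) and computes the overall accuracy and the fraction of each class that was correctly identified. The function names are illustrative.

```python
# Counts for the multi-class test set as stated in the text.
confusion = {
    "CA": {"CA": 0.9, "PA": 1.1, "UA": 2.0},
    "PA": {"CA": 1.8, "PA": 2.2, "UA": 0.0},
    "UA": {"CA": 1.0, "PA": 1.4, "UA": 1.6},
}


def overall_accuracy(matrix):
    """Percentage of all test samples placed on the diagonal."""
    correct = sum(matrix[c][c] for c in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total


def per_class_rate(matrix):
    """Percentage of each true class that was identified correctly."""
    return {c: 100.0 * matrix[c][c] / sum(matrix[c].values()) for c in matrix}


accuracy = overall_accuracy(confusion)
rates = per_class_rate(confusion)
print(round(accuracy, 1), {c: round(r, 1) for c, r in rates.items()})
```

The diagonal sums to 4.7 of 12 test samples, about 39.2%, which the text reports as 39%. The per-class rates show that CA is the class most often confused with the others.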

Classification of Healthy and Asthma using five sensors
In this study, we used the response from five sensors, i.e. the CO, NO, VOC, H2, and CO2 sensors, to differentiate between healthy and asthma subjects using non-linear binary SVM classification. Table 11 shows the classification results: the sensitivity, specificity, and accuracy are 100%, 79%, and 89.5%, respectively.

Classification of Healthy and Asthma using three sensors
In this study, we used the response from only three sensors, i.e. the CO, NO, and VOC sensors, to differentiate between healthy and asthma subjects. Table 12 shows the classification results: the sensitivity, specificity, and accuracy are 90%, 79%, and 84.5%, respectively. The ability of the electronic nose to distinguish between healthy and asthma subjects using various numbers of sensors is shown in Figure 9. The use of five gas sensors in the electronic nose system gives the best classification accuracy. This suggests that CO, NO, VOC, H2, and CO2 are the dominant indications as markers of asthma.

Conclusion
In this research, an exhaled breath identification system for healthy subjects and asthma subjects with different severity was realized by means of an electronic nose. The electronic nose system performed well in identifying the gases contained in exhaled breath. The system produces a signal response that is processed in three stages, namely pre-processing, feature extraction, and classification. The feature extraction and classification were performed with the SVM method, which is an effective technique for analyzing gas mixtures quantitatively because of its ability to solve the cross-sensitivity problem of the gas sensor array. The results of this study show that the system using non-linear multi-class SVM provided quite low accuracy in differentiating among asthma subjects with several degrees of severity. Using non-linear binary SVM, the system can distinguish between partially controlled and uncontrolled asthma subjects with an accuracy of 78.8%. However, the system provided high sensitivity, specificity, and accuracy in distinguishing healthy from asthma subjects. The use of five gas sensors in the electronic nose system gave the best classification accuracy, 89.5%. This study suggests that CO, NO, VOC, H2, and CO2 contained in the exhaled breath can be used as biomarkers of asthma.