Spectral-based Features Ranking for Gamelan Instruments Identification using Filter Techniques

In this paper, we describe our effort to rank spectral-based features using filter techniques for Javanese gamelan instrument identification. The model extracts a set of spectral-based features from gamelan sound signals using the Short Time Fourier Transform (STFT). The feature ranking is determined using five algorithms: ReliefF, Chi-Squared, Information Gain, Gain Ratio, and Symmetric Uncertainty. We then evaluate the feature rankings by cross-validation using a Support Vector Machine (SVM). Experiments show that the Gain Ratio algorithm gives the best result, with an accuracy of 98.93%.
Keywords: support vector machine, automatic transcription


Introduction
Feature selection is the process of finding an optimal feature subset by removing irrelevant or redundant features. It is an important step in machine learning, especially for recognition tasks, because the performance of recognition algorithms usually depends on the quality of the feature set. If the feature set contains redundant or irrelevant features, the algorithm may produce a lower accuracy or recognition rate. The feature selection problem has been studied by the statistics and machine learning communities for many years [1][2][3][4]. Based on their criterion functions, feature selection algorithms can be categorized as filter, wrapper, and embedded methods. Filter methods use statistical properties of the data to evaluate feature subsets. They are fast and efficient for high-dimensional datasets; however, they do not consider feature dependencies. Wrapper methods use a learning algorithm to evaluate the selected feature subsets. Embedded methods are similar to wrapper methods but are less computationally expensive, while still considering feature dependencies [5]. Feature extraction, by contrast, can be viewed as deriving a lower-dimensional representation from the raw data.
Many algorithms have been developed for audio feature extraction; common methods include temporal-based approaches and spectral-based approaches using the Fast Fourier Transform (FFT), Short Time Fourier Transform (STFT), Discrete Wavelet Transform (DWT), and Continuous Wavelet Transform (CWT). Various features have been proposed for audio signals, such as zero crossing rate, RMS energy, envelope, and spectrum representation [6]. We used a set of spectral-based features that was previously developed for gamelan instrument identification [7].

Research Method
A general view of the flowchart of the proposed system is depicted in Figure 1. The output of the proposed system is the selected feature subset for identifying the gamelan instruments. The first stage of the proposed system is preprocessing: before a gamelan recording is subjected to the proposed methods, it is preprocessed to make the subsequent tasks easier.
The preprocessing consists of noise reduction, low-pass filtering, and sampling rate conversion. The second step is to create a time-frequency representation, or spectrogram, of the gamelan recording, that is, to calculate the 2D spectrogram matrix of the given recording. Before the feature set is extracted, the spectrogram is segmented in the time-frequency domain. This segmentation requires note onset information. Note onsets can be detected by approaches based on sudden changes in acoustic energy [24]. For a strong gamelan note, this abrupt energy change is very sharp, so the onset locations can be found using a peak detection function [25]. The feature set is then calculated from the segmented spectrograms; it should contain useful information for identifying and differentiating gamelan instruments. In this paper, we used 34 features for gamelan instrument identification. The features of [26] were calculated, and additional features were extracted, including statistical properties such as the mean and variance of the spectral envelope.
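The onset step can be illustrated with a minimal sketch. It assumes short-time frame energy, a half-wave rectified first difference of that energy as the detection function, and a simple local-maximum peak picker with a relative threshold. The frame length, hop size, and threshold are hypothetical values, and the peak detection function of [25] may differ.

```python
import math

def frame_energy(signal, frame_len, hop):
    """Short-time energy (sum of squared samples) of each frame."""
    return [sum(x * x for x in signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]

def onset_frames(signal, frame_len=256, hop=128, rel_threshold=0.1):
    """Frame indices where the rise in energy has a local peak.

    frame_len, hop and rel_threshold are hypothetical defaults, not values from the paper.
    """
    e = frame_energy(signal, frame_len, hop)
    # Half-wave rectified first difference: only increases in energy indicate onsets.
    rise = [0.0] + [max(0.0, e[i] - e[i - 1]) for i in range(1, len(e))]
    top = max(rise, default=0.0)
    if top == 0.0:
        return []
    # Simple peak picking: local maxima that exceed a fraction of the largest rise.
    return [i for i in range(1, len(rise) - 1)
            if rise[i] > rise[i - 1] and rise[i] >= rise[i + 1]
            and rise[i] >= rel_threshold * top]

# Usage: two decaying 440 Hz "strikes" at samples 2000 and 6000 (fs = 8000 Hz).
fs = 8000
sig = [0.0] * fs
for start in (2000, 6000):
    for n in range(fs - start):
        sig[start + n] += math.sin(2 * math.pi * 440 * n / fs) * math.exp(-n / 800)
print([i * 128 for i in onset_frames(sig)])  # frame start samples near each strike
```

With a sharp attack, as for a strongly struck note, the energy rise is concentrated in one or two frames, so the local maximum lands close to the true onset.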
We compared five feature ranking algorithms of the filter approach: Information Gain, Gain Ratio, Chi-Squared, Symmetric Uncertainty, and ReliefF. Each ranking algorithm produces a list of features ordered by the value of its criterion function. To compare their performance, we also computed the cross-validation accuracy of an SVM classifier.

Time-Frequency Analysis
The goal of automatic gamelan transcription is to extract the sequence of gamelan notes from a gamelan recording. Gamelan notes are symbols that represent the pitch of a gamelan sound. This paper is part of a project that aims to develop a system that extracts note events from the spectrograms of gamelan sounds.
A spectrogram is a spectro-temporal representation of a sound; it provides a time-frequency portrait of gamelan sounds. The STFT is a commonly used method for generating time-frequency representations, or spectrograms, of musical signals. The result of the STFT can be plotted as a 2D or 3D spectrogram (as shown in Figure 2) as a function of time and frequency, where the magnitude is represented as the height of a 3D surface or as the intensity in a 2D image. However, the STFT has a well-known shortcoming: the window length determines both the time and the frequency resolution of the spectrogram [27] [21].
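A short, self-contained sketch of how a magnitude spectrogram can be computed from overlapping frames of the signal. It assumes a periodic Hann window and uses a naive DFT for readability (an FFT would be used in practice); the function names are illustrative, not taken from the paper.

```python
import cmath
import math

def hann(n_win):
    """Periodic Hann window of length n_win."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n_win) for k in range(n_win)]

def stft_magnitude(signal, n_win, hop):
    """Magnitude spectrogram as a list of frames; each frame holds bins 0..n_win//2.

    A naive O(n_win^2) DFT keeps the sketch dependency-free; use an FFT in practice.
    """
    window = hann(n_win)
    spectrogram = []
    for start in range(0, len(signal) - n_win + 1, hop):
        seg = [signal[start + k] * window[k] for k in range(n_win)]
        frame = [abs(sum(seg[k] * cmath.exp(-2j * math.pi * b * k / n_win)
                         for k in range(n_win)))
                 for b in range(n_win // 2 + 1)]
        spectrogram.append(frame)
    return spectrogram

# Usage: a 1000 Hz tone at fs = 8000 Hz with a 64-sample window (125 Hz per bin).
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]
S = stft_magnitude(tone, n_win=64, hop=32)
print(len(S), len(S[0]))                            # 15 33
print(max(range(len(S[0])), key=S[0].__getitem__))  # 8, i.e. 8 * 125 Hz = 1000 Hz
```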
The window size used for the STFT therefore trades time resolution against frequency resolution. A short window gives good time resolution but poor frequency resolution, whereas a long window gives high frequency resolution but low time resolution. For pitch analysis such as automatic gamelan note transcription, the frequency resolution of the spectrogram is more important than the time resolution [21]. Therefore, an STFT with a long window is suitable for automatic gamelan note transcription.
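The trade-off can be made concrete with a little arithmetic: for a window of N samples at sampling rate f_s, the bin spacing is f_s/N and the window spans N/f_s seconds. The sampling rate below is an assumption for illustration, not a value from the paper.

```python
def stft_resolution(fs_hz, n_win):
    """Bin spacing (Hz) and window duration (s) of an STFT with an n_win-sample window."""
    return fs_hz / n_win, n_win / fs_hz

# Hypothetical sampling rate (an assumption, not taken from the paper).
fs = 44100
for n_win in (1024, 8192):
    df, dur = stft_resolution(fs, n_win)
    print(f"N={n_win}: {df:.2f} Hz per bin, {dur * 1000:.2f} ms per window")
# N=1024: 43.07 Hz per bin, 23.22 ms per window
# N=8192: 5.38 Hz per bin, 185.76 ms per window
```

An eightfold longer window gives eightfold finer frequency bins at the cost of an eightfold longer analysis frame, which is the trade-off described above.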

ISSN: 1693

In this paper, we use 34 spectral features, such as the fundamental frequency, spectral centroid (Sc), two spectral rolloff points (Fc), spectral flux (SF), spectral skewness (Sa), spectral kurtosis (Sk), spectral slope (Ss), and spectral bandwidth (Sw). These features are combined into the feature set of a gamelan sound. The feature set is normalized by dividing each feature component by a real number so that the result lies between -1 and 1, and the normalized feature set is taken as the final representation of the gamelan sound. For spectral kurtosis, a lower value indicates a flatter distribution [30]. The spectral skewness, spectral moment, spectral kurtosis, and spectral entropy features were implemented using statistical functions.
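Some of the listed features and the normalization step can be sketched as follows. This sketch rests on three assumptions: skewness and kurtosis treat the magnitude spectrum as a probability distribution over frequency; spectral flux is the Euclidean distance between consecutive magnitude spectra (definitions vary in the literature); and the unspecified normalizing constant is each feature's largest absolute value over the dataset.

```python
import math

def spectral_skew_kurt(freqs, mags):
    """Skewness and kurtosis of the magnitude spectrum viewed as a distribution over frequency."""
    total = sum(mags)
    p = [m / total for m in mags]                       # assumes total > 0
    mu = sum(f * w for f, w in zip(freqs, p))           # mean = spectral centroid
    var = sum((f - mu) ** 2 * w for f, w in zip(freqs, p))
    skew = sum((f - mu) ** 3 * w for f, w in zip(freqs, p)) / var ** 1.5
    kurt = sum((f - mu) ** 4 * w for f, w in zip(freqs, p)) / var ** 2
    return skew, kurt

def spectral_flux(prev_mags, mags):
    """Assumed definition: Euclidean distance between consecutive magnitude spectra."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mags, prev_mags)))

def normalize_features(vectors):
    """Assumed divisor: each feature's largest absolute value, so results lie in [-1, 1]."""
    scales = [max(abs(v[j]) for v in vectors) or 1.0 for j in range(len(vectors[0]))]
    return [[x / s for x, s in zip(v, scales)] for v in vectors]

# Usage: a flat spectrum has lower kurtosis than a peaked one.
bins = [1, 2, 3, 4, 5]
print(spectral_skew_kurt(bins, [1, 1, 1, 1, 1])[1])   # about 1.7
print(spectral_skew_kurt(bins, [1, 1, 8, 1, 1])[1])   # larger value
print(normalize_features([[2.0, -4.0], [1.0, 2.0]]))  # [[1.0, -1.0], [0.5, 0.5]]
```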
Spectral centroid (Sc) is a measure of the center of gravity of the spectrum. It is computed by multiplying each frequency by its magnitude, summing these products, and dividing the result by the sum of all the magnitudes. The spectral centroid (Sc) [31] [29] [32] can be defined as Eq. (1),
$$S_c = \frac{\sum_{i=1}^{N} f_i \, M(f_i)}{\sum_{i=1}^{N} M(f_i)} \tag{1}$$
where M(f_i) is the magnitude for the frequency f_i at bin i, and N is the number of frequency bins.
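Eq. (1) translates directly into code. In this sketch the bin frequencies are assumed to be f_i = i * f_s / N for an N-point transform, and the helper names are hypothetical.

```python
def bin_frequencies(fs_hz, n_win):
    """Assumed bin centre frequencies f_i = i * fs / N for bins i = 0..N//2."""
    return [i * fs_hz / n_win for i in range(n_win // 2 + 1)]

def spectral_centroid(freqs, mags):
    """Eq. (1): sum_i f_i M(f_i) / sum_i M(f_i); returns 0.0 for an all-zero frame."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total

freqs = bin_frequencies(8000, 8)                   # [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
print(spectral_centroid(freqs, [0, 1, 0, 1, 0]))   # 2000.0
```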
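Scheirer and Slaney defined the spectral rolloff point (Fc) as the 95th percentile of the power spectrum distribution [33], that is, the frequency below which 95% of the signal energy is contained. Spectral rolloff (Fc) is defined as Eq. (2),
$$F_c = f_K, \qquad K = \min\Big\{ k : \sum_{i=1}^{k} M^2(f_i) \ge 0.95 \sum_{i=1}^{N} M^2(f_i) \Big\} \tag{2}$$
where M(f_i) is the magnitude at frequency bin i and N is the number of frequency bins.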
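The rolloff definition can be sketched with the percentage as a parameter, which also covers the 40% and 90% rolloff points that appear among the top-ranked features. This is an illustration of Eq. (2), not the authors' implementation.

```python
def spectral_rolloff(freqs, mags, fraction=0.95):
    """Eq. (2): lowest bin frequency at which cumulative power reaches `fraction` of the total."""
    power = [m * m for m in mags]
    target = fraction * sum(power)
    cumulative = 0.0
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= target:
            return f
    return freqs[-1]  # only reached through floating-point rounding

freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
mags = [4.0, 2.0, 1.0, 1.0, 0.0]  # power: 16, 4, 1, 1, 0 (total 22)
for q in (0.40, 0.90, 0.95):
    print(q, spectral_rolloff(freqs, mags, q))  # 0.0, 1000.0 and 2000.0 Hz respectively
```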

Feature Ranking
The goal of feature selection is to improve computational efficiency while preserving or even increasing the recognition rate. Feature selection is important to the success of machine learning tasks, especially when the data contain many irrelevant or redundant features. In general, feature selection algorithms can be categorized into wrapper and filter approaches [34] [1].
The five filter-based feature ranking techniques being compared are described below.
Those techniques are Information Gain (IG), Gain Ratio (GR), ReliefF (RF), Chi-Squared (CS), and Symmetric Uncertainty (SU), all available in the Weka data mining tool [44]:
(i) ReliefF (RF) is an extension of the Relief algorithm developed by Kira and Rendell [36]. The main idea of the Relief algorithm is to evaluate the worth of a feature based on how well its values distinguish among the instances. The Relief algorithm cannot handle incomplete data and is limited to two-class problems. ReliefF, the extended version of Relief, can handle incomplete data and is not limited to two-class problems. However, when the algorithm is applied to highly noisy data with many irrelevant features and/or mislabeled instances, the performance of ReliefF can degrade [37].
(ii) Chi-Squared (CS) evaluates the worth of a feature by computing the value of the chi-squared statistic with respect to the class. The null hypothesis is that the two variables (the feature and the class) are unrelated, and it is tested using the chi-squared formula of Plackett [38]. A large CS value indicates that the feature is important.
(iii) Information Gain (IG) can also be used to rank features. The main idea of IG is to select features based on entropy, a measure of the uncertainty, or degree of disorder, of a system [39] [40]. IG measures the number of bits of information gained about the class prediction when a given feature is used to support the prediction. The information gain [40] of the feature or attribute A is defined as Eq. (3),
$$IG(A) = E(C) - E(C \mid A) \tag{3}$$
where E(C) is the entropy of the classes C and E(C | A) is the conditional entropy of C given A, that is, when the value of attribute A is known.
(iv) Gain Ratio (GR) is an extended version of Information Gain. GR normalizes IG by the intrinsic value of the attribute, which reduces the bias of IG toward attributes with many distinct values, and it has been used to evaluate the correlation of attributes with respect to the class concept of incomplete data sets [41] [42] [35]. The gain ratio of A is defined as the information gain [40] of A divided by its intrinsic value IV(A), the entropy of the distribution of A's values, as in Eq. (4),
$$GR(A) = \frac{IG(A)}{IV(A)} \tag{4}$$
(v) Symmetric Uncertainty (SU) is a correlation measure between a feature and the class, obtained by Eq. (5) [44] [1],
$$SU(A, C) = \frac{2 \, IG(A)}{E(A) + E(C)} \tag{5}$$
where E(A) and E(C) are the entropies based on the probabilities associated with feature A and class C.
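The five ranking criteria can be sketched in plain Python. The entropy-based measures of Eqs. (3) to (5) and Chi-Squared assume the feature values have already been discretized into a few categories, and Chi-Squared is taken to be the Pearson statistic of the feature/class contingency table (an assumption about the exact formula). ReliefF is a simplified, deterministic variant that visits every instance once and uses range-normalized Manhattan distances. These are illustrative sketches, not Weka's implementations, and all names are hypothetical.

```python
import math
from collections import Counter

# ---- Entropy-based measures on discretized (categorical) feature values ----

def entropy(values):
    """Shannon entropy E(.) in bits of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def info_gain(classes, feature):
    """Eq. (3): IG(A) = E(C) - E(C | A)."""
    n = len(classes)
    groups = {}
    for a, c in zip(feature, classes):
        groups.setdefault(a, []).append(c)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(classes) - cond

def gain_ratio(classes, feature):
    """Eq. (4): GR(A) = IG(A) / IV(A), with IV(A) the entropy of A's values."""
    iv = entropy(feature)
    return info_gain(classes, feature) / iv if iv > 0 else 0.0

def symmetric_uncertainty(classes, feature):
    """Eq. (5): SU = 2 * IG(A) / (E(A) + E(C))."""
    denom = entropy(feature) + entropy(classes)
    return 2.0 * info_gain(classes, feature) / denom if denom > 0 else 0.0

def chi_squared(classes, feature):
    """Assumed form: Pearson chi-squared statistic of the feature x class contingency table."""
    n = len(classes)
    joint, fa, fc = Counter(zip(feature, classes)), Counter(feature), Counter(classes)
    stat = 0.0
    for a in fa:
        for c in fc:
            expected = fa[a] * fc[c] / n
            stat += (joint[(a, c)] - expected) ** 2 / expected
    return stat

# ---- Simplified ReliefF on numeric features (every instance used once) ----

def relieff(rows, classes, k=1):
    """Return one weight per feature; larger means more relevant."""
    n, d = len(rows), len(rows[0])
    span = [(max(r[j] for r in rows) - min(r[j] for r in rows)) or 1.0 for j in range(d)]
    prior = {c: cnt / n for c, cnt in Counter(classes).items()}

    def diff(j, a, b):
        return abs(rows[a][j] - rows[b][j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        for c, p_c in prior.items():
            same = [b for b in range(n) if b != i and classes[b] == c]
            near = sorted(same, key=lambda b: dist(i, b))[:k]
            if not near:
                continue
            # Nearest hits lower a weight; nearest misses raise it, weighted by the class prior.
            factor = -1.0 if c == classes[i] else p_c / (1.0 - prior[classes[i]])
            for b in near:
                for j in range(d):
                    w[j] += factor * diff(j, i, b) / (n * len(near))
    return w

def rank(scores):
    """Filter ranking: feature indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

For example, with classes `['a', 'a', 'b', 'b']`, a feature `[0, 0, 1, 1]` gets IG, GR, and SU of 1 and a chi-squared value of 4, while `[0, 1, 0, 1]` scores 0 on both IG and chi-squared.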

Cross Validation
As discussed in the previous section, we compare the performance of the different ranking approaches using cross-validation. Cross-validation is a statistical method for evaluating and comparing learning algorithms by dividing the data into two portions: one for training the model and one for validating or testing it. The goal is to compare the performance of the different ranking approaches and to find the best approach for gamelan instrument recognition. Cross-validation can also be used to estimate the generalization ability of a classifier. … 103 data is presented in Table 3. The first 7 features are consistently ranked at the top. The five techniques rank the first 4 features identically, although features 5, 6, and 8 are reversed in some rankings. For each ranking method, we investigated the recognition accuracy on the testing data as a function of the number of features, taking the features in ascending and in descending order of rank. The recognition rate (accuracy) is the prediction accuracy of the SVM classifier. Accuracy as a function of the number of features n is presented in Figure 7 for ascending order and in Figure 8 for descending order. We measured the performance for subsets consisting of n ranked features, where n varies from 1 to 34, starting from the least important feature for ascending order and from the most important feature for descending order.
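The evaluation protocol (accuracy for the n best or n worst ranked features, n = 1 to d, estimated by k-fold cross-validation) can be sketched as below. Because the Python standard library has no SVM, a nearest-centroid classifier is swapped in purely as a stand-in; the fold assignment (instance i goes to fold i mod k) and k = 10 are also assumptions, not details from the paper.

```python
def nearest_centroid_correct(train_X, train_y, test_X, test_y):
    """Stand-in classifier (NOT an SVM): count test vectors whose nearest class mean is correct."""
    sums, counts = {}, {}
    for x, c in zip(train_X, train_y):
        s = sums.setdefault(c, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[c] = counts.get(c, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    correct = 0
    for x, c in zip(test_X, test_y):
        pred = min(centroids, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))
        correct += int(pred == c)
    return correct

def cv_accuracy(X, y, folds=10):
    """k-fold cross-validation; instance i is tested in fold i % folds (an assumed split)."""
    correct = 0
    for f in range(folds):
        test = [i for i in range(len(X)) if i % folds == f]
        train = [i for i in range(len(X)) if i % folds != f]
        correct += nearest_centroid_correct([X[i] for i in train], [y[i] for i in train],
                                            [X[i] for i in test], [y[i] for i in test])
    return correct / len(X)

def accuracy_curve(X, y, ranking, descending=True, folds=10):
    """Accuracy of the n best (descending) or n worst (ascending) ranked features, n = 1..d."""
    order = list(ranking) if descending else list(reversed(ranking))
    curve = []
    for n in range(1, len(order) + 1):
        cols = order[:n]
        curve.append(cv_accuracy([[row[j] for j in cols] for row in X], y, folds))
    return curve

# Usage on toy data: feature 0 separates the two classes, feature 1 does not.
X = [[(i % 2) * 10 + (i % 3) * 0.1, float(i % 5)] for i in range(20)]
y = [i % 2 for i in range(20)]
print(accuracy_curve(X, y, ranking=[0, 1]))                    # [1.0, 1.0]
print(accuracy_curve(X, y, ranking=[0, 1], descending=False))  # worst feature first
```

Swapping `nearest_centroid_correct` for an SVM from a machine learning library would leave the two loops unchanged.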
The SVM performs very well when all features, or subsets of the original features, are used. For all techniques, the peak accuracy (98.87% or higher) was reached with the 19 to 22 best-ranked features, and enlarging the subsets further did not improve the accuracy. The remaining features can therefore be removed, as they have no significant influence on performance. Notably, the GR technique reaches a peak accuracy of 98.93% (Table 5), the highest accuracy achieved among the five techniques. Figure 7 shows the degradation in accuracy as the feature subsets are reduced. A comparison of the five methods shows that the accuracies above 90% achieved with the RF subsets are better than those of the other techniques (see Table 4). Overall, all techniques show the same behavior without significant differences. The accuracy stays almost the same until the subsets are reduced to 26 or fewer features; after that, it tends to decrease as the subsets shrink (see Table 4 and Figure 7).

Figure 7. Accuracy for the gamelan dataset as a function of the worst n ranked features (ascending order)

Figure 8. Accuracy for the gamelan dataset as a function of the best n ranked features (descending order)

For descending order, the accuracy is quite stable until the subsets are reduced to 7 or fewer features. These seven features are fundamental frequency, spectral rolloff 40%, spectral centroid, spectral rolloff 90%, spectral flux, spectral kurtosis, and spectral skewness. The single best feature gives an accuracy of 53.96%, the two best features give 66.95%, the three best features give 72.03%, and the seven best features give 96.55% (Figure 8).

Conclusion
In this paper, we have presented in detail our approach to feature ranking using five filter-based ranking methods. Although the five methods perform similarly, the accuracy of the SVM classifier is significantly influenced by the feature ranking. The Gain Ratio (GR) technique gave a better result than the other four techniques: its highest accuracy, 98.93%, was reached using the 21 best features.
The five filter-based ranking methods agree on the first seven features: fundamental frequency, spectral rolloff 40%, spectral centroid, spectral rolloff 90%, spectral flux, spectral kurtosis, and spectral skewness. These seven features alone give an accuracy of 96.55% for gamelan instrument identification.