Natural automatic musical note player using time-frequency analysis on human play

This research aims to develop an automatic gamelan musical note player that plays musical notes as naturally as a human does. A musician estimates when to hit an instrument button and strikes at an approximate time that is as close as possible to the target time. The tolerated time to play a note was identified from human play. A gamelan musician was selected to play the note sequences of five songs, and the play was recorded for analysis. The execution time of each button hit in the human play was identified using time-frequency analysis and peak detection, which defined the time range that can be tolerated as neither too early nor too late; the result of the analysis was then used as the parameters for randomizing the approximate time to play a note. The evaluation shows that the program played all note sequences within the approximate time, as a human does, and that it sounded more natural than another program that played each note exactly at its time target.


Introduction
Computer music has succeeded in transforming conventional musical instruments into computer media, which makes it easier to access, learn, or play an instrument. Computer music, which has resulted from continuous research and experimentation, involves a computer that plays part of a composition, performs music, and takes on a growing role in music-related applications [1,2]. The automatic musical note player is one part of computer music research and has been implemented in various types of music applications for music demonstration, music games, and music learning. For example, Smule, a phenomenal virtual music instrument application by Ge Wang, turns the iPhone into a creative channel to learn, play, and perform music [3]. Another example is Smart Gamelan Player, which was designed to support learning how to play traditional music instruments [4]. This application has a feature that demonstrates gamelan instruments playing the note sequence of a song.
The task of an automatic musical note player system is to hit an instrument button to play a note sequence at a defined tempo, represented as a time interval between two consecutive beats. Tempo is measured in beats per minute, that is, the number of beats played in a minute. Each note in a note sequence has a time target derived from the time interval value, and the time target is the parameter the system uses to play the note. Time target parameterization is a challenge in developing an automatic musical note player for orchestral music. A player for this type of music sounds stiff if every instrument of the orchestra plays every note of a note sequence exactly at its time target, and the system then lacks the natural touch of a human playing music, as in the work by [4]. Hence, automatically hitting an instrument button to play a note should not be executed exactly at its time target; it should follow human play, in which a button is hit at an approximate time.
Approximate time is a range of time around a time target within which hitting an instrument button can be tolerated as neither too early nor too late. In developing an automatic musical player system that follows human play, the approximate time should be treated as more than a simple random value. This question is of particular interest for developing a virtual music orchestra, and it motivates this research to develop an automatic musical note player for the traditional music orchestra called gamelan.
A song in gamelan music, known as gendhing, consists of a sequence of bars. A bar, called gatra, contains four beats, each of which is initially represented by a dot note called pin. A dot note can then be filled with a pitch or remain a dot note. Rhythm in gamelan music is represented by the number of notes in a beat and the speed of playing the notes [5]. There are five levels of rhythm in gamelan music: lancar (1/1), where a beat contains one note; tanggung (1/2), where a beat contains two notes; dados (1/4), where a beat contains four notes; wiled (1/8), where a beat contains eight notes; and rangkep (1/16), where a beat contains sixteen notes. The tempo is defined by a constant time interval value. For example, in the rhythm of lancar the time interval is about 1000 ms between two consecutive beats. Figure 1 shows an illustration of a note sequence consisting of 32 beats in a gamelan song entitled Suwe Ora Jamu.
Instead of playing each note exactly at its time interval value, this research proposes a randomization technique parameterized by observing how a gamelan musician plays an instrument, in order to obtain a natural touch in playing a note sequence as a human does. The parameters for randomizing the approximate time to hit a button are defined by a set of time range values. Let P be the function that hits a button to play a note: P takes a time range RT and defines an approximate time AT at which to execute the play, written P: RT→AT. The time range RT is defined from human play using time-frequency analysis and peak detection.
Time-frequency analysis is commonly used for music signal problems, such as analyzing the sound of musical instruments [6], analyzing the interaction of musical rhythm with melodic structure [7], and analyzing various frequency representations of the tone of the gong, a traditional instrument from Nusa Tenggara Timur, Indonesia [8]. Time-frequency analysis, in the form of spectrograms, handles changes in frequency content over time, provides a time-frequency portrait of musical sounds, and can describe quick transitions between notes [6]. A function in Fourier analysis can be represented in both the time and the frequency domain, which is known as time-frequency analysis [9,10]. Tempo and beat estimation was studied by [12][13][14]: spectral analysis, spectral energy flux, and a peak detection function were used by [12], k-NN regression by [13], and harmonic separation with periodicity analysis by [14]. These studies aim to determine the tempo and track the beats of a music audio source, and the result can be used to identify a music genre. The methods used in these studies are quite similar to the procedures needed in this work, where tempo and beat estimation were used to observe how a musician estimates tempo while playing a music instrument. The Fourier transform and peak detection used in time-frequency analysis by [11,12] for audio segmentation were also used in this research. Peak detection identifies the time at which the human hits an instrument button; that time is then compared with the time target to set the tolerated time range (RT) used for randomizing the approximate time (AT) to play a note.

Research Method
The natural automatic musical note player was developed by analyzing how a musician estimates execution time relative to the time target of the tempo. The estimated execution time was represented as an approximate time, a time range that can be tolerated around the time target of the tempo, and the program executes the play of a music instrument at an approximate time. Figure 2 illustrates the relation between a note sequence NS containing notes S, the time targets TT containing time values T, and the execution times ET containing time values E. The first note (S1) has the first time target value (T1), the first execution time (E1) has a time value close to T1, slightly earlier or later, and so on. The values of ET were then used to define the approximate time AT.
Figure 2. Illustration of relation between notes sequence, time target, execution time, and approximate time in playing music instrument
Execution time was measured by analyzing recorded audio of a musician playing a music instrument. The difficulty in this phase was selecting a method that accurately identifies the execution time of each keystroke on an instrument button. The analysis was to obtain the pitch and time of the musical notes played by a musician. Time-frequency analysis was used to accurately find the execution time of the interaction between a musician and the instrument while playing a note sequence. Time-frequency analysis represents sound visually in the form of a spectrogram; in musical sound analysis, a spectrogram describes the transitions between notes, including their timing. Figure 3 shows the model diagram proposed for developing the natural automatic musical note player. The experiment was conducted by developing an automatic gamelan musical note player limited to the rhythm of lancar and to melodic abstraction notes played on melodic abstraction instruments such as saron, demung, peking and slenthem.
The proposed method was divided into data acquisition, pitch and time analysis and implementation.

Data Acquisition
The experiment used gamelan instruments to develop the automatic musical note player. Gamelan music has two musical scales: slendro, which consists of five pitches (1, 2, 3, 5, 6), and pelog, which consists of seven pitches (1, 2, 3, 4, 5, 6, 7). The pitch frequencies of the two scales differ from each other and from those of Western music. A gamelan orchestra set can contain both gamelan slendro and gamelan pelog, or only one of them. In both scales the instruments are classified by function: a group of instruments that plays the note sequence of the melody skeleton of a song, such as demung, saron, peking, and slenthem; a group that plays the notes of the song, such as gender and rebab; and a group whose notes define the structure of a song, such as kenong and gong.
The development of the automatic musical note player was limited to demung, an instrument of gamelan pelog that plays the note sequence of the melody skeleton of a song. Demung is a type of xylophone whose register contains the notes (1, 2, 3, 4, 5, 6, 7). Figure 4 shows an illustration of demung. The notes of a melody skeleton are played at a constant time interval, such as one beat per second for a slow tempo or one beat per half second for a faster tempo. For example, the song "Suwe Ora Jamu" (Figure 1) consists of 32 beats filled with dot notes and number notes. With a slow tempo of one beat per second, the first beat, a dot note, has its time target at the 1st second; the second beat, a note of 2, at the 2nd second; the third beat, a dot note, at the 3rd second; the fourth beat, a note of 3, at the 4th second; and so on. The time target to play all beats of the song is therefore 32 seconds.
Data acquisition obtained audio signal data of gamelan songs played by a musician in the rhythm of lancar, where the time interval is about 1000 ms between two consecutive beats. The acquisition was conducted by recording a gamelan musician playing saron, one of the melodic abstraction instruments. The play was recorded in .wav format at a sample rate of 48 kHz, 16-bit mono. The musician played the note sequences of five gamelan songs collected from www.gamelanbvg.com. The gamelan songs used as the dataset are described in Table 1. The musical note structure of a gamelan song can be seen in Figure 5, which depicts the musical notes of the first song in the dataset.

Pitch and Time Analysis
The time interval value served as the target time that tells a gamelan musician when a button should be hit to play a note. A human plays by intuition or musical sense, which shows in the way a musician estimates the time to play: the button is hit at an approximate time that is as close as possible to the target time. Time-frequency analysis was used to identify this approximate time while a gamelan musician played a note sequence. The analysis applied the fast Fourier transform (FFT) to remove noise, followed by peak detection to find the exact time value of each hit in the musician's play. Figure 5 shows an example of the input data and the data after noise removal using FFT, which were then used to identify the peak values. The distance from each execution time found by peak detection to its time target was measured to set the time range used to parameterize a random value for the approximate time.
For example, given a time interval of 1000 ms (rhythm of lancar) and a play that starts at time 0, the target time for each note can be formulated as:
$$TT_i = (i - 1) \times 1000, \quad i = 1, \ldots, L$$
Execution time was identified from the peak value produced by peak detection. The formula for the execution time is as follows:
$$ET_i = PT_i - TT_i, \quad i = 1, \ldots, L$$
where:
TT = Time target
ET = Execution time
PT = Peak time
L = Number of beats
The time range setting provides the parameters for randomizing the time value at which a note is played. The execution time data are sorted to find the lowest and the highest approximate time values. The lowest value is negative when the gamelan musician plays a note before the target time; the highest value is positive when the musician plays a note after the target time.
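The peak detection and execution time steps described above can be illustrated with a small sketch. It is not the MATLAB procedure used in the experiment: the FFT noise removal is left out, the peak picker is a plain local-maximum search over an amplitude envelope, and each peak is matched to the nearest time target. The function names, the threshold, the minimum gap, and the synthetic envelope are all assumptions made for illustration.

```python
import math


def synthetic_envelope(strike_times_ms, length_ms, decay_ms=200.0):
    """Build a toy amplitude envelope sampled at 1 kHz (one sample per ms).

    Hypothetical test signal: each strike jumps to 1.0 and decays exponentially.
    """
    envelope = [0.0] * length_ms
    for strike in strike_times_ms:
        for n in range(strike, length_ms):
            envelope[n] = max(envelope[n], math.exp(-(n - strike) / decay_ms))
    return envelope


def detect_peaks(envelope, sample_rate, threshold, min_gap_ms):
    """Return peak times (PT) in ms: local maxima above threshold, min_gap_ms apart."""
    min_gap = int(sample_rate * min_gap_ms / 1000)
    peaks = []
    for n in range(1, len(envelope) - 1):
        is_local_max = envelope[n - 1] < envelope[n] >= envelope[n + 1]
        if not is_local_max or envelope[n] < threshold:
            continue
        if peaks and n - peaks[-1] < min_gap:
            # Too close to the previous peak: keep only the stronger of the two.
            if envelope[n] > envelope[peaks[-1]]:
                peaks[-1] = n
            continue
        peaks.append(n)
    return [1000.0 * n / sample_rate for n in peaks]


def execution_times(peak_times_ms, interval_ms=1000):
    """Pair each peak time with its nearest time target (assumed matching rule).

    Returns (TT, PT, ET) tuples, where ET = PT - TT is negative for an early hit.
    """
    rows = []
    for pt in peak_times_ms:
        tt = round(pt / interval_ms) * interval_ms
        rows.append((tt, pt, pt - tt))
    return rows
```

With strikes placed at 1020, 1969 and 3150 ms, the sketch reports offsets of +20, -31 and +150 ms against the 1000, 2000 and 3000 ms targets. A real recording would need an envelope extracted from the audio first, and a denoising step such as the FFT filtering mentioned in the text.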
The lowest and highest execution time values were used to parameterize a random value for the system playing a note at every time target: each time target value was summed with the lowest value and with the highest value. The formula for the time range setting is:
$$RT_i = [\,TT_i + \min(ET),\; TT_i + \max(ET)\,]$$
Based on the example above, the approximate time AT to play the third note in a note sequence, whose time target is 2000 ms, is a time within the time range RT of 1969 to 2150 ms; the system therefore randomizes the approximate time value within this time range.
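The mapping P: RT→AT for a single note can be sketched as follows. The text only says that the approximate time is randomized within the time range, so the uniform distribution used here is an assumption, as are the function and parameter names.

```python
import random


def time_range(time_target_ms, lowest_et_ms, highest_et_ms):
    """Return the tolerated time range RT around one time target."""
    return (time_target_ms + lowest_et_ms, time_target_ms + highest_et_ms)


def approximate_time(time_target_ms, lowest_et_ms, highest_et_ms, rng=random):
    """Draw an approximate time AT inside RT (uniform draw is an assumption)."""
    low, high = time_range(time_target_ms, lowest_et_ms, highest_et_ms)
    return rng.uniform(low, high)
```

For the third note of the example, `time_range(2000, -31, 150)` gives `(1969, 2150)`, which matches the range quoted in the text. Passing a seeded `random.Random` instance as `rng` makes a performance reproducible.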
The formulas above were implemented in the experiment. A gamelan musician with 27 years of experience in playing gamelan instruments was selected to play the note sequences of five gamelan songs in the form of melodic abstraction on a melodic abstraction instrument called saron. Each song was recorded separately, and the data were then analyzed in MATLAB using time-frequency analysis and peak detection. Table 2 shows the result of the time-frequency analysis for the five songs in the dataset, each played in the rhythm of lancar with a time interval of 1000 ms between two consecutive beats. The columns in the table are labeled GS for the index of gamelan songs in the dataset, IN for the index of notes, NS for note sequence, TT for time target, PT for peak time and ET for execution time.
There were 312 peak values of execution time obtained from the five songs. The execution time values from all samples were concatenated and sorted to find the lowest and highest values of execution time, where DA stands for the data resulting from time-frequency analysis, DC for the result of data concatenation, and DS for the result of data sorting. The lowest value was -328 ms and the highest value was 246 ms. This result was used as the time range parameters for randomizing the approximate time for playing a note sequence.

Implementation
Sounds of the notes of the instrument buttons and pictures visualizing the instrument buttons were embedded into the program. The program calls the sound and the picture that correspond to the note being played at the execution time. The program was designed to automatically play the note sequences input into the system. A total of 30 note sequences of melody skeletons, including the note sequences used as the dataset, were used for the songs library in the program. All note sequences were formatted as array data, and the dot notation was translated into the value 0. Figure 8 illustrates a note sequence input formatted as array data. The array data of the note sequences used in the songs library of the program are described in Table 3.
[0, 2, 0, 3, 0, 2, 0, 3, 0, 1, 0, 2, 0, 3, 0, 2, 0, 3, 0, 5, 0, 6, 0, 5, 0, 4, 0, 2, 0, 1, 0, 6]
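To show how the pieces fit together, the sketch below derives the range bounds from per-song offsets (the DA, DC, DS steps) and builds a play schedule from an array such as the 32-element one above. It is not the Adobe Flash implementation: the names are hypothetical, the first beat is assumed to have time target 0, and the random draw is assumed to be uniform.

```python
import random


def tolerance_bounds(offsets_per_song):
    """Concatenate per-song ET lists (DA -> DC), sort them (DS), return (lowest, highest)."""
    concatenated = [et for song in offsets_per_song for et in song]
    ordered = sorted(concatenated)
    return ordered[0], ordered[-1]


def schedule(notes, interval_ms, lowest_ms, highest_ms, rng):
    """Return (pitch, time target, approximate time) for every non-dot note.

    A value of 0 marks a dot note, which produces no button hit.
    """
    events = []
    for index, pitch in enumerate(notes):
        if pitch == 0:
            continue
        target = index * interval_ms  # assumes the first beat has time target 0
        hit_time = rng.uniform(target + lowest_ms, target + highest_ms)
        events.append((pitch, target, hit_time))
    return events


SAMPLE_NOTES = [0, 2, 0, 3, 0, 2, 0, 3, 0, 1, 0, 2, 0, 3, 0, 2,
                0, 3, 0, 5, 0, 6, 0, 5, 0, 4, 0, 2, 0, 1, 0, 6]
```

Calling `schedule(SAMPLE_NOTES, 1000, -328, 246, random.Random(1))` yields 16 events, one per numbered note. Because the range spans less than one 1000 ms interval, the randomized hit times keep the notes in their original order.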

Results and Analysis
An automatic gamelan musical note player mobile application was developed using Adobe Flash. The interface of the automatic musical note player consisted of two main parts: a songs library, which contained song titles and their note sequences, and a play section, which automatically presented the play of the selected note sequence, including sound output and instrument visualization. Figure 9 shows the interface design of the automatic musical note player for the gamelan instrument named demung. In addition, the program was designed to display the note sequence data and play it with instrument animation and sound as outputs, and three melodic abstraction instruments were used in the program: saron, demung and slenthem. Figure 10 shows a screenshot of the automatic gamelan musical note player application.
Figure 10. Screenshot of automatic gamelan musical note player program
The evaluation was conducted by recording the automatic play of the program for the note sequences of the five song samples. The recorded audio files were analyzed with the procedures used in the data analysis phase, namely time-frequency analysis and peak detection. The expectation was that the program would play every note at an approximate time within the time range obtained from observing the gamelan musician's play, that is, at the time target plus a random value between -328 and 246 ms. The result shows that all notes were played at an approximate time as expected. Table 4 shows the execution times of the program developed in this research, denoted Program I, and, for comparison, of the program developed by [4], denoted Program II. The columns in the table are labeled IN for the index of notes, NS for note sequence, TT for time target and ET for execution time.
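The acceptance check described in the evaluation can be expressed as a short sketch. It assumes that the peak times measured from the program's recording are absolute times in milliseconds and that they are already paired with their time targets. The function name is hypothetical, and the sample values in the usage note are made up for illustration, not taken from Table 4.

```python
def offsets_outside_range(time_targets_ms, peak_times_ms,
                          lowest_ms=-328, highest_ms=246):
    """Return (note index, offset) pairs whose offset leaves the tolerated range.

    An empty list means every note was played within the approximate time.
    """
    outside = []
    pairs = zip(time_targets_ms, peak_times_ms)
    for index, (target, peak) in enumerate(pairs, start=1):
        offset = peak - target
        if not lowest_ms <= offset <= highest_ms:
            outside.append((index, offset))
    return outside
```

For instance, targets `[1000, 3000, 5000]` with measured peaks `[1020, 2969, 5150]` return an empty list. A peak of 1300 ms against a 1000 ms target would not be flagged either, since +300 exceeds 246 only when the offset is larger than the upper bound, so a peak of 1300 ms is reported as `(1, 300)`.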
tolerated time was formulated as P: RT→AT. The value of the time range (RT) was defined from human play using time-frequency analysis. The time range was used as the parameters for randomizing the approximate time for playing a note. After observing the play of a musician and conducting pitch and time analysis, the approximate time AT was a random value within the time range RT, which extended from -328 to 246 milliseconds around the time target value. An automatic gamelan musical note player program was developed based on this function. The evaluation shows that the program played all note sequences within the approximate time, as a human does. Based on expert opinions, the program developed in this research sounded more natural and better than Smart Gamelan Player developed by [4].
In future work, the results of this research will be used to develop a model of a musical note reader, a program that can read and play a musical sheet, and to develop an interactive music instrument program that can detect and measure the tempo of the user's play.