An Optimum Database for Isolated Word in Speech Recognition System

Syifaun Nafisah, Oyas Wahyunggoro, Lukito Edi Nugroho

Abstract


An automatic speech recognition (ASR) system is a technology that allows computers to receive input as spoken words. This technology requires sample words, stored in a database, for the pattern-matching process. No established reference theory exists for developing such a database in ASR, so research on database development to optimize system performance is needed. Mel-scale frequency cepstral coefficients (MFCCs) are used to extract the characteristics of the speech signal, and a backpropagation neural network on quantized vectors is used to evaluate the maximum log-likelihood values against the nearest pattern in the database. The results show that the robustness of ASR is optimal with 140 reference samples for each word, giving an average accuracy of 99.95% and a processing duration of 27.4 msec. The investigation also found that gender does not have a significant influence on accuracy. From these results, it is concluded that the performance of ASR can be increased by optimizing the database.
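As an illustration of the MFCC feature extraction named in the abstract, the following is a minimal sketch for a single frame, using only the Python standard library. The abstract does not state the frame length, sampling rate, number of mel filters, or number of coefficients, so the values used here (256 samples, 8000 Hz, 20 filters, 13 coefficients) are assumptions, and all function names are hypothetical. The sketch uses the common mel formula 2595 log10(1 + f/700), which may differ from the authors' choice, and it does not cover the vector quantization or backpropagation stages.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (assumed; the paper may use another variant).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Hamming-window the frame, then take a naive DFT for bins 0..N/2.
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope, peak of 1 at center
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    # Log mel-filterbank energies followed by an unnormalized DCT-II.
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Usage: one 256-sample frame of a synthetic 440 Hz tone at 8000 Hz (assumed values).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440.0 * i / SAMPLE_RATE) for i in range(256)]
coeffs = mfcc(frame, SAMPLE_RATE)
```

In a full recognizer, a vector like `coeffs` would be computed for every frame of an utterance and compared against the stored reference samples for each word; how the authors perform that comparison is described only at the level given in the abstract.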


Keywords


Optimum, Database, ASR, Backpropagation, MFCCs






DOI: http://dx.doi.org/10.12928/telkomnika.v14i2.2353


Copyright (c) 2019 Universitas Ahmad Dahlan

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

TELKOMNIKA Telecommunication, Computing, Electronics and Control
ISSN: 1693-6930, e-ISSN: 2302-9293
Universitas Ahmad Dahlan, 4th Campus, 9th Floor, LPPI Room
Jl. Ringroad Selatan, Kragilan, Tamanan, Banguntapan, Bantul, Yogyakarta, Indonesia 55191
Phone: +62 (274) 563515, 511830, 379418, 371120 ext. 4902, Fax: +62 274 564604
