JISE




Journal of Information Science and Engineering, Vol. 35 No. 1, pp. 87-104


On the Effect of the Implementation of Human Auditory Systems on Q-Log-Based Features for Robustness of Speech Recognition Against Noise


HILMAN F. PARDEDE, ASRI R. YULIANI AND AGUS SUBEKTI*
Research Center for Informatics
Indonesian Institute of Sciences
Bandung, 40135 Indonesia
E-mail: {hilm001; asri006; agus075}@lipi.go.id


Mimicking the human auditory system and applying mean normalization during feature extraction are both widely believed to improve the robustness of speech recognition. Traditionally, the normalization is conducted in the log domain by subtracting the long-term mean from the features. Some studies have found that using power functions instead of the log yields more robust features. In previous studies, a q-logarithmic function (q-log), which is also a power function, was used to derive a normalization method. The method, called q-mean normalization (q-MN) in this paper, was found to be more effective than conventional normalization methods. In those works, however, q-MN was still applied in the power spectral domain. Here, the method is applied after mapping the power spectra onto frequency scales modeled on the human auditory system. Based on an analysis of the effect of the method on noisy speech, we also propose a blind and adaptive normalization technique to determine a suitable q for q-MN. The experiments show that the proposed features are more robust than conventional features such as MFCC. The results also confirm that using nonlinear resolutions inspired by human auditory systems benefits speech recognition and is better than using a uniform resolution.
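As an illustration of the two operations the abstract contrasts, the following minimal sketch implements the standard q-logarithm, ln_q(x) = (x^(1-q) - 1) / (1 - q), which reduces to the natural log as q approaches 1, alongside conventional log-domain mean normalization. The function names, the array layout (frames by channels), and the choice of Python with NumPy are assumptions made for illustration. In particular, `q_log_mean_normalize` is only a simple analogue of mean subtraction in the q-log domain; it should not be read as the exact q-MN formulation or the adaptive selection of q used in the paper.

```python
import numpy as np


def q_log(x, q):
    """Standard q-logarithm: (x**(1 - q) - 1) / (1 - q).

    Falls back to the natural log when q is (numerically) 1.
    """
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def log_mean_normalize(features):
    """Conventional mean normalization in the log domain.

    `features` is assumed to be a (frames, channels) array of
    strictly positive spectral energies.
    """
    log_feats = np.log(np.asarray(features, dtype=float))
    # Subtract the long-term (per-channel) mean over all frames.
    return log_feats - log_feats.mean(axis=0, keepdims=True)


def q_log_mean_normalize(features, q):
    """Illustrative analogue only: mean subtraction in the q-log domain.

    This is an assumption for demonstration, not the paper's q-MN.
    """
    q_feats = q_log(features, q)
    return q_feats - q_feats.mean(axis=0, keepdims=True)


# Hypothetical toy input: 3 frames, 2 channels.
features = np.array([[1.0, 2.0], [4.0, 8.0], [16.0, 32.0]])
normalized_log = log_mean_normalize(features)
normalized_q = q_log_mean_normalize(features, q=0.5)
```

In the log domain, a constant per-channel gain becomes an additive offset, so `log_mean_normalize(features * gain)` equals `log_mean_normalize(features)`. The same exact invariance does not hold for a general q, which is one reason the choice of q matters.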


Keywords: q-logarithm, robust speech recognition, feature normalization, human auditory, adaptive
