JISE

Mimicking the human auditory system and applying mean normalization during feature extraction are widely believed to improve the robustness of speech recognition. Traditionally, normalization is performed in the log domain by subtracting the long-term mean from the features. Some studies have found that using power functions instead of the logarithm yields more robust features. In previous studies, a q-logarithmic function (q-log), which is also a power function, was used to derive a normalization method. This method, called q-mean normalization (q-MN) in this paper, was found to be more effective than conventional normalization methods. In those works, however, q-MN was applied in the power spectral domain. Here, the method is applied after the power spectra are mapped onto nonlinear resolutions inspired by the human auditory system. After analyzing the effect of the method on noisy speech, we propose a blind, adaptive normalization technique for determining a suitable q in q-MN. Experiments show that the proposed features are more robust than conventional features such as MFCC. The results also confirm that nonlinear resolutions inspired by the human auditory system benefit speech recognition and outperform a uniform resolution.
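As a rough illustration of the two building blocks mentioned above, the following Python sketch shows the standard (Tsallis) q-logarithm, which is a power function of its argument and reduces to the natural logarithm as q approaches 1, alongside conventional log-domain mean normalization. It is not the q-MN formulation itself, which is not spelled out in this abstract. The function names `q_log`, `mean_normalize`, and `log_mean_normalize` are hypothetical and chosen only for this example.

```python
import math


def q_log(x, q):
    """Standard q-logarithm: (x**(1 - q) - 1) / (1 - q), with ln(x) at q = 1."""
    if x <= 0:
        raise ValueError("x must be positive")
    if abs(q - 1.0) < 1e-12:
        # The limit of the expression below as q -> 1 is the natural log.
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def mean_normalize(frames):
    """Subtract the per-dimension long-term mean from every frame.

    `frames` is a list of equal-length feature vectors (one per time frame).
    """
    n_frames = len(frames)
    n_dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]
    return [[f[d] - means[d] for d in range(n_dims)] for f in frames]


def log_mean_normalize(power_frames):
    """Conventional normalization: take the log, then subtract the mean."""
    log_frames = [[math.log(p) for p in f] for f in power_frames]
    return mean_normalize(log_frames)
```

For example, `q_log(4.0, 0.5)` evaluates the power-function form with exponent 0.5, while `q_log(x, 1.0)` falls back to `math.log(x)`. How q is chosen, and how the mean is removed in the q-log domain, is the subject of the paper and is deliberately left out of this sketch.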