



Journal of Information Science and Engineering, Vol. 26 No. 4, pp. 1491-1507


Development of a Mandarin-English Bilingual Speech Recognition System with Unified Acoustic Models


QING-QING ZHANG, JIE-LIN PAN AND YONG-HONG YAN
ThinkIT Speech Laboratory 
Institute of Acoustics 
Chinese Academy of Sciences 
Beijing, 100190 China


    This paper presents our recent work on the development of a grammar-constrained Mandarin-English bilingual Speech Recognition System (MESRS) for real-world music retrieval. We tackle two of the main difficulties in bilingual speech recognition for real-world applications: balancing the performance and the complexity of the bilingual system, and effectively handling the accent of the matrix language (Mandarin) in the embedded language (English). To solve the first problem, we develop a unified bilingual acoustic model derived with a novel Two-pass phone-clustering method based on the Confusion Matrix (TCM). To deal with the second problem, we investigate several nonnative model modification approaches on the unified acoustic models. Compared with the existing log-likelihood phone-clustering method, the proposed TCM method, combined with limited amounts of nonnative adaptation data and adaptive model modification, achieves a 10.9% relative reduction in Phrase Error Rate (PER) on nonnative English phrases and an 8.9% relative PER reduction on bilingual code-mixing phrases, while also lowering the PER on Mandarin phrases.
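The abstract does not spell out the two passes of TCM, so the following is only a minimal sketch of the general idea behind confusion-matrix-driven phone clustering, not the authors' algorithm. It assumes a single hypothetical pass: confusion counts are row-normalised to P(decoded | reference), a symmetric similarity averages both directions, and each English phone shares the model of its most-confusable Mandarin phone when that similarity exceeds a threshold. Otherwise it keeps its own model. The phone labels, counts, and threshold are invented for illustration.

```python
import numpy as np


def confusion_similarity(conf: np.ndarray) -> np.ndarray:
    """Symmetric similarity from a raw confusion-count matrix.

    conf[i, j] counts how often reference phone i was decoded as phone j.
    Rows are normalised to P(j | i); the result averages P(j | i) and P(i | j).
    """
    rows = conf.sum(axis=1, keepdims=True)
    prob = np.divide(conf, rows, out=np.zeros_like(conf, dtype=float), where=rows > 0)
    return 0.5 * (prob + prob.T)


def map_english_to_mandarin(conf, phones, english, mandarin, threshold):
    """Hypothetical single clustering pass: map each English phone to its
    most-confusable Mandarin phone if their similarity reaches the threshold."""
    sim = confusion_similarity(conf)
    idx = {p: k for k, p in enumerate(phones)}
    mapping = {}
    for e in english:
        best = max(mandarin, key=lambda m: sim[idx[e], idx[m]])
        if sim[idx[e], idx[best]] >= threshold:
            mapping[e] = best  # merged: share the Mandarin acoustic model
        else:
            mapping[e] = e  # too distinct: keep a separate English model
    return mapping


if __name__ == "__main__":
    # Hypothetical phone inventory and made-up confusion counts (rows = reference).
    phones = ["a_zh", "i_zh", "s_zh", "AA_en", "IY_en", "TH_en"]
    conf = np.array([
        [90, 5, 0, 5, 0, 0],
        [2, 88, 0, 0, 10, 0],
        [0, 0, 85, 0, 0, 15],
        [30, 0, 0, 70, 0, 0],
        [0, 25, 0, 0, 75, 0],
        [0, 0, 10, 0, 0, 90],
    ])
    mapping = map_english_to_mandarin(
        conf, phones, ["AA_en", "IY_en", "TH_en"], ["a_zh", "i_zh", "s_zh"], 0.15
    )
    print(mapping)  # {'AA_en': 'a_zh', 'IY_en': 'i_zh', 'TH_en': 'TH_en'}
```

Averaging both confusion directions keeps the similarity symmetric, so a merge decision does not depend on which language is treated as the reference. The threshold controls the trade-off the abstract describes: raising it keeps more language-specific phones (larger, potentially more accurate model), while lowering it shares more models (smaller, simpler system).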


Keywords: bilingual speech recognition, two-pass phone clustering, confusion matrix, nonnative adaptation, model retraining
