

Journal of Information Science and Engineering, Vol. 22, No. 5, pp. 1059-1075


A Talking Face Driven by Voice using Hidden Markov Model


Guang-Yi Wang, Mau-Tsuen Yang, Cheng-Chin Chiang and Wen-Kai Tai
Department of Computer Science and Information Engineering 
National Dong Hwa University 
Hualien, 974 Taiwan 
E-mail: mtyang@mail.ndhu.edu.tw


    In this paper, we utilize a Hidden Markov Model (HMM) as a mapping mechanism between two different kinds of correlated signals. Specifically, we develop a voice-driven talking head system by exploiting the physical relationship between the shape of the mouth and the sound it produces. The proposed system can be easily trained, and a talking head can be efficiently animated. In the training phase, Mel-scale Frequency Cepstral Coefficients (MFCC) are extracted from the audio signals and Facial Animation Parameters (FAP) are extracted from the video signals; both audio and visual features are then combined to train a single HMM. In the synthesis phase, the HMM maps a completely novel audio track to a FAP sequence, which is rendered as a synthetic face with the help of a Facial Animation Engine (FAE). Experiments demonstrate the proposed voice-driven talking head on both male and female speakers, in two styles (speaking and singing), and in three languages (Chinese, English, and Taiwanese). Possible applications of the proposed system include computer-aided instruction, online guides, virtual conferencing, lip synchronization, and human-computer interaction.
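    The pipeline described above can be illustrated with a minimal sketch (not the authors' implementation): train a single Gaussian HMM on concatenated [MFCC | FAP] frames, then, for a novel audio track, decode the state sequence from the audio dimensions alone and emit each state's mean FAP vector as the animation trajectory. The librosa and hmmlearn calls, the feature dimension (13 MFCCs), and the state count (16) are illustrative assumptions; the paper's actual feature extraction, model size, and FAE rendering are not reproduced here.

# Illustrative sketch, under the assumptions stated above.
import numpy as np
import librosa
from hmmlearn import hmm

N_MFCC = 13    # audio feature dimension per frame (assumption)
N_STATES = 16  # number of HMM states (assumption)

def mfcc_features(wav_path, sr=16000):
    """Return a (T x N_MFCC) matrix of MFCC frames for one utterance."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T

def train_joint_hmm(mfcc_seqs, fap_seqs):
    """Training phase: fit a single Gaussian HMM on concatenated
    [MFCC | FAP] frames (the two sequences must be time-aligned)."""
    joint = [np.hstack([m, f]) for m, f in zip(mfcc_seqs, fap_seqs)]
    X = np.vstack(joint)
    lengths = [seq.shape[0] for seq in joint]
    model = hmm.GaussianHMM(n_components=N_STATES, covariance_type="diag")
    model.fit(X, lengths)
    return model

def synthesize_faps(model, mfcc):
    """Synthesis phase: decode the state sequence of a novel audio track
    using only the audio dimensions of the joint model, then output each
    state's mean over the visual (FAP) dimensions as the mouth trajectory."""
    audio_hmm = hmm.GaussianHMM(n_components=N_STATES, covariance_type="diag")
    audio_hmm.startprob_ = model.startprob_
    audio_hmm.transmat_ = model.transmat_
    audio_hmm.means_ = model.means_[:, :N_MFCC]
    # model.covars_ yields full matrices; keep the diagonal audio block.
    audio_hmm.covars_ = np.array([np.diag(c)[:N_MFCC] for c in model.covars_])
    states = audio_hmm.predict(mfcc)       # Viterbi state sequence
    fap_means = model.means_[:, N_MFCC:]   # per-state mean FAP vector
    return fap_means[states]               # (T x n_FAP) animation sequence

    In practice the resulting FAP trajectory would be smoothed over time before being passed to the FAE for rendering.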


Keywords: talking head, audio-to-visual mapping, HMM, FAP, lip synchronization
