JISE




Journal of Information Science and Engineering, Vol. 30 No. 5, pp. 1407-1421


Woodpecker: An Automatic Methodology for Machine Translation Diagnosis with Rich Linguistic Knowledge


BO WANG¹, MING ZHOU², SHUJIE LIU², MU LI² AND DONGDONG ZHANG²
¹School of Computer Science and Technology
Tianjin University
Tianjin, 300000 P.R. China
²Microsoft Research Asia
Beijing, 100000 P.R. China
E-mail: bo.wang.1979@gmail.com; {mingzhou; shujieli; muli; dozhang}@microsoft.com

 


    Unlike “black-box” evaluation, diagnostic evaluation aims to offer greater explanatory power regarding various aspects of the performance of artificial intelligence systems. However, for machine translation (MT) systems, because of their complexity and knowledge dependency, such diagnostic evaluation often demands a large amount of manual work. To tackle this problem, we propose an automatic diagnostic evaluation methodology, called Woodpecker, which enables multi-factored evaluation of MT systems based on linguistic categories and automatically constructed linguistic checkpoints. The taxonomy of the categories is defined with rich linguistic knowledge and covers phenomena at different linguistic levels. Instances of the categories are composed into test cases called linguistic checkpoints. We present a method that automatically extracts checkpoints from parallel sentences, allowing Woodpecker to automatically monitor how an MT system translates various linguistic phenomena and thereby facilitating diagnostic evaluation. The effectiveness of Woodpecker is verified through in-house experiments and open MT evaluation tracks on various types of MT systems.

 


Keywords: evaluation, diagnosis, machine translation, linguistic knowledge, checkpoint
