JISE




Journal of Information Science and Engineering, Vol. 26, No. 2, pp. 505-525


An Automated Term Definition Extraction System Using the Web Corpus in the Chinese Language


FANG-YIE LEU AND CHIH-CHIEH KO
Department of Computer Science 
Tunghai University 
Taichung, 407 Taiwan 
E-mail: leufy@thu.edu.tw


    This paper proposes a system, named DefExplorer, that analyzes the types of given Chinese terms, extracts term definitions from the Web, and selects answers from noisy Web pages. DefExplorer filters out invalid data with a semantic approach. Two types of candidate sets, common and domain-specific, are employed to cluster similar candidates into groups. Different approaches are also deployed to evaluate each candidate's importance, which is the key factor in selecting the best answers from the retrieved candidates. Experimental results show that DefExplorer can effectively extract term definitions from the Web, especially definitions of out-of-vocabulary terms.


Keywords: definitions, web corpus, information extraction, Chinese language, text mining
