JISE




Journal of Information Science and Engineering, Vol. 41 No. 2, pp. 419-434


Boundary Recognition Method for Preposition Phrases in Modern Chinese based on Top-N Algorithm


HUI-FANG ZHANG
School of Literature and Media
Sichuan University Jinjiang College
Pengshan, 620860 P.R. China
E-mail: zhanghuifang2035@163.com


In modern Chinese sentences, the structure and usage of prepositional phrases are flexible and diverse. This makes it difficult to mine the topics of prepositional phrases and lowers the accuracy of prepositional phrase boundary recognition. This paper therefore studies a boundary recognition method for prepositional phrases in modern Chinese based on the Top-N algorithm. First, a prepositional phrase corpus is constructed from a prepositional usage attribute dictionary, a prepositional usage rule base, and a prepositional usage corpus. Then, based on this corpus, the topics of prepositional phrases in modern Chinese are mined with an LDA model, and the modern Chinese documents are segmented. A conditional random field model is used to extract prepositional phrase features from the segmented documents. Finally, prepositional phrase boundaries are recognized with a Top-N algorithm based on weighted fusion of local models. Experimental results show that, across three types of prepositional phrase environments, the proposed method achieves a highest boundary recognition accuracy of 99.6% and a highest F-value of 99.7%. These results indicate that the method can accurately identify the boundaries of modern Chinese prepositional phrases and is practical.
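
The abstract does not spell out how the local models are fused, so the following is only a minimal sketch of one plausible reading of weighted Top-N fusion. It assumes each local model returns a short list of candidate phrase spans with scores, and that fusion is a weighted sum of those scores per span. The function name `fuse_top_n`, the span representation, the scores, and the weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fuse_top_n(candidate_lists, weights, n=3):
    """Fuse per-model N-best boundary candidates by weighted score sum.

    candidate_lists: one list per local model, each holding
        ((start, end), score) pairs for a candidate phrase span.
    weights: one weight per local model (hypothetical values).
    Returns the n highest-scoring (span, fused_score) pairs.
    """
    if len(candidate_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    fused = defaultdict(float)
    for candidates, weight in zip(candidate_lists, weights):
        for span, score in candidates:
            fused[span] += weight * score
    # Sort by fused score (descending); ties are broken by span for determinism.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Toy example: two hypothetical local models score spans over token indices.
model_a = [((2, 5), 0.7), ((2, 6), 0.2), ((3, 5), 0.1)]
model_b = [((2, 6), 0.5), ((2, 5), 0.4), ((2, 7), 0.1)]
best = fuse_top_n([model_a, model_b], weights=[0.6, 0.4], n=2)
```

In this toy input, the span `(2, 5)` ranks first because both models give it a relatively high score. A span proposed by only one model, such as `(2, 7)`, contributes only that model's weighted score.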


Keywords: Top-N algorithm, modern Chinese, preposition phrase, boundary recognition method, LDA model, conditional random field model
