
Journal of Information Science and Engineering, Vol. 35 No. 3, pp. 651-674

Sentence-Ranking-Enhanced Keywords Extraction from Chinese Patents

¹Department of Computer Science and Engineering
East China University of Science and Technology
Shanghai, 200237 P.R. China

²Business Intelligence and Visualization Research Center
National Engineering Laboratory for Big Data Distribution and Exchange Technologies
Shanghai, 200436 P.R. China

³School of Information Science and Technology
Shihezi University
Shihezi, 832003 P.R. China

Patent keywords, a high-level topic representation of patents, play an important role in many patent-oriented mining tasks, such as classification, retrieval and translation. However, few studies have so far focused on keyword extraction for patents, and human-annotated gold-standard datasets are lacking, especially for Chinese patents. This paper introduces a new human-annotated Chinese patent dataset and proposes a sentence-ranking based Term Frequency-Inverse Document Frequency (SR based TF-IDF) algorithm for patent keyword extraction, motivated by the idea that “the keywords are in the key sentences”. In the algorithm, a sentence-ranking model built on a sentence semantic graph and heuristic rules selects the top KS percent of sentences from each patent. Finally, the proposed algorithm is compared with TF-IDF, TextRank, word2vec-weighted TextRank and the Patent Keyword Extraction Algorithm (PKEA) on the new Chinese patent dataset and on several standard benchmark datasets. The experimental results show that the proposed algorithm effectively improves the performance of keyword extraction from Chinese patents.
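The abstract's pipeline (rank the sentences, keep the top KS percent, then score terms with TF-IDF) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Sentences are assumed to be already segmented into word lists, as Chinese text requires a word segmenter first. A bag-of-words cosine similarity stands in for the paper's sentence semantic graph, and a TextRank-style power iteration does the ranking. The paper's heuristic rules are omitted. The function names (`rank_sentences`, `top_ks_sentences`, `sr_tfidf`), the `ks` fraction (KS percent expressed as 0 to 1) and the smoothed IDF formula are all hypothetical choices.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences, d=0.85, iters=50):
    """TextRank-style power iteration on a weighted sentence-similarity graph.

    Assumption: bag-of-words cosine stands in for the paper's semantic similarity,
    and no heuristic rules are applied.
    """
    bags = [Counter(s) for s in sentences]
    n = len(bags)
    w = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight per sentence
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                              for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def top_ks_sentences(sentences, ks=0.5):
    """Keep the top ks fraction of sentences (at least one), in original order."""
    scores = rank_sentences(sentences)
    k = max(1, math.ceil(ks * len(sentences)))
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(best)]


def sr_tfidf(doc, corpus, ks=0.5, top_n=5):
    """Score terms by TF over the key sentences only and IDF over the whole corpus.

    doc: list of sentences, each a list of pre-segmented words.
    corpus: list of such documents (expected to include doc).
    """
    key_sentences = top_ks_sentences(doc, ks)
    tf = Counter(word for s in key_sentences for word in s)
    total = sum(tf.values())
    n_docs = len(corpus)

    def idf(word):
        # Smoothed IDF variant (an assumption, not taken from the paper).
        df = sum(1 for other in corpus if any(word in s for s in other))
        return math.log(n_docs / (1 + df)) + 1

    scored = {word: (count / total) * idf(word) for word, count in tf.items()}
    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(scored, key=lambda word: (-scored[word], word))[:top_n]
```

The sentence filter is what separates this sketch from plain TF-IDF. A sentence that shares no words with the rest of the patent, such as one that only describes a figure, has no incoming edges. It therefore receives only the base score 1 - d and is dropped when `ks=0.5`, so its words cannot be counted toward any keyword.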

Keywords: Chinese patents, key sentences, sentence-ranking model, keywords extraction, human-annotated dataset
