



Journal of Information Science and Engineering, Vol. 25 No. 2, pp. 591-601


Using Redundancy Reduction in Summarization to Improve Text Classification by SVMs


Jiaming Zhan and Han-Tong Loh
Department of Mechanical Engineering 
National University of Singapore 
Singapore 119260, Singapore


    In this paper, we investigate the use of summarization techniques to improve text classification. Because summarization inherently assigns more weight to the more important sentences in an article, it may improve the accuracy with which the article is classified. Redundancy in the summaries was reduced to different levels, and its effect on classification performance was investigated. The classification algorithm used was Support Vector Machines (SVMs), which have proven to be very effective and robust for text classification problems. Experimental results showed that the summaries with the lowest redundancy improved classification performance on the Reuters corpus, with an increase of more than 6% in the average F1 measure. To explain why summarization can improve performance while feature selection makes no sense for SVMs, a further experiment was conducted to demonstrate the difference between summarization and traditional feature selection techniques.


Keywords: text classification, text summarization, support vector machines, maximal marginal relevance, text mining
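
The abstract does not spell out how redundancy was reduced; the keywords name maximal marginal relevance (MMR), so the following is only a minimal sketch of the standard MMR selection rule, not the authors' implementation. It assumes bag-of-words cosine similarity, uses the whole article in place of a query as the relevance target, and introduces the hypothetical names `bow`, `cosine`, `mmr_summary`, and `lam`. A lower `lam` penalizes overlap with already selected sentences more heavily, which gives a less redundant summary.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_summary(sentences, k, lam=0.5):
    """Greedily pick up to k sentences by the MMR score.

    score(i) = lam * sim(sentence_i, article)
               - (1 - lam) * max sim(sentence_i, already selected)
    """
    vectors = [bow(s) for s in sentences]
    article = bow(" ".join(sentences))  # assumption: article stands in for the query
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], article)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]
```

The resulting summary text, rather than the full article, would then be passed to the feature extraction and SVM training steps; varying `lam` is one way to obtain summaries at different redundancy levels.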
