JISE




Journal of Information Science and Engineering, Vol. 31 No. 1, pp. 315-330


A New Experience in Persian Text Clustering using FarsNet Ontology


MOHAMMAD ZANJANI¹,², AHMAD BARAANI DASTJERDI³, EHSAN ASGARIAN⁴, ALIREZA SHAHRIYARI⁵ AND AMIR AKHAVAN KHARAZIANM²
¹Department of Information and Communication Technology
South Pars Gas Complex
Asalouyeh, I.R. Iran
²Department of Computer Engineering, School of Engineering
SheikhBahaee University
Isfahan, I.R. Iran
³Department of Computer Engineering, Faculty of Engineering
Isfahan University
Isfahan, I.R. Iran
⁴Department of Computer Engineering, Faculty of Engineering
Ferdowsi University of Mashhad
Mashhad, I.R. Iran
⁵Department of Computer Engineering and Mathematics, Faculty of Engineering
Kingston University
London, KT12EE UK
Email: mzanjani@shbu.ac.ir 

 


    Clustering large text corpora plays a key role in easy navigation and browsing of massive amounts of text data, particularly in search engines. Conventional clustering techniques compare documents based on surface similarities of words or extracted morphemes, which usually leads to non-semantic clusters. This paper addresses Farsi, also known as Persian, given that the amount of electronic Farsi text is growing rapidly. The documents are enriched using semantic relationships (synonymy, hypernymy and hyponymy) extracted from the FarsNet lexical ontology. A word sense disambiguation (WSD) procedure is proposed to decrease uncertainty. After preprocessing, three clustering algorithms (Bisecting K-means, LSI-based and PLSI-based clustering) are applied to the pre-categorized Persian Hamshahri corpus. Experimental results show that clustering quality improves when the text data is enriched with semantic relations, especially with the PLSI-based approach.
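
    The enrichment idea can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon below is a hypothetical toy stand-in for FarsNet (English words are used for readability), the function names are invented for illustration, and the WSD step and the clustering algorithms themselves are omitted. It only shows how adding synonyms, hypernyms and hyponyms can give a nonzero similarity to two documents that share no surface words.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for FarsNet (illustrative entries only).
TOY_LEXICON = {
    "car": {"synonyms": ["automobile"], "hypernyms": ["vehicle"], "hyponyms": []},
    "automobile": {"synonyms": ["car"], "hypernyms": ["vehicle"], "hyponyms": []},
    "truck": {"synonyms": [], "hypernyms": ["vehicle"], "hyponyms": []},
}

RELATIONS = ("synonyms", "hypernyms", "hyponyms")


def enrich(tokens, lexicon):
    """Return the tokens plus every related term the lexicon lists for them."""
    enriched = list(tokens)
    for tok in tokens:
        entry = lexicon.get(tok)
        if entry is None:
            continue  # word not covered by the lexicon: keep it unchanged
        for relation in RELATIONS:
            enriched.extend(entry.get(relation, []))
    return enriched


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = ["car", "engine"]
doc2 = ["truck", "road"]

# No shared surface words, so the plain similarity is zero.
surface_sim = cosine(doc1, doc2)

# After enrichment both documents contain the shared hypernym "vehicle".
enriched_sim = cosine(enrich(doc1, TOY_LEXICON), enrich(doc2, TOY_LEXICON))

print(surface_sim, enriched_sim)
```

    In a full pipeline, vectors enriched this way would be passed to a clustering algorithm such as those named in the abstract. A sense disambiguation step would normally come first, so that only relations of the intended word sense are added.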


Keywords: text clustering, word sense disambiguation, semantic analysis, FarsNet lexical ontology, probabilistic latent semantic indexing
