Journal of Information Science and Engineering, Vol. 36 No. 2, pp. 347-363

Graph-Based Extractive Arabic Text Summarization Using Multiple Morphological Analyzers

1Department of Computer Science
2Department of Mathematics
Damietta University
New Damietta, 0020 Egypt
E-mail: {elbarougy; gbehery}@du.edu.eg; akram_elkhatib@hotmail.com

This paper investigates the effectiveness of multi-morphological analysis for improving the performance of the graph-based approach to extractive Arabic text summarization (ATS). This approach represents the text document as a graph in which sentences are the nodes and the relationships between sentences are the edge weights. These weights measure the similarity between the connected sentences, which is traditionally calculated using cosine similarity on the basis of term frequency-inverse document frequency (TF-IDF). The performance of graph-based ATS is still low because calculating these weights is very challenging for Arabic, for the following reasons: the complex morphological structure of the language, the absence of capital letters and diacritics, and the variable order of words in the sentence. In this study, the sum of the cosine similarity and the mutual nouns between the connected sentences is chosen as the measure that represents the edge weights. Nouns were chosen because the more nouns a sentence contains, the more information it carries; we therefore assume that using nouns leads to an improvement in the final summary. To overcome the limitations of Arabic when calculating the proposed measure, it is necessary to investigate how the choice of morphological analyzer used to extract nouns from each sentence affects ATS accuracy. Three morphological analyzers are proposed to enhance the performance of the graph-based ATS system: BAMA, Safar Alkhalil, and Stanford NLP. First, a graph-based ATS system was constructed whose input is a text document and whose output is a summary. Then redundant sentences were removed according to a sentence-overlap criterion. To evaluate the impact of the different morphological analyzers on the proposed summarization approach, the EASC corpus is used as a standard dataset. The results show that the Safar Alkhalil morphological analyzer gives the best performance among the three proposed analyzers.
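The edge weight described above (cosine similarity over TF-IDF vectors plus the mutual nouns of two sentences) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the smoothed IDF formula, and the use of a raw count of shared nouns are all assumptions made for illustration. The noun sets are assumed to be extracted beforehand by a morphological analyzer such as those named in the abstract, and no analyzer API is called here.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (dict) per tokenized sentence.

    Assumption: each sentence is treated as a "document" and a smoothed
    IDF, log((1 + n) / (1 + df)) + 1, is used; the paper may differ.
    """
    n = len(sentences)
    doc_freq = Counter()
    for tokens in sentences:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in sentences:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_graph(sentences, nouns):
    """Return {(i, j): weight} for every sentence pair with i < j.

    sentences: list of token lists, one per sentence.
    nouns:     list of noun sets, one per sentence (hypothetical input,
               assumed to come from a morphological analyzer).
    Assumption: weight = cosine similarity + number of shared nouns.
    """
    vectors = tfidf_vectors(sentences)
    graph = {}
    for i, j in combinations(range(len(sentences)), 2):
        mutual_nouns = len(nouns[i] & nouns[j])
        graph[(i, j)] = cosine(vectors[i], vectors[j]) + mutual_nouns
    return graph
```

Because the noun sets are passed in as an argument, the same graph construction can be rerun with the output of each analyzer, which is the comparison the study sets out to make.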

Keywords: Arabic text summarization, morphological analyzer, natural language processing, graph based, minimum spanning tree
