JISE




Journal of Information Science and Engineering, Vol. 27 No. 6, pp. 1787-1822


An Experimental Approach to Detect Similar Web Pages Based on 3-Levels of Similarity Clues


WOOSUNG JUNG1, EUNJOO LEE+ AND CHISU WU2
1Software Capability Development Center 
LG Electronics 
Seoul, 137-130 Korea 
2School of Computer Science and Engineering 
Seoul National University 
Seoul, 151-742 Korea 
+School of Computer Science and Engineering 
Kyungpook National University 
Daegu, 702-701 Korea


    Web applications are hard to maintain because they change rapidly and combine a growing variety of techniques. Several approaches, such as clustering or refactoring web applications, have been suggested to improve their maintainability, and a similarity measure is one of the principal criteria in these approaches. Existing studies on web similarity have focused on semantic or contextual similarity, and most existing clone detection techniques target general applications rather than web applications. In this paper, we propose WSIM, a measure of similarity between pages in web applications that is based on the degree to which similarity clues are used and on two linking directions. The similarity clues include page relations, source and target entities, and parameters. WSIM is classified into three levels and two directions, which yields six kinds of WSIM, each with its own purpose. Finally, several experiments were conducted on simulated data and real open-source applications to validate the proposed WSIM.


Keywords: web application, similarity, page clone, clues, maintainability
