JISE

The most commonly adopted approach to find valuable information from tree data is to extract frequently occurring subtree patterns. Because mining frequent tree patterns has a wide range of applications such as XML mining, web usage mining, bioinformatics, and network multicast routing, many algorithms have been recently proposed to find the patterns. However, existing tree mining algorithms suffer from several serious pitfalls in finding frequent tree patterns from massive tree datasets. Some of the major problems are due to (1) the computationally high cost of the candidate maintenance, (2) the repetitious input dataset scans, and (3) the high memory dependency. These problems stem from the fact that most of the algorithms are based on the well-known apriori algorithm and have used anti-monotone property for candidate generation and frequency counting in them. To solve the problems, we apply the pattern-growth approach instead of the apriori approach, and choose to extract maximal frequent subtree patterns rather than frequent subtree patterns. We present several new definitions and evaluate the effectiveness of the proposed algorithm.