JISE

In this paper, we propose a new approach to reducing redundancy in the answers to a keyword query over large graph databases. Aiming to generate query results which are not only relevant but also has diverse structures and content nodes, we propose a method to find top-k answer sub-trees which should be in reduced forms and duplication-free in regard to the set of content nodes. To process keyword queries efficiently over large graph data, we suggest an efficient indexing scheme on the most relevant paths from nodes to keyword terms in the graph. We present a top-k query processing algorithm which exploits the pre-constructed indexes to search for a set of most relevant and non-redundant answers. We also provide a state space search algorithm to find most relevant duplication-free answers in an efficient way. We show effectiveness and efficiency of the proposed approach in comparison with the previous methods using extensive experiments on real graph datasets.