
Journal of Information Science and Engineering, Vol. 38 No. 3, pp. 697-711

Discovering Entity Columns of Web Tables Effectively and Efficiently

School of Computer and Information Technology
Beijing Jiaotong University
Beijing, 100044 P.R. China
E-mail: ChenSY725@126.com; nwang@bjtu.edu.cn+

Unlike traditional relational tables, web tables have no designated key attributes or entity columns, which makes them difficult for machines to understand. The effectiveness of existing entity column detection methods usually depends on the coverage of a knowledge base, and traversing that knowledge base is inefficient. In this paper, we propose a novel framework for discovering entity columns in web tables based on approximate primary functional dependencies. We build a table schema dependency graph that reflects the semantic dependency relationships between the columns of a web table. By iteratively calculating the importance of each attribute node in this graph with LeaderRank, our method detects entity columns accurately and efficiently in both single-entity and multi-entity tables. Experimental results on real web datasets show that our method significantly outperforms previous work in both effectiveness and efficiency, especially for large tables.

Keywords: web table, entity column, functional dependency, table schema dependency graph, LeaderRank
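The abstract names the three ingredients (approximate functional dependencies, a table schema dependency graph, LeaderRank) but not how they are defined, so the following Python sketch fills in those details with its own assumptions and should not be read as the authors' implementation. It assumes that the strength of an approximate dependency `lhs -> rhs` is the share of rows that remain consistent after keeping only the most frequent `rhs` value for each `lhs` value (one minus the g3 error), that a dependency counts when its strength reaches a hypothetical threshold of 0.9, and that each dependent column gets an unweighted edge pointing to its determinant so that importance flows toward likely entity columns. The ranking step follows standard LeaderRank: a ground node is linked in both directions to every column, scores are propagated until they converge, and the ground node's final score is shared equally among the columns. The helper names `afd_strength`, `build_dependency_edges`, and `leaderrank`, and the toy film table, are all hypothetical. The sketch returns only the single top-scoring column, so it does not handle multi-entity tables.

```python
from collections import defaultdict


def afd_strength(rows, lhs, rhs):
    """Assumed measure: share of rows consistent with lhs -> rhs after keeping,
    for each lhs value, only its most frequent rhs value (1 - g3 error)."""
    groups = defaultdict(lambda: defaultdict(int))
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return kept / len(rows)


def build_dependency_edges(rows, columns, threshold=0.9):
    """Hypothetical schema dependency graph: if lhs -> rhs holds approximately,
    add an edge rhs -> lhs so the dependent column 'votes' for its determinant."""
    edges = []
    for lhs in columns:
        for rhs in columns:
            if lhs != rhs and afd_strength(rows, lhs, rhs) >= threshold:
                edges.append((rhs, lhs))
    return edges


def leaderrank(nodes, edges, tol=1e-10, max_iter=1000):
    """LeaderRank: a random walk over the graph plus a ground node that is
    linked in both directions to every ordinary node."""
    ground = object()  # sentinel that cannot collide with a column name
    out = {n: set() for n in nodes}
    for src, dst in edges:
        out[src].add(dst)
    for n in nodes:
        out[n].add(ground)
    out[ground] = set(nodes)

    # Every ordinary node starts with score 1, the ground node with 0.
    score = {n: 1.0 for n in nodes}
    score[ground] = 0.0
    for _ in range(max_iter):
        new = dict.fromkeys(score, 0.0)
        for src, targets in out.items():
            share = score[src] / len(targets)  # split score over out-links
            for dst in targets:
                new[dst] += share
        delta = max(abs(new[n] - score[n]) for n in score)
        score = new
        if delta < tol:
            break

    # Share the ground node's converged score equally among ordinary nodes.
    bonus = score[ground] / len(nodes)
    return {n: score[n] + bonus for n in nodes}


rows = [
    {"Title": "Alien", "Director": "Ridley Scott", "Year": 1979},
    {"Title": "Gladiator", "Director": "Ridley Scott", "Year": 2000},
    {"Title": "Heat", "Director": "Michael Mann", "Year": 1995},
    {"Title": "Collateral", "Director": "Michael Mann", "Year": 2004},
    {"Title": "Memento", "Director": "Christopher Nolan", "Year": 2000},
]
columns = ["Title", "Director", "Year"]

scores = leaderrank(columns, build_dependency_edges(rows, columns))
entity_column = max(scores, key=scores.get)
print({c: round(s, 3) for c, s in scores.items()})
# {'Title': 1.286, 'Director': 0.857, 'Year': 0.857}
print(entity_column)  # Title
```

In this toy table only `Title` determines the other columns exactly (for example, `Director -> Title` has strength 3/5 = 0.6), so both `Director` and `Year` point to `Title` and it collects the largest LeaderRank score. The ground node keeps the walk strongly connected, which is why no damping factor is needed, unlike in PageRank.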
