Journal of Information Science and Engineering, Vol. 39 No. 2, pp. 423-438

Entity Matching based on Attribute-aware and Multi-Perspective Similarity Measurement

School of Computer and Information Technology
Beijing Jiaotong University
Beijing, 100044 P.R. China
E-mail: {19120416; nwang}@bjtu.edu.cn

Entity matching (EM) identifies tuples from different data sources that refer to the same real-world entity. One of the main challenges of EM is attribute heterogeneity, that is, an entity has attributes of many different types. Existing research focuses on using rules or neural networks to select similarity measures for different types of attributes. However, these methods select only one similarity measure for each attribute and ignore matching information from other perspectives. In addition, existing methods neglect the fact that different attributes contribute differently to the final matching decision, and they do not consider the influence of dirty data on matching results. In this paper, we propose an entity matching method based on attribute-aware and multi-perspective similarity measurement. First, we propose a multi-perspective similarity measurement framework based on the pre-trained language model DeBERTa, which captures matching information from multiple perspectives such as literal, size, and semantics. Second, we introduce an attribute attention mechanism that aggregates matching evidence from all aligned attributes according to the importance of each attribute to the final matching decision. Finally, we use cross-attribute comparison to handle dirty data problems such as swap errors, and we further improve our model’s matching capability by injecting external entity knowledge. Experimental results show that our framework outperforms state-of-the-art methods on multiple real-world data sets.

Keywords: entity matching, similarity measurement, data integration, deep learning, natural language processing
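
The two central ideas of the abstract, scoring each aligned attribute pair from several perspectives and then weighting attributes by importance, can be illustrated with a minimal sketch. It is not the paper's implementation: the paper computes similarities with DeBERTa and learns the attention weights, whereas this sketch assumes hand-written stand-ins. All function names, the token-overlap proxy for semantic similarity, the use of the maximum over perspectives, and the attribute logits are hypothetical.

```python
import math
from difflib import SequenceMatcher


def literal_sim(a: str, b: str) -> float:
    """Literal perspective: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def size_sim(a: str, b: str) -> float:
    """Size perspective: numeric closeness in [0, 1]; 0.0 if not numeric."""
    try:
        x, y = float(a), float(b)
    except ValueError:
        return 0.0
    if x == y:
        return 1.0
    return max(0.0, 1.0 - abs(x - y) / max(abs(x), abs(y)))


def token_sim(a: str, b: str) -> float:
    """Token Jaccard overlap; a crude placeholder for a semantic score
    (assumption: the paper uses a language model here, not token overlap)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def softmax(scores):
    """Turn raw importance scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_score(left: dict, right: dict, attr_logits: dict) -> float:
    """Attention-weighted sum of per-attribute similarities.

    attr_logits holds one hypothetical importance score per attribute;
    in a trained model these would be learned rather than set by hand.
    """
    attrs = list(attr_logits)
    weights = softmax([attr_logits[a] for a in attrs])
    score = 0.0
    for attr, w in zip(attrs, weights):
        a, b = str(left.get(attr, "")), str(right.get(attr, ""))
        # Simplification: keep the strongest perspective per attribute.
        score += w * max(literal_sim(a, b), size_sim(a, b), token_sim(a, b))
    return score


if __name__ == "__main__":
    t1 = {"title": "Apple iPhone 12 64GB", "price": "699"}
    t2 = {"title": "iPhone 12 Apple 64GB", "price": "689"}
    logits = {"title": 2.0, "price": 0.5}  # hypothetical importances
    print(round(match_score(t1, t2, logits), 3))
```

A pair would be declared a match when the score exceeds a chosen threshold; in this sketch the title dominates the decision because its logit is larger, which mirrors the intuition that some attributes matter more than others.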
