An alignment comparator for entity resolution with multi-valued attributes

© Springer International Publishing Switzerland 2014.

Entity matching is a problem that affects many data management processes. When matching entities represented as RDF individuals, some properties may have variable-length lists of attribute values. This leads to the problem of comparing multi-valued attributes, e.g. comparing lists of author names to decide whether two publications match. This kind of matching is more complex than comparing fixed-length records, but less complex than comparing XML documents. Instead of comparing a single string formed by concatenating these values, each value of one vector should be compared against all values of the other vector. We propose a set of heuristics for aligning and comparing multi-valued attributes and evaluate them in the context of bibliographic databases. Our initial results show that it is possible to reduce the number of comparisons and to provide an aggregated similarity metric that outperforms the average similarity of cross-product comparisons.
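To make the contrast between "average similarity of cross-product comparisons" and an alignment-based aggregate concrete, here is a minimal sketch under stated assumptions. It is not the authors' heuristics, which the abstract does not specify. The string metric (`difflib.SequenceMatcher.ratio`) and the greedy one-to-one alignment are stand-ins chosen for illustration, and the function names are hypothetical. The sketch only shows the aggregation idea. It still scores every pair and so does not attempt the reduction in comparisons that the paper reports.

```python
from difflib import SequenceMatcher
from itertools import product


def sim(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1]; any string metric could be used."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cross_product_average(xs: list[str], ys: list[str]) -> float:
    """Baseline: compare every value of xs with every value of ys and average all scores."""
    if not xs or not ys:
        return 0.0
    scores = [sim(x, y) for x, y in product(xs, ys)]
    return sum(scores) / len(scores)


def greedy_alignment_score(xs: list[str], ys: list[str]) -> float:
    """Hypothetical alignment heuristic: accept one-to-one pairs from the highest
    score down, then average over the longer list so unmatched values count as 0."""
    if not xs or not ys:
        return 0.0
    pairs = sorted(
        ((sim(x, y), i, j) for (i, x), (j, y) in product(enumerate(xs), enumerate(ys))),
        reverse=True,
    )
    used_x: set[int] = set()
    used_y: set[int] = set()
    total = 0.0
    for score, i, j in pairs:
        # Each value may take part in at most one aligned pair.
        if i not in used_x and j not in used_y:
            used_x.add(i)
            used_y.add(j)
            total += score
    return total / max(len(xs), len(ys))


a = ["J. Smith", "A. Jones"]
b = ["John Smith", "Alice Jones"]
print(cross_product_average(a, b), greedy_alignment_score(a, b))
```

With two author lists that refer to the same people, the cross-product average is pulled down by the scores of mismatched pairs such as ("J. Smith", "Alice Jones"). The alignment-based score keeps only the corresponding pairs and therefore comes out higher.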
