38th IEEE International Conference on Data Engineering (ICDE), ELECTR NETWORK, 9-11 May 2022, pp. 634-647 (Full-Text Paper)
This paper proposes a notion of parametric simulation to link entities across a relational database D and a graph G. Taking as parameters the functions and thresholds for measuring vertex closeness, path associations and important properties, parametric simulation identifies tuples t in D and vertices v in G that refer to the same real-world entity, based on topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. We show that parametric simulation can be computed in quadratic time by providing such an algorithm. Putting these together, we develop HER, a parallel system that checks whether a pair (t, v) is a match, finds all vertex matches of t in G, and computes all matches across D and G, all in quadratic time. Using real-life and synthetic data, we empirically verify that HER is accurate, with an average F-measure of 0.94, and that it scales with the database D and the graph G.