Linking Entities across Relations and Graphs


Fan W., Geng L., Jin R., Lu P., Tugay R., Yu W.

38th IEEE International Conference on Data Engineering (ICDE), ELECTR NETWORK, 9 - 11 May 2022, pp. 634-647, (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/icde53745.2022.00052
  • Country of Publication: ELECTR NETWORK
  • Pages: pp. 634-647
  • Affiliated with Atatürk University: No

Abstract

This paper proposes a notion of parametric simulation to link entities across a relational database D and a graph G. Taking functions and thresholds for measuring vertex closeness, path associations and important properties as parameters, parametric simulation identifies tuples t in D and vertices v in G that refer to the same real-world entity, based on topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. We show that parametric simulation can be computed in quadratic time, by providing such an algorithm. Putting these together, we develop HER, a parallel system that checks whether a pair (t, v) is a match, finds all vertex matches of t in G, and computes all matches across D and G, all in quadratic time. Using real-life and synthetic data, we empirically verify that HER is accurate, with an F-measure of 0.94 on average, and that it scales with database D and graph G.
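
To make the idea of threshold-based matching between a tuple and a graph vertex more concrete, here is a toy sketch in Python. It is not the paper's parametric simulation or the HER system: the closeness function (plain string similarity), the thresholds `sigma` and `delta`, the one-hop neighbourhood check, and the sample data are all assumptions made for illustration, whereas the paper learns its parameter functions and thresholds with machine learning.

```python
from difflib import SequenceMatcher


def closeness(a: str, b: str) -> float:
    """Hypothetical vertex-closeness function: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(t: dict, v: str, graph: dict, sigma: float = 0.8, delta: float = 0.5) -> bool:
    """Toy check of whether tuple t and vertex v may denote the same entity.

    sigma: assumed threshold on label closeness (semantic part).
    delta: assumed threshold on the fraction of t's attributes that are
           supported by a neighbour of v (a crude stand-in for topology).
    """
    # Semantic part: the vertex label must be close to the tuple's name.
    if closeness(t["name"], graph[v]["label"]) < sigma:
        return False

    attrs = {k: val for k, val in t.items() if k != "name"}
    if not attrs:
        return True

    # Topological part: count attributes backed by some one-hop neighbour.
    neighbour_labels = [graph[u]["label"] for u in graph[v]["edges"]]
    supported = sum(
        1
        for val in attrs.values()
        if any(closeness(val, lab) >= sigma for lab in neighbour_labels)
    )
    return supported / len(attrs) >= delta


# Illustrative data only: a tiny graph G and one tuple t from D.
graph = {
    "v1": {"label": "Alan Turing", "edges": ["v2", "v3"]},
    "v2": {"label": "London", "edges": []},
    "v3": {"label": "Mathematician", "edges": []},
    "v4": {"label": "Alan Turner", "edges": ["v5"]},
    "v5": {"label": "Sydney", "edges": []},
}
t = {"name": "Alan Turing", "birthplace": "London", "occupation": "mathematician"}

# All vertex matches of t in G under the toy criterion.
matches = [v for v in graph if is_match(t, v, graph)]
print(matches)
```

In this toy data only `v1` is returned: `v4` has a similar label, but none of its neighbours supports the tuple's attributes, which is the kind of case where combining semantic and topological evidence helps.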