Chapter 10. Privacy-Preserving Record Linkage

In previous chapters, we have seen how to resolve entities via exact and probabilistic matching techniques, using both local compute and cloud-based solutions. The first step in these matching processes is to assemble the data sources onto a single platform for comparison. Where the data sources to be resolved share a common owner, or can be freely shared in their entirety for the purposes of matching, centralized processing is the most efficient approach.

However, data sources can often be sensitive, and privacy considerations may preclude unrestricted sharing with another party. This chapter considers how privacy-preserving record linkage techniques can be used to perform basic entity resolution across data sources held separately by two parties. In particular, we will consider private set intersection as a practical means to identify entities known to both parties without either side disclosing their full dataset to the other.

An Introduction to Private Set Intersection

Private set intersection (PSI) is a cryptographic technique that allows the intersection between two overlapping sets of information, held by two different parties, to be identified without revealing the nonintersecting elements to either counterparty.
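To make the idea concrete, the following is a minimal sketch of one well-known way to build PSI, based on commutative (Diffie-Hellman style) masking. It is an illustration under stated assumptions rather than the protocol or library used in this chapter: the function names (`hash_to_group`, `random_exponent`, `mask`, `toy_psi`), the modulus, and the example sets are all hypothetical, both parties are simulated in a single process, and the parameters are far too small for real use.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2**127 - 1). This is an assumption chosen
# for readability; it is NOT large enough to be secure in practice.
P = 2**127 - 1


def hash_to_group(item):
    """Map an item to an integer modulo P using SHA-256."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def random_exponent():
    """Pick a secret exponent coprime to P - 1, so masking is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # uniform in 2 .. P-2
        if math.gcd(k, P - 1) == 1:
            return k


def mask(values, key):
    """Raise every value to the secret key modulo P."""
    return [pow(v, key, P) for v in values]


def toy_psi(set_a, set_b):
    """Simulate both parties and return the elements of set_a also in set_b."""
    key_a = random_exponent()  # known only to the first party
    key_b = random_exponent()  # known only to the second party
    items_a = list(set_a)

    # First party: hash and mask its items, then send them across.
    masked_a = mask([hash_to_group(x) for x in items_a], key_a)

    # Second party: mask the received values again (keeping their order),
    # and send back its own items masked once with its own key.
    double_a = mask(masked_a, key_b)
    masked_b = mask([hash_to_group(y) for y in set_b], key_b)

    # First party: apply its key to the second party's values. Because
    # (h**a)**b == (h**b)**a mod P, shared items yield identical values.
    double_b = set(mask(masked_b, key_a))
    return {x for x, d in zip(items_a, double_a) if d in double_b}


if __name__ == "__main__":
    # Hypothetical example sets, not taken from the text.
    print(toy_psi({1, 2, 3, 4, 5}, {4, 5, 6, 7}))
```

In this sketch each party only ever sees the other's items after they have been hashed and raised to a secret exponent, so the first party can recognize matches against its own items without being handed the other party's raw values. A production deployment would rely on a vetted PSI library and properly sized cryptographic parameters rather than this toy arithmetic.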

For example, as shown in Figure 10-1, the intersection between Set A, owned by Alice, and Set B, owned by Bob, can be identified as comprising elements 4 and 5 without revealing Bob’s knowledge of entities 6, ...
