Chapter 11. Protecting Your Data Against Privacy Attacks

In 2006, AOL released an anonymized data set of search activity from its service. This sample contained 20 million queries made by more than 650,000 users over 3 months. Although the usernames were obfuscated, many of the search queries themselves contained personally identifiable information. This resulted in several users being identified and matched to their accounts and search history.1

This release led to the resignation of two senior staff members and a class-action lawsuit that was settled in 2013. It also caused enormous harm to AOL’s public image and exposed the identities of real people who had used the service under the assumption that their privacy would be protected.

This chapter discusses attacks on data releases and how differential privacy can protect against them. While Chapters 1 and 9 briefly touched on privacy attacks, this chapter covers a much wider variety of attacks in greater detail. The ramifications of each type of attack are also discussed: an attack may be used to reconstruct an individual’s data, or to infer whether an individual is present in a data set.

The attacks are explained from the perspective of two parties: a data analyst and a data curator. To ensure the protections on the data are robust enough to protect the privacy of individuals, assume that the data analyst harbors the worst intentions: the analyst is an adversary who is determined to violate the privacy of individuals ...
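To make the adversarial-analyst framing concrete, consider one of the simplest attacks on an unprotected data release: a differencing attack, in which two innocuous-looking aggregate queries are subtracted to recover one person’s record exactly. The sketch below uses entirely hypothetical names and salary figures; it is a minimal illustration of the idea, not code from this chapter.

```python
# A minimal sketch of a differencing attack on unprotected aggregate
# queries. All names and salary figures are hypothetical.

# The curator's private data set: name -> salary.
salaries = {
    "alice": 52_000,
    "bob": 61_000,
    "carol": 58_000,
}

def sum_query(data, predicate):
    """Answer an aggregate query: the sum of salaries for records
    matching the predicate. No privacy protection is applied."""
    return sum(salary for name, salary in data.items() if predicate(name))

# The adversarial analyst asks two seemingly harmless aggregate queries:
total_all = sum_query(salaries, lambda name: True)
total_without_alice = sum_query(salaries, lambda name: name != "alice")

# Subtracting the two exact answers reconstructs Alice's salary precisely.
alice_salary = total_all - total_without_alice
print(alice_salary)  # 52000
```

Because both queries are exact, their difference isolates a single individual’s contribution. Differential privacy defends against exactly this kind of attack by adding calibrated noise to each answer, so that no pair of query results pins down any one person’s data.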
