Chapter 9. Differentially Private Machine Learning

Machine learning (ML) is the process of learning relationships and patterns in a data set. Where statistical modeling, as discussed in Chapter 8, places greater emphasis on model interpretability, ML tends to prioritize predictive performance. This difference happens to form a natural division in DP techniques.

ML model parameters can leak information about the training data, just as they can in statistical modeling. When you train a model privately, your goal is to release model parameters (weights) that accurately capture the relationships between variables while protecting your sensitive data with the guarantees of differential privacy.

In this chapter, you will learn about a variety of techniques that are typically used to privately train ML models. Stochastic gradient descent (SGD) is a focal point, as it is the workhorse of non-DP ML training.

This chapter both assumes a working knowledge of non-DP ML and relies heavily on concepts introduced in Chapters 3, 4, 5, and 6. While this may seem daunting, the chapter will start with a more approachable minimum viable DP-SGD (sketched below) before gradually mixing in more advanced tools.
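To give a feel for what is coming, here is a minimal sketch of the DP-SGD pattern, written against a plain linear model with NumPy. The function name, hyperparameter values, and loss are illustrative choices, not the book's reference implementation: the essential DP-specific steps are computing per-example gradients, clipping each to a fixed L2 norm bound, and adding Gaussian noise calibrated to that bound before the update.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dp_sgd(X, y, steps=100, lot_size=32, lr=0.1,
           clip_norm=1.0, noise_multiplier=1.1):
    """Illustrative DP-SGD for least-squares linear regression."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        # Sample a lot (mini-batch) of training examples
        idx = rng.choice(n, size=lot_size, replace=False)
        # Per-example gradients of the squared-error loss: (x.theta - y) * x
        residuals = X[idx] @ theta - y[idx]
        grads = residuals[:, None] * X[idx]            # shape (lot_size, d)
        # Clip each example's gradient to L2 norm <= clip_norm
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads / np.maximum(1.0, norms / clip_norm)
        # Sum, add Gaussian noise scaled to the clipping bound, then average
        noisy_sum = grads.sum(axis=0) + rng.normal(
            0.0, noise_multiplier * clip_norm, size=d)
        theta -= lr * noisy_sum / lot_size
    return theta
```

The privacy guarantee comes from the clipping bound (which fixes each example's sensitivity) together with the noise scale; turning the noise multiplier, sampling rate, and number of steps into a concrete (ε, δ) requires a privacy accountant, a topic the chapter builds toward.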

The chapter ends with a discussion and examples of frameworks and tools that will help you create DP ML models. Before diving in, we’ll first motivate the use of DP in this domain by discussing privacy attacks.

Why Make Machine Learning Models Differentially Private?

Suppose you are running a company that sells online educational ...
