Chapter 5. Privacy-Aware Machine Learning and Data Science

In Chapter 4, you learned about several different attacks, including attacks on machine learning models themselves. You might never have thought about how to protect machine learning models from exposing private information; perhaps you presumed, or heard, that regularization and generalization would remove the privacy risk. Unfortunately, that is not the case, especially as models grow in size and parameter count.

In this chapter, you’ll explore ways to add anonymization to machine learning workflows and dive into research on privacy-preserving machine learning and data science. As this field is fast-moving and actively being researched, you’ll want to understand the core concepts and compare today’s leading methods. You’ll review an open source library that is readily available and examine how to integrate the methods learned in previous chapters into your normal data science experimentation and workflow. This will allow you to develop skills in evaluating mitigations and determining the best approach for your use case.

Using Privacy-Preserving Techniques in Machine Learning

Machine learning offers a workflow where privacy techniques can easily be incorporated. For most machine learning applications, you are already investigating the data, cleaning or wrangling the variables you would like to use, preparing the features, and training and testing models. Because of the heavy involvement of data science professionals at each stage in ...
