Chapter 12. Secure Data Science on AWS

It is important to maintain least-privilege security at all layers, from network to application, and throughout the entire data science workflow, from data ingestion to model deployment. In this chapter, we reinforce that security is the top priority at AWS, where it is often called “job zero” or “priority zero.” We discuss common security considerations and present best practices for building secure data science and machine learning projects on AWS. We describe preventive controls, which aim to stop events from occurring, and detective controls, which quickly detect potential events. We also identify responsive and corrective controls, which help remediate security violations.

The most common security considerations for building secure data science projects in the cloud fall into three areas: access management, compute and network isolation, and encryption. Let’s first discuss these general security best practices and security-first principles. Then we will apply them to secure our data science environment, from notebooks to S3 buckets, using both network-level security and application security. We also discuss governance and auditability best practices for compliance and regulatory purposes.
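To make the least-privilege idea concrete before going further, here is a minimal sketch of an IAM policy document built in Python. It is not taken from this chapter: the function name, bucket name, and prefix are hypothetical, chosen only for illustration. The policy allows read-only access to a single S3 prefix and denies any request that does not use TLS, which touches on both access management and encryption in transit.

```python
import json


def least_privilege_s3_read_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 prefix.

    Hypothetical helper for illustration; bucket and prefix are placeholders.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only for keys under the given prefix.
                "Sid": "ListOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading objects under the prefix; no write or delete.
                "Sid": "ReadOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{prefix}/*",
            },
            {
                # Deny every S3 action on this bucket over non-TLS connections.
                "Sid": "DenyUnencryptedTransport",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


# Placeholder names; substitute your own bucket and prefix.
policy = least_privilege_s3_read_policy("example-data-science-bucket", "training-data")
print(json.dumps(policy, indent=2))
```

The resulting JSON could be attached to a role used by a notebook or training job. Starting from an empty set of permissions and adding only the actions a workload needs is usually easier to audit than starting broad and removing access later.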

Shared Responsibility Model Between AWS and Customers

AWS implements the shared responsibility model, through which it provides a secure global infrastructure and foundational compute, storage, networking, and database services, as ...
