Applied Machine Learning and High-Performance Computing on AWS

Book description

Build, train, and deploy large machine learning models at scale in various domains such as computational fluid dynamics, genomics, autonomous vehicles, and numerical optimization using Amazon SageMaker

Key Features

  • Understand the need for high-performance computing (HPC)
  • Build, train, and deploy large ML models with billions of parameters using Amazon SageMaker
  • Learn best practices and architectures for implementing ML at scale using HPC

Book Description

Machine learning (ML) and high-performance computing (HPC) on AWS power compute-intensive workloads across industries and emerging applications. Their use cases span verticals such as computational fluid dynamics (CFD), genomics, and autonomous vehicles.

This book provides end-to-end guidance, starting with HPC concepts for storage and networking. It then progresses to working examples of how to process large datasets using SageMaker Studio and EMR. Next, you'll learn how to build, train, and deploy large models using distributed training. Later chapters also guide you through deploying models to edge devices using SageMaker and IoT Greengrass, and through optimizing the performance of ML models for low-latency use cases.

By the end of this book, you'll be able to build, train, and deploy your own large-scale ML application, using HPC on AWS, following industry best practices and addressing the key pain points encountered in the application life cycle.

What you will learn

  • Explore data management, storage, and fast networking for HPC applications
  • Focus on the analysis and visualization of a large volume of data using Spark
  • Train vision transformer models using SageMaker distributed training
  • Deploy and manage ML models at scale on the cloud and at the edge
  • Get to grips with performance optimization of ML models for low-latency workloads
  • Apply HPC to industry domains such as CFD, genomics, AV, and optimization
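To give a flavor of the distributed-training topics listed above, here is a minimal conceptual sketch of the data-parallel idea that underlies libraries such as SageMaker's distributed data parallel: each worker computes gradients on its own shard of a batch, the gradients are averaged (an all-reduce), and the shared parameters are updated. This is a plain-NumPy illustration under assumed simplifications (a linear model, simulated workers in one process), not SageMaker API code.

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of mean squared error 0.5*mean((Xw - y)^2) on one worker's shard."""
    n = X.shape[0]
    return X.T @ (X @ w - y) / n

def data_parallel_step(w, X, y, n_workers, lr=0.1):
    """One data-parallel SGD step: shard the batch, average per-worker gradients."""
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    # Each "worker" computes a gradient on its own shard.
    grads = [local_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    # All-reduce: average the gradients, then apply one shared update.
    avg_grad = np.mean(grads, axis=0)
    return w - lr * avg_grad

# Toy regression problem with a known weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
for _ in range(200):
    w = data_parallel_step(w, X, y, n_workers=4)
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the data-parallel run converges to the same solution as single-worker training; the frameworks covered in the book apply the same idea across real GPUs and nodes.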

Who this book is for

The book begins with HPC concepts; however, it expects you to have prior machine learning knowledge. This book is for ML engineers and data scientists interested in advanced topics such as training large models on large datasets using distributed training on AWS, deploying models at scale, and performance optimization for low-latency use cases. Practitioners in fields such as numerical optimization, computational fluid dynamics, autonomous vehicles, and genomics, who require HPC for applying ML models to applications at scale, will also find the book useful.

Table of contents

  1. Applied Machine Learning and High-Performance Computing on AWS
  2. Contributors
  3. About the authors
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share Your Thoughts
    9. Download a free PDF copy of this book
  6. Part 1: Introducing High-Performance Computing
  7. Chapter 1: High-Performance Computing Fundamentals
    1. Why do we need HPC?
    2. Limitations of on-premises HPC
      1. Barrier to innovation
      2. Reduced efficiency
      3. Lost opportunities
      4. Limited scalability and elasticity
    3. Benefits of doing HPC on the cloud
      1. Drives innovation
      2. Enables secure collaboration among distributed teams
      3. Amplifies operational efficiency
      4. Optimizes performance
      5. Optimizes cost
    4. Driving innovation across industries with HPC
      1. Life sciences and healthcare
      2. AVs
      3. Supply chain optimization
    5. Summary
    6. Further reading
  8. Chapter 2: Data Management and Transfer
    1. Importance of data management
    2. Challenges of moving data into the cloud
    3. How to securely transfer large amounts of data into the cloud
    4. AWS online data transfer services
      1. AWS DataSync
      2. AWS Transfer Family
      3. Amazon S3 Transfer Acceleration
      4. Amazon Kinesis
      5. AWS Snowcone
    5. AWS offline data transfer services
      1. Process for ordering a device from AWS Snow Family
    6. Summary
    7. Further reading
  9. Chapter 3: Compute and Networking
    1. Introducing the AWS compute ecosystem
      1. General purpose instances
      2. Compute optimized instances
      3. Accelerated compute instances
      4. Memory optimized instances
      5. Storage optimized instances
      6. Amazon Machine Images (AMIs)
      7. Containers on AWS
      8. Serverless compute on AWS
    2. Networking on AWS
      1. CIDR blocks and routing
      2. Networking for HPC workloads
    3. Selecting the right compute for HPC workloads
      1. Pattern 1 – a standalone instance
      2. Pattern 2 – using AWS ParallelCluster
      3. Pattern 3 – using AWS Batch
      4. Pattern 4 – hybrid architecture
      5. Pattern 5 – container-based distributed processing
      6. Pattern 6 – serverless architecture
    4. Best practices for HPC workloads
    5. Summary
    6. References
  10. Chapter 4: Data Storage
    1. Technical requirements
    2. AWS services for storing data
      1. Amazon Simple Storage Service (S3)
      2. Amazon Elastic File System (EFS)
      3. Amazon EBS
      4. Amazon FSx
    3. Data security and governance
      1. IAM
      2. Data protection
      3. Data encryption
      4. Logging and monitoring
      5. Resilience
    4. Tiered storage for cost optimization
      1. Amazon S3 storage classes
      2. Amazon EFS storage classes
    5. Choosing the right storage option for HPC workloads
    6. Summary
    7. Further reading
  11. Part 2: Applied Modeling
  12. Chapter 5: Data Analysis
    1. Technical requirements
    2. Exploring data analysis methods
      1. Gathering the data
      2. Understanding the data structure
      3. Describing the data
      4. Visualizing the data
      5. Reviewing the data analytics life cycle
    3. Reviewing the AWS services for data analysis
      1. Unifying the data into a common store
      2. Creating a data structure for analysis
      3. Visualizing the data at scale
      4. Choosing the right AWS service
    4. Analyzing large amounts of structured and unstructured data
      1. Setting up EMR and SageMaker Studio
      2. Analyzing large amounts of structured data
      3. Analyzing large amounts of unstructured data
    5. Processing data at scale on AWS
    6. Cleaning up
    7. Summary
  13. Chapter 6: Distributed Training of Machine Learning Models
    1. Technical requirements
    2. Building ML systems using AWS
    3. Introducing the fundamentals of distributed training
      1. Reviewing the SageMaker distributed data parallel strategy
      2. Reviewing the SageMaker model data parallel strategy
      3. Reviewing a hybrid data parallel and model parallel strategy
    4. Executing a distributed training workload on AWS
      1. Executing distributed data parallel training on Amazon SageMaker
      2. Executing distributed model parallel training on Amazon SageMaker
    5. Summary
  14. Chapter 7: Deploying Machine Learning Models at Scale
    1. Managed deployment on AWS
      1. Amazon SageMaker managed model deployment options
      2. The variety of compute resources available
      3. Cost-effective model deployment
      4. Blue/green deployments
      5. Inference recommender
      6. MLOps integration
      7. Model registry
      8. Elastic inference
      9. Deployment on edge devices
    2. Choosing the right deployment option
      1. Using batch inference
      2. Using real-time endpoints
      3. Using asynchronous inference
    3. Batch inference
      1. Creating a transformer object
      2. Creating a batch transform job for carrying out inference
      3. Optimizing a batch transform job
    4. Real-time inference
      1. Hosting a machine learning model as a real-time endpoint
    5. Asynchronous inference
    6. The high availability of model endpoints
      1. Deployment on multiple instances
      2. Endpoints autoscaling
      3. Endpoint modification without disruption
    7. Blue/green deployments
      1. All at once
      2. Canary
      3. Linear
    8. Summary
    9. References
  15. Chapter 8: Optimizing and Managing Machine Learning Models for Edge Deployment
    1. Technical requirements
    2. Understanding edge computing
    3. Reviewing the key considerations for optimal edge deployments
      1. Efficiency
      2. Performance
      3. Reliability
      4. Security
    4. Designing an architecture for optimal edge deployments
      1. Building the edge components
      2. Building the ML model
      3. Deploying the model package
    5. Summary
  16. Chapter 9: Performance Optimization for Real-Time Inference
    1. Technical requirements
    2. Reducing the memory footprint of DL models
      1. Pruning
      2. Quantization
      3. Model compilation
    3. Key metrics for optimizing models
    4. Choosing the instance type, load testing, and performance tuning for models
    5. Observing the results
    6. Summary
  17. Chapter 10: Data Visualization
    1. Data visualization using Amazon SageMaker Data Wrangler
      1. SageMaker Data Wrangler visualization options
      2. Adding visualizations to the data flow in SageMaker Data Wrangler
      3. Data flow
    2. Amazon’s graphics-optimized instances
      1. Benefits and key features of Amazon’s graphics-optimized instances
    3. Summary
    4. Further reading
  18. Part 3: Driving Innovation Across Industries
  19. Chapter 11: Computational Fluid Dynamics
    1. Technical requirements
    2. Introducing CFD
    3. Reviewing best practices for running CFD on AWS
      1. Using AWS ParallelCluster
      2. Using CFD Direct
    4. Discussing how ML can be applied to CFD
    5. Summary
    6. References
  20. Chapter 12: Genomics
    1. Technical requirements
    2. Managing large genomics data on AWS
    3. Designing architecture for genomics
    4. Applying ML to genomics
      1. Protein secondary structure prediction for protein sequences
    5. Summary
  21. Chapter 13: Autonomous Vehicles
    1. Technical requirements
    2. Introducing AV systems
    3. AWS services supporting AV systems
    4. Designing an architecture for AV systems
    5. ML applied to AV systems
      1. Model development
      2. Step 1 – build and push the CARLA container to Amazon ECR
      3. Step 2 – configure and run CARLA on RoboMaker
    6. Summary
    7. References
  22. Chapter 14: Numerical Optimization
    1. Introduction to optimization
      1. Goal or objective function
      2. Variables
      3. Constraints
      4. Modeling an optimization problem
      5. Optimization algorithm
      6. Local and global optima
    2. Common numerical optimization algorithms
      1. Random restart hill climbing
      2. Simulated annealing
      3. Tabu search
      4. Evolutionary methods
    3. Example use cases of large-scale numerical optimization problems
      1. Traveling salesperson optimization problem
      2. Worker dispatch optimization
      3. Assembly line optimization
    4. Numerical optimization using high-performance compute on AWS
      1. Commercial optimization solvers
      2. Open source optimization solvers
      3. Numerical optimization patterns on AWS
    5. Machine learning and numerical optimization
    6. Summary
    7. Further reading
  23. Index
    1. Why subscribe?
  24. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Applied Machine Learning and High-Performance Computing on AWS
  • Author(s): Mani Khanuja, Farooq Sabir, Shreyas Subramanian, Trenton Potgieter
  • Release date: December 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803237015