The Machine Learning Solutions Architect Handbook - Second Edition

Book description

Design, build, and secure scalable machine learning (ML) systems to solve real-world business problems with Python and AWS.

Purchase of the print or Kindle book includes a free PDF eBook.

Key Features

  • Go in-depth into the ML lifecycle, from ideation and data management to deployment and scaling
  • Apply risk management techniques in the ML lifecycle and design architectural patterns for various ML platforms and solutions
  • Understand the generative AI lifecycle, its core technologies, and implementation risks

Book Description

David Ping, Head of GenAI and ML Solution Architecture for global industries at AWS, provides expert insights and practical examples to help you become a proficient ML solutions architect, linking technical architecture to business-related skills.

You’ll learn about ML algorithms, cloud infrastructure, system design, MLOps, and how to apply ML to solve real-world business problems. David explains the generative AI project lifecycle and examines Retrieval Augmented Generation (RAG), an effective architecture pattern for generative AI applications. You’ll also learn about open-source technologies, such as Kubernetes/Kubeflow, for building a data science environment and ML pipelines before building an enterprise ML architecture using AWS. The handbook also covers ML risk management and the different stages of AI/ML adoption, and its biggest new addition is a deep exploration of generative AI.

By the end of this book, you’ll have gained a comprehensive understanding of AI/ML across all key aspects, including business use cases, data science, real-world solution architecture, risk management, and governance. You’ll possess the skills to design and construct ML solutions that effectively cater to common use cases and follow established ML architecture patterns, enabling you to excel as a true professional in the field.

What you will learn

  • Apply ML methodologies to solve business problems across industries
  • Design a practical enterprise ML platform architecture
  • Gain an understanding of AI risk management frameworks and techniques
  • Build an end-to-end data management architecture using AWS
  • Train large-scale ML models and optimize model inference latency
  • Create a business application using artificial intelligence services and custom models
  • Dive into generative AI with use cases, architecture patterns, and RAG

Who this book is for

This book is for solutions architects working on ML projects, ML engineers transitioning to ML solution architect roles, and MLOps engineers. Data scientists and analysts who want to enhance their practical knowledge of ML systems engineering, as well as AI/ML product managers and risk officers who want to gain an understanding of ML solutions and AI risk management, will also find this book useful. A basic knowledge of Python, AWS, linear algebra, probability, and cloud infrastructure is required before you get started with this handbook.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. Navigating the ML Lifecycle with ML Solutions Architecture
    1. ML versus traditional software
    2. ML lifecycle
      1. Business problem understanding and ML problem framing
      2. Data understanding and data preparation
      3. Model training and evaluation
      4. Model deployment
      5. Model monitoring
      6. Business metric tracking
    3. ML challenges
    4. ML solutions architecture
      1. Business understanding and ML transformation
      2. Identification and verification of ML techniques
      3. System architecture design and implementation
      4. ML platform workflow automation
      5. Security and compliance
    5. Summary
  3. Exploring ML Business Use Cases
    1. ML use cases in financial services
      1. Capital market front office
        1. Sales trading and research
        2. Investment banking
        3. Wealth management
      2. Capital market back office operations
        1. Net Asset Value review
        2. Post-trade settlement failure prediction
      3. Risk management and fraud
        1. Anti-money laundering
        2. Trade surveillance
        3. Credit risk
      4. Insurance
        1. Insurance underwriting
        2. Insurance claim management
    2. ML use cases in media and entertainment
      1. Content development and production
      2. Content management and discovery
      3. Content distribution and customer engagement
    3. ML use cases in healthcare and life sciences
      1. Medical imaging analysis
      2. Drug discovery
      3. Healthcare data management
    4. ML use cases in manufacturing
      1. Engineering and product design
      2. Manufacturing operations – product quality and yield
      3. Manufacturing operations – machine maintenance
    5. ML use cases in retail
      1. Product search and discovery
      2. Targeted marketing
      3. Sentiment analysis
      4. Product demand forecasting
    6. ML use cases in the automotive industry
      1. Autonomous vehicles
        1. Perception and localization
        2. Decision and planning
        3. Control
      2. Advanced driver assistance systems (ADAS)
    7. Summary
  4. Exploring ML Algorithms
    1. Technical requirements
    2. How machines learn
    3. Overview of ML algorithms
      1. Consideration for choosing ML algorithms
      2. Algorithms for classification and regression problems
        1. Linear regression algorithms
        2. Logistic regression algorithms
        3. Decision tree algorithms
        4. Random forest algorithm
        5. Gradient boosting machine and XGBoost algorithms
        6. K-nearest neighbor algorithm
        7. Multi-layer perceptron (MLP) networks
      3. Algorithms for clustering
      4. Algorithms for time series analysis
        1. ARIMA algorithm
        2. DeepAR algorithm
      5. Algorithms for recommendation
        1. Collaborative filtering algorithm
        2. Multi-armed bandit/contextual bandit algorithm
      6. Algorithms for computer vision problems
        1. Convolutional neural networks
        2. ResNet
      7. Algorithms for natural language processing (NLP) problems
        1. Word2Vec
        2. BERT
      8. Generative AI algorithms
        1. Generative adversarial network
        2. Generative pre-trained transformer (GPT)
        3. Large Language Model
        4. Diffusion model
    4. Hands-on exercise
      1. Problem statement
      2. Dataset description
      3. Setting up a Jupyter Notebook environment
      4. Running the exercise
    5. Summary
  5. Data Management for ML
    1. Technical requirements
    2. Data management considerations for ML
    3. Data management architecture for ML
      1. Data storage and management
        1. AWS Lake Formation
      2. Data ingestion
        1. Kinesis Firehose
        2. AWS Glue
        3. AWS Lambda
      3. Data cataloging
        1. AWS Glue Data Catalog
        2. Custom data catalog solution
      4. Data processing
      5. ML data versioning
        1. S3 partitions
        2. Versioned S3 buckets
        3. Purpose-built data version tools
      6. ML feature stores
      7. Data serving for client consumption
        1. Consumption via API
        2. Consumption via data copy
      8. Special databases for ML
        1. Vector databases
        2. Graph databases
      9. Data pipelines
      10. Authentication and authorization
      11. Data governance
        1. Data lineage
        2. Other data governance measures
    4. Hands-on exercise – data management for ML
      1. Creating a data lake using Lake Formation
      2. Creating a data ingestion pipeline
      3. Creating a Glue Data Catalog
      4. Discovering and querying data in the data lake
      5. Creating an Amazon Glue ETL job to process data for ML
      6. Building a data pipeline using Glue workflows
    5. Summary
  6. Exploring Open-Source ML Libraries
    1. Technical requirements
    2. Core features of open-source ML libraries
    3. Understanding the scikit-learn ML library
      1. Installing scikit-learn
      2. Core components of scikit-learn
    4. Understanding the Apache Spark ML library
      1. Installing Spark ML
      2. Core components of the Spark ML library
    5. Understanding the TensorFlow deep learning library
      1. Installing TensorFlow
      2. Core components of TensorFlow
      3. Hands-on exercise – training a TensorFlow model
    6. Understanding the PyTorch deep learning library
      1. Installing PyTorch
      2. Core components of PyTorch
      3. Hands-on exercise – building and training a PyTorch model
    7. How to choose between TensorFlow and PyTorch
    8. Summary
  7. Kubernetes Container Orchestration Infrastructure Management
    1. Technical requirements
    2. Introduction to containers
    3. Overview of Kubernetes and its core concepts
      1. Namespaces
      2. Pods
      3. Deployment
      4. Kubernetes Job
      5. Kubernetes custom resources and operators
      6. Services
    4. Networking on Kubernetes
    5. Security and access management
      1. API authentication and authorization
    6. Hands-on – creating a Kubernetes infrastructure on AWS
      1. Problem statement
      2. Lab instruction
    7. Summary
  8. Open-Source ML Platforms
    1. Core components of an ML platform
    2. Open-source technologies for building ML platforms
      1. Implementing a data science environment
      2. Building a model training environment
      3. Registering models with a model registry
      4. Serving models using model serving services
        1. The Gunicorn and Flask inference engine
        2. The TensorFlow Serving framework
        3. The TorchServe serving framework
        4. KFServing framework
        5. Seldon Core
        6. Triton Inference Server
      5. Monitoring models in production
      6. Managing ML features
      7. Automating ML pipeline workflows
        1. Apache Airflow
        2. Kubeflow Pipelines
    3. Designing an end-to-end ML platform
      1. ML platform-based strategy
      2. ML component-based strategy
    4. Summary
  9. Building a Data Science Environment Using AWS ML Services
    1. Technical requirements
    2. SageMaker overview
    3. Data science environment architecture using SageMaker
      1. Onboarding SageMaker users
      2. Launching Studio applications
      3. Preparing data
      4. Preparing data interactively with SageMaker Data Wrangler
      5. Preparing data at scale interactively
      6. Processing data as separate jobs
      7. Creating, storing, and sharing features
      8. Training ML models
      9. Tuning ML models
      10. Deploying ML models for testing
    4. Best practices for building a data science environment
    5. Hands-on exercise – building a data science environment using AWS services
      1. Problem statement
      2. Dataset description
      3. Lab instructions
        1. Setting up SageMaker Studio
        2. Launching a JupyterLab notebook
        3. Training the BERT model in the Jupyter notebook
        4. Training the BERT model with the SageMaker Training service
        5. Deploying the model
        6. Building ML models with SageMaker Canvas
    6. Summary
  10. Designing an Enterprise ML Architecture with AWS ML Services
    1. Technical requirements
    2. Key considerations for ML platforms
      1. The personas of ML platforms and their requirements
        1. ML platform builders
        2. Platform users and operators
      2. Common workflow of an ML initiative
      3. Platform requirements for the different personas
    3. Key requirements for an enterprise ML platform
    4. Enterprise ML architecture pattern overview
      1. Model training environment
        1. Model training engine using SageMaker
        2. Automation support
        3. Model training lifecycle management
      2. Model hosting environment
        1. Inference engines
        2. Authentication and security control
        3. Monitoring and logging
    5. Adopting MLOps for ML workflows
      1. Components of the MLOps architecture
      2. Monitoring and logging
        1. Model training monitoring
        2. Model endpoint monitoring
        3. ML pipeline monitoring
        4. Service provisioning management
    6. Best practices in building and operating an ML platform
      1. ML platform project execution best practices
      2. ML platform design and implementation best practices
      3. Platform use and operations best practices
    7. Summary
  11. Advanced ML Engineering
    1. Technical requirements
    2. Training large-scale models with distributed training
      1. Distributed model training using data parallelism
        1. Parameter server overview
        2. AllReduce overview
      2. Distributed model training using model parallelism
        1. Naïve model parallelism overview
        2. Tensor parallelism/tensor slicing overview
        3. Implementing model-parallel training
    3. Achieving low-latency model inference
      1. How model inference works and opportunities for optimization
      2. Hardware acceleration
        1. Central processing units (CPUs)
        2. Graphics processing units (GPUs)
        3. Application-specific integrated circuit
      3. Model optimization
        1. Quantization
        2. Pruning (also known as sparsity)
      4. Graph and operator optimization
        1. Graph optimization
        2. Operator optimization
      5. Model compilers
        1. TensorFlow XLA
        2. PyTorch Glow
        3. Apache TVM
        4. Amazon SageMaker Neo
      6. Inference engine optimization
        1. Inference batching
        2. Enabling parallel serving sessions
        3. Picking a communication protocol
      7. Inference in large language models
        1. Text Generation Inference (TGI)
        2. DeepSpeed-Inference
        3. FastTransformer
    4. Hands-on lab – running distributed model training with PyTorch
      1. Problem statement
      2. Dataset description
      3. Modifying the training script
      4. Modifying and running the launcher notebook
    5. Summary
  12. Building ML Solutions with AWS AI Services
    1. Technical requirements
    2. What are AI services?
    3. Overview of AWS AI services
      1. Amazon Comprehend
      2. Amazon Textract
      3. Amazon Rekognition
      4. Amazon Transcribe
      5. Amazon Personalize
      6. Amazon Lex V2
      7. Amazon Kendra
      8. Amazon Q
      9. Evaluating AWS AI services for ML use cases
    4. Building intelligent solutions with AI services
      1. Automating loan document verification and data extraction
        1. Loan document classification workflow
        2. Loan data processing flow
      2. Media processing and analysis workflow
      3. E-commerce product recommendation
      4. Customer self-service automation with intelligent search
    5. Designing an MLOps architecture for AI services
      1. AWS account setup strategy for AI services and MLOps
      2. Code promotion across environments
      3. Monitoring operational metrics for AI services
    6. Hands-on lab – running ML tasks using AI services
    7. Summary
  13. AI Risk Management
    1. Understanding AI risk scenarios
    2. The regulatory landscape around AI risk management
    3. Understanding AI risk management
      1. Governance oversight principles
      2. AI risk management framework
    4. Applying risk management across the AI lifecycle
      1. Business problem identification and definition
      2. Data acquisition and management
        1. Risk considerations
        2. Risk mitigations
      3. Experimentation and model development
        1. Risk considerations
        2. Risk mitigations
      4. AI system deployment and operations
        1. Risk considerations
        2. Risk mitigations
    5. Designing ML platforms with governance and risk management considerations
      1. Data and model documentation
      2. Lineage and reproducibility
      3. Observability and auditing
      4. Scalability and performance
      5. Data quality
    6. Summary
  14. Bias, Explainability, Privacy, and Adversarial Attacks
    1. Understanding bias
    2. Understanding ML explainability
      1. LIME
      2. SHAP
    3. Understanding security and privacy-preserving ML
      1. Differential privacy
    4. Understanding adversarial attacks
      1. Evasion attacks
        1. PGD attacks
        2. HopSkipJump attacks
      2. Data poisoning attacks
        1. Clean-label backdoor attack
      3. Model extraction attack
      4. Attacks against generative AI models
      5. Defense against adversarial attacks
        1. Robustness-based methods
        2. Detector-based method
      6. Open-source tools for adversarial attacks and defenses
    5. Hands-on lab – detecting bias, explaining models, training privacy-preserving models, and simulating an adversarial attack
      1. Problem statement
      2. Detecting bias in the training dataset
      3. Explaining feature importance for a trained model
      4. Training privacy-preserving models
      5. Simulate a clean-label backdoor attack
    6. Summary
  15. Charting the Course of Your ML Journey
    1. ML adoption stages
      1. Exploring AI/ML
      2. Disjointed AI/ML
      3. Integrated AI/ML
      4. Advanced AI/ML
    2. AI/ML maturity and assessment
      1. Technical maturity
      2. Business maturity
      3. Governance maturity
      4. Organization and talent maturity
      5. Maturity assessment and improvement process
    3. AI/ML operating models
      1. Centralized model
      2. Decentralized model
      3. Hub and spoke model
    4. Solving ML journey challenges
      1. Developing the AI vision and strategy
      2. Getting started with the first AI/ML initiative
      3. Solving scaling challenges with AI/ML adoption
        1. Solving ML use case scaling challenges
        2. Solving technology scaling challenges
        3. Solving governance scaling challenges
    5. Summary
  16. Navigating the Generative AI Project Lifecycle
    1. The advancement and economic impact of generative AI
    2. What industries are doing with generative AI
      1. Financial services
      2. Healthcare and life sciences
      3. Media and entertainment
      4. Automotive and manufacturing
    3. The lifecycle of a generative AI project and the core technologies
      1. Business use case selection
      2. FM selection and evaluation
        1. Initial screening via manual assessment
        2. Automated model evaluation
        3. Human evaluation
        4. Assessing AI risks for FMs
        5. Other evaluation consideration
      3. Building FMs from scratch via pre-training
      4. Adaptation and customization
        1. Domain adaptation pre-training
        2. Fine-tuning
        3. Reinforcement learning from human feedback
        4. Prompt engineering
      5. Model management and deployment
    4. The limitations, risks, and challenges of adopting generative AI
    5. Summary
  17. Designing Generative AI Platforms and Solutions
    1. Operational considerations for generative AI platforms and solutions
      1. New generative AI workflow and processes
      2. New technology components
      3. New roles
      4. Exploring generative AI platforms
        1. The prompt management component
        2. FM benchmark workbench
        3. Supervised fine-tuning and RLHF
        4. FM monitoring
    2. The retrieval-augmented generation pattern
      1. Open-source frameworks for RAG
        1. LangChain
        2. LlamaIndex
      2. Evaluating a RAG pipeline
      3. Advanced RAG patterns
      4. Designing a RAG architecture on AWS
    3. Choosing an LLM adaptation method
      1. Response quality
      2. Cost of the adaptation
      3. Implementation complexity
    4. Bringing it all together
    5. Considerations for deploying generative AI applications in production
      1. Model readiness
      2. Decision-making workflow
      3. Responsible AI assessment
      4. Guardrails in production environments
      5. External knowledge change management
    6. Practical generative AI business solutions
      1. Generative AI-powered semantic search engine
      2. Financial data analysis and research workflow
      3. Clinical trial recruiting workflow
      4. Media entertainment content creation workflow
      5. Car design workflow
      6. Contact center customer service operation
    7. Are we close to having artificial general intelligence?
      1. The symbolic approach
      2. The connectionist/neural network approach
      3. The neural-symbolic approach
    8. Summary
  18. Other Books You May Enjoy
  19. Index

Product information

  • Title: The Machine Learning Solutions Architect Handbook - Second Edition
  • Author(s): David Ping
  • Release date: April 2024
  • Publisher(s): Packt Publishing
  • ISBN: 9781805122500