Data Analytics in the AWS Cloud

Book description

A comprehensive and accessible roadmap to performing data analytics in the AWS cloud

In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you’ll explore every relevant aspect of data analytics (from data engineering to analysis, business intelligence, DevOps, and MLOps) as you discover how to integrate machine learning predictions with analytics engines and visualization tools.

You’ll also find:

  • Real-world use cases of AWS architectures that demystify the applications of data analytics
  • Accessible introductions to data acquisition, importation, storage, visualization, and reporting
  • Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify maintenance

A can't-miss resource for data architects, analysts, engineers, and technical professionals, Data Analytics in the AWS Cloud will also earn a place on the bookshelves of business leaders seeking a better understanding of data analytics on the AWS cloud platform.

Table of contents

  1. Cover
  2. Title Page
  3. Introduction
    1. What Is a Data Lake?
    2. The Data Platform
    3. The End of the Beginning
    4. Note
  4. Chapter 1: AWS Data Lakes and Analytics Technology Overview
    1. Why AWS?
    2. What Does a Data Lake Look Like in AWS?
    3. Analytics on AWS
    4. Skills Required to Build and Maintain an AWS Analytics Pipeline
  5. Chapter 2: The Path to Analytics: Setting Up a Data and Analytics Team
    1. The Data Vision
    2. DA Team Roles
    3. Analytics Flow at a Process Level
    4. The DA Team Mantra: “Automate Everything”
    5. Analytics Models in the Wild: Centralized, Distributed, Center of Excellence
    6. Summary
  6. Chapter 3: Working on AWS
    1. Accessing AWS
    2. Everything Is a Resource
    3. IAM: Policies, Roles, and Users
    4. Working with the Web Console
    5. The AWS Command‐Line Interface
    6. Infrastructure‐as‐Code: CloudFormation and Terraform
  7. Chapter 4: Serverless Computing and Data Engineering
    1. Serverless vs. Fully Managed
    2. AWS Serverless Technologies
    3. AWS Serverless Application Model (SAM)
    4. Summary
  8. Chapter 5: Data Ingestion
    1. AWS Data Lake Architecture
    2. Sample Processing Architecture: Cataloging Images into DynamoDB
    3. Serverless Ingestion
    4. Fully Managed Ingestion with AppFlow
    5. Operational Data Ingestion with Database Migration Service
    6. Summary
  9. Chapter 6: Processing Data
    1. Phases of Data Preparation
    2. Overview of ETL in AWS
    3. ETL Job Design Concepts
    4. AWS Glue for ETL
    5. Connectors
    6. Creating ETL Jobs with AWS Glue Visual Editor
    7. Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target)
    8. Creating ETL Jobs with the Spark Script Editor
    9. Developing ETL Jobs with AWS Glue Notebooks
    10. Creating ETL Jobs with AWS Glue Interactive Sessions
    11. Streaming Jobs
  10. Chapter 7: Cataloging, Governance, and Search
    1. Cataloging with AWS Glue
    2. Search with Amazon Athena: The Heart of Analytics in AWS
    3. Governing: Athena Workgroups, Lake Formation, and More
    4. AWS Lake Formation
    5. Summary
  11. Chapter 8: Data Consumption: BI, Visualization, and Reporting
    1. QuickSight
    2. Data Consumption: Not Only Dashboards
    3. Summary
  12. Chapter 9: Machine Learning at Scale
    1. Machine Learning and Artificial Intelligence
    2. Amazon SageMaker
    3. Summary
  13. Appendix: Example Data Architectures in AWS
    1. Modern Data Lake Architecture
    2. Batch Processing
    3. Stream Processing
    4. Architecture Design Recommendations
    5. Summary
  14. Index
  15. Copyright
  16. About the Author
  17. About the Technical Editor
  18. Acknowledgments
  19. End User License Agreement

Product information

  • Title: Data Analytics in the AWS Cloud
  • Author(s): Joe Minichino
  • Release date: May 2023
  • Publisher(s): Sybex
  • ISBN: 9781119909248