Azure Data Engineering Cookbook - Second Edition

Book description

Nearly 80 recipes to help you collect and transform data from multiple sources into a single data source, making it much easier to perform analytics on that data

Key Features

  • Build data pipelines from scratch and find solutions to common data engineering problems
  • Learn how to work with Azure Data Factory, Data Lake, Databricks, and Synapse Analytics
  • Monitor and maintain your data engineering pipelines using Log Analytics, Azure Monitor, and Azure Purview

Book Description

The famous quote 'Data is the new oil' seems more true every day, as the key to most organizations' long-term success lies in extracting insights from raw data. One of the major challenges organizations face in extracting value from data is building performant data engineering pipelines for data visualization, ingestion, storage, and processing. This second edition of the immensely successful book by Ahmad Osama brings you several recent enhancements in Azure data engineering and shares approximately 80 useful recipes covering common scenarios in building data engineering pipelines in Microsoft Azure.

You'll explore recipes from Azure Synapse Analytics workspaces Gen 2 and get to grips with Synapse Spark pools, serverless SQL pools, Synapse integration pipelines, and Synapse data flows. This second edition also covers Synapse SQL pool optimization techniques. Besides the Synapse enhancements, you'll discover helpful tips on managing Azure SQL Database and learn about security, high availability, and performance monitoring. Finally, the book takes you through overall data engineering pipeline management, focusing on monitoring using Log Analytics and tracking data lineage using Azure Purview.

By the end of this book, you'll be able to build superior data engineering pipelines and will have an invaluable go-to guide to draw on.

What you will learn

  • Process data using Azure Databricks and Azure Synapse Analytics
  • Perform data transformation using Azure Synapse data flows
  • Perform common administrative tasks in Azure SQL Database
  • Build effective Synapse SQL pools that can be consumed by Power BI
  • Monitor Synapse SQL and Spark pools using Log Analytics
  • Track data lineage using Microsoft Purview integration with pipelines

Who this book is for

This book is for data engineers, data architects, database administrators, and data professionals who want to become well versed in the Azure data services used for building data pipelines. A basic understanding of cloud and data engineering concepts will help you get the most out of this book.

Table of contents

  1. Azure Data Engineering Cookbook
  2. Second Edition
  3. Contributors
  4. About the authors
  5. About the reviewers
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
    4. Download the color images
    5. Conventions used
    6. Sections
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There’s more…
      5. See also
    7. Get in touch
    8. Share your thoughts
  7. Chapter 1: Creating and Managing Data in Azure Data Lake
    1. Technical requirements
    2. Provisioning an Azure storage account using the Azure portal
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Provisioning an Azure storage account using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Creating containers and uploading files to Azure Blob storage using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Managing blobs in Azure Storage using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Configuring blob lifecycle management for blob objects using the Azure portal
      1. Getting ready
      2. How to do it…
      3. How it works…
  8. Chapter 2: Securing and Monitoring Data in Azure Data Lake
    1. Configuring a firewall for an Azure Data Lake account using the Azure portal
      1. Getting ready
      2. How to do it…
      3. How it works…
    2. Configuring virtual networks for an Azure Data Lake account using the Azure portal
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Configuring private links for an Azure Data Lake account
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Configuring encryption using Azure Key Vault for Azure Data Lake
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Accessing Blob storage accounts using managed identities
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Creating an alert to monitor an Azure storage account
      1. Getting ready
      2. How to do it…
      3. How it works…
    7. Securing an Azure storage account with SAS using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
  9. Chapter 3: Building Data Ingestion Pipelines Using Azure Data Factory
    1. Technical requirements
    2. Provisioning Azure Data Factory
      1. How to do it…
      2. How it works…
    3. Copying files to a database from a data lake using a control flow and copy activity
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Triggering a pipeline in Azure Data Factory
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Copying data from a SQL Server virtual machine to a data lake using the Copy data wizard
      1. Getting ready
      2. How to do it…
      3. How it works…
  10. Chapter 4: Azure Data Factory Integration Runtime
    1. Technical requirements
    2. Configuring a self-hosted IR
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Configuring a shared self-hosted IR
      1. Getting ready
      2. How to do it…
    4. Configuring high availability for a self-hosted IR
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Patching a self-hosted IR
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Migrating an SSIS package to Azure Data Factory
      1. Getting ready
      2. How to do it…
      3. How it works…
  11. Chapter 5: Configuring and Securing Azure SQL Database
    1. Technical requirements
    2. Provisioning and connecting to an Azure SQL database using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Implementing an Azure SQL Database elastic pool using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Configuring a virtual network and private endpoints for Azure SQL Database
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Configuring Azure Key Vault for Azure SQL Database
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Provisioning and configuring a wake-up script for a serverless SQL database
      1. Getting ready
      2. How to do it…
      3. How it works…
    7. Configuring the Hyperscale tier of Azure SQL Database
      1. Getting ready
      2. How to do it…
  12. Chapter 6: Implementing High Availability and Monitoring in Azure SQL Database
    1. Implementing active geo-replication for an Azure SQL database using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
    2. Implementing an auto-failover group for an Azure SQL database using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Configuring high availability to the Hyperscale tier of Azure SQL Database
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Implementing vertical scaling for an Azure SQL database using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Monitoring an Azure SQL database using the Azure portal
      1. Getting ready
      2. How to do it…
    6. Configuring auditing for Azure SQL Database
      1. Getting ready
      2. How to do it…
      3. How it works…
  13. Chapter 7: Processing Data Using Azure Databricks
    1. Technical requirements
    2. Configuring the Azure Databricks environment
      1. Getting ready
      2. How to do it…
    3. Integrating Databricks with Azure Key Vault
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Mounting an Azure Data Lake container in Databricks
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Processing data using notebooks
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Scheduling notebooks using job clusters
      1. Getting ready
      2. How to do it…
      3. How it works…
    7. Working with Delta Lake tables
      1. Getting ready
      2. How to do it…
      3. How it works…
    8. Connecting a Databricks Delta Lake table to Power BI
      1. Getting ready
      2. How to do it…
      3. How it works…
  14. Chapter 8: Processing Data Using Azure Synapse Analytics
    1. Technical requirements
    2. Provisioning an Azure Synapse Analytics workspace
      1. Getting ready
      2. How to do it…
    3. Analyzing data using serverless SQL pool
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Provisioning and configuring Spark pools
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Processing data using Spark pools and a lake database
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Querying the data in a lake database from serverless SQL pool
      1. Getting ready
      2. How to do it…
      3. How it works…
    7. Scheduling notebooks to process data incrementally
      1. Getting ready
      2. How to do it…
      3. How it works…
    8. Visualizing data using Power BI by connecting to serverless SQL pool
      1. Getting ready
      2. How to do it…
      3. How it works…
  15. Chapter 9: Transforming Data Using Azure Synapse Dataflows
    1. Technical requirements
    2. Copying data using a Synapse data flow
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Performing data transformation using activities such as join, sort, and filter
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Monitoring data flows and pipelines
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Configuring partitions to optimize data flows
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Parameterizing Synapse data flows
      1. Getting ready
      2. How to do it…
      3. How it works…
    7. Handling schema changes dynamically in data flows using schema drift
      1. Getting ready
      2. How to do it…
      3. How it works…
  16. Chapter 10: Building the Serving Layer in Azure Synapse SQL Pool
    1. Technical requirements
    2. Loading data into dedicated SQL pools using PolyBase and T-SQL
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Loading data into a dedicated SQL pool using COPY INTO
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Creating distributed tables and modifying table distribution
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Creating statistics and automating the update of statistics
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Creating partitions and archiving data using partitioned tables
      1. Getting ready
      2. How to do it…
      3. How it works…
    7. Implementing workload management in an Azure Synapse dedicated SQL pool
      1. Getting ready
      2. How to do it…
      3. How it works…
    8. Creating workload groups for advanced workload management
      1. Getting ready
      2. How to do it…
      3. How it works…
  17. Chapter 11: Monitoring Synapse SQL and Spark Pools
    1. Technical requirements
    2. Configuring a Log Analytics workspace for Synapse SQL pools
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Configuring a Log Analytics workspace for Synapse Spark pools
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Using Kusto queries to monitor SQL and Spark pools
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Creating workbooks in a Log Analytics workspace to visualize monitoring data
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Monitoring table distribution, data skew, and index health using Synapse DMVs
      1. Getting ready
      2. How to do it…
    7. Building monitoring dashboards for Synapse with Azure Monitor
      1. Getting ready
      2. How to do it…
      3. How it works…
  18. Chapter 12: Optimizing and Maintaining Synapse SQL and Spark Pools
    1. Technical requirements
    2. Analyzing a query plan and fixing table distribution
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Monitoring and rebuilding a replication table cache
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Configuring result set caching in Azure Synapse dedicated SQL pool
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Configuring longer backup retention for a Synapse SQL database
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Auto pausing Synapse dedicated SQL pool
      1. Getting ready
      2. How to do it…
      3. How it works…
    7. Optimizing Delta tables in a Synapse Spark pool lake database
      1. Getting ready
      2. How to do it…
      3. How it works…
    8. Optimizing query performance in Synapse Spark pools
      1. Getting ready
      2. How to do it…
      3. How it works…
  19. Chapter 13: Monitoring and Maintaining Azure Data Engineering Pipelines
    1. Technical requirements
    2. Monitoring Synapse integration pipelines using Log Analytics and workbooks
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Tracing SQL queries for dedicated SQL pool to Synapse integration pipelines
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Provisioning a Microsoft Purview account and creating a data catalog
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Integrating a Synapse workspace with Microsoft Purview and tracking data lineage
      1. Getting ready
      2. How to do it…
      3. How it works…
    6. Applying Azure tags using PowerShell to multiple Azure resources
      1. Getting ready
      2. How to do it…
      3. How it works…
  20. Index
    1. Why subscribe?
  21. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share your thoughts

Product information

  • Title: Azure Data Engineering Cookbook - Second Edition
  • Author(s): Nagaraj Venkatesan, Ahmad Osama
  • Release date: September 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803246789