Book description
Nearly 80 recipes to help you collect and transform data from multiple sources into a single data source, making it easier to perform analytics on the data
Key Features
- Build data pipelines from scratch and find solutions to common data engineering problems
- Learn how to work with Azure Data Factory, Data Lake, Databricks, and Synapse Analytics
- Monitor and maintain your data engineering pipelines using Log Analytics, Azure Monitor, and Azure Purview
Book Description
The famous quote 'Data is the new oil' seems truer every day, as the key to most organizations' long-term success lies in extracting insights from raw data. One of the major challenges organizations face in extracting value from data is building performant data engineering pipelines for data visualization, ingestion, storage, and processing. This second edition of the immensely successful book by Ahmad Osama brings you several recent enhancements in Azure data engineering and shares approximately 80 useful recipes covering common scenarios in building data engineering pipelines in Microsoft Azure.
You'll explore recipes from Azure Synapse Analytics workspaces Gen 2 and get to grips with Synapse Spark pools, serverless SQL pools, Synapse integration pipelines, and Synapse data flows. You'll also learn Synapse SQL pool optimization techniques in this second edition. Besides Synapse enhancements, you'll discover helpful tips on managing Azure SQL Database and learn about security, high availability, and performance monitoring. Finally, the book takes you through overall data engineering pipeline management, focusing on monitoring using Log Analytics and tracking data lineage using Azure Purview.
By the end of this book, you'll be able to build superior data engineering pipelines, and you'll have an invaluable go-to guide.
What you will learn
- Process data using Azure Databricks and Azure Synapse Analytics
- Perform data transformation using Azure Synapse data flows
- Perform common administrative tasks in Azure SQL Database
- Build effective Synapse SQL pools that can be consumed by Power BI
- Monitor Synapse SQL and Spark pools using Log Analytics
- Track data lineage using Microsoft Purview integration with pipelines
Who this book is for
This book is for data engineers, data architects, database administrators, and data professionals who want to become well versed in the Azure data services for building data pipelines. A basic understanding of cloud and data engineering concepts will help you get the most out of this book.
Table of contents
- Azure Data Engineering Cookbook
- Second Edition
- Contributors
- About the authors
- About the reviewers
- Preface
- Chapter 1: Creating and Managing Data in Azure Data Lake
- Technical requirements
- Provisioning an Azure storage account using the Azure portal
- Provisioning an Azure storage account using PowerShell
- Creating containers and uploading files to Azure Blob storage using PowerShell
- Managing blobs in Azure Storage using PowerShell
- Configuring blob lifecycle management for blob objects using the Azure portal
- Chapter 2: Securing and Monitoring Data in Azure Data Lake
- Configuring a firewall for an Azure Data Lake account using the Azure portal
- Configuring virtual networks for an Azure Data Lake account using the Azure portal
- Configuring private links for an Azure Data Lake account
- Configuring encryption using Azure Key Vault for Azure Data Lake
- Accessing Blob storage accounts using managed identities
- Creating an alert to monitor an Azure storage account
- Securing an Azure storage account with SAS using PowerShell
- Chapter 3: Building Data Ingestion Pipelines Using Azure Data Factory
- Chapter 4: Azure Data Factory Integration Runtime
- Chapter 5: Configuring and Securing Azure SQL Database
- Technical requirements
- Provisioning and connecting to an Azure SQL database using PowerShell
- Implementing an Azure SQL Database elastic pool using PowerShell
- Configuring a virtual network and private endpoints for Azure SQL Database
- Configuring Azure Key Vault for Azure SQL Database
- Provisioning and configuring a wake-up script for a serverless SQL database
- Configuring the Hyperscale tier of Azure SQL Database
- Chapter 6: Implementing High Availability and Monitoring in Azure SQL Database
- Implementing active geo-replication for an Azure SQL database using PowerShell
- Implementing an auto-failover group for an Azure SQL database using PowerShell
- Configuring high availability for the Hyperscale tier of Azure SQL Database
- Implementing vertical scaling for an Azure SQL database using PowerShell
- Monitoring an Azure SQL database using the Azure portal
- Configuring auditing for Azure SQL Database
- Chapter 7: Processing Data Using Azure Databricks
- Technical requirements
- Configuring the Azure Databricks environment
- Integrating Databricks with Azure Key Vault
- Mounting an Azure Data Lake container in Databricks
- Processing data using notebooks
- Scheduling notebooks using job clusters
- Working with Delta Lake tables
- Connecting a Databricks Delta Lake table to Power BI
- Chapter 8: Processing Data Using Azure Synapse Analytics
- Technical requirements
- Provisioning an Azure Synapse Analytics workspace
- Analyzing data using serverless SQL pool
- Provisioning and configuring Spark pools
- Processing data using Spark pools and a lake database
- Querying the data in a lake database from serverless SQL pool
- Scheduling notebooks to process data incrementally
- Visualizing data using Power BI by connecting to serverless SQL pool
- Chapter 9: Transforming Data Using Azure Synapse Dataflows
- Technical requirements
- Copying data using a Synapse data flow
- Performing data transformation using activities such as join, sort, and filter
- Monitoring data flows and pipelines
- Configuring partitions to optimize data flows
- Parameterizing Synapse data flows
- Handling schema changes dynamically in data flows using schema drift
- Chapter 10: Building the Serving Layer in Azure Synapse SQL Pool
- Technical requirements
- Loading data into dedicated SQL pools using PolyBase and T-SQL
- Loading data into a dedicated SQL pool using COPY INTO
- Creating distributed tables and modifying table distribution
- Creating statistics and automating the update of statistics
- Creating partitions and archiving data using partitioned tables
- Implementing workload management in an Azure Synapse dedicated SQL pool
- Creating workload groups for advanced workload management
- Chapter 11: Monitoring Synapse SQL and Spark Pools
- Technical requirements
- Configuring a Log Analytics workspace for Synapse SQL pools
- Configuring a Log Analytics workspace for Synapse Spark pools
- Using Kusto queries to monitor SQL and Spark pools
- Creating workbooks in a Log Analytics workspace to visualize monitoring data
- Monitoring table distribution, data skew, and index health using Synapse DMVs
- Building monitoring dashboards for Synapse with Azure Monitor
- Chapter 12: Optimizing and Maintaining Synapse SQL and Spark Pools
- Technical requirements
- Analyzing a query plan and fixing table distribution
- Monitoring and rebuilding a replication table cache
- Configuring result set caching in Azure Synapse dedicated SQL pool
- Configuring longer backup retention for a Synapse SQL database
- Auto pausing Synapse dedicated SQL pool
- Optimizing Delta tables in a Synapse Spark pool lake database
- Optimizing query performance in Synapse Spark pools
- Chapter 13: Monitoring and Maintaining Azure Data Engineering Pipelines
- Technical requirements
- Monitoring Synapse integration pipelines using Log Analytics and workbooks
- Tracing SQL queries for dedicated SQL pool to Synapse integration pipelines
- Provisioning a Microsoft Purview account and creating a data catalog
- Integrating a Synapse workspace with Microsoft Purview and tracking data lineage
- Applying Azure tags using PowerShell to multiple Azure resources
- Index
- Other Books You May Enjoy
Product information
- Title: Azure Data Engineering Cookbook - Second Edition
- Author(s):
- Release date: September 2022
- Publisher(s): Packt Publishing
- ISBN: 9781803246789