Azure Data Factory Cookbook

Book description

Solve real-world data problems and create data-driven workflows for easy data movement and processing at scale with Azure Data Factory

Key Features

  • Learn how to load and transform data from various sources, both on-premises and in the cloud
  • Use Azure Data Factory's visual environment to build and manage hybrid ETL pipelines
  • Discover how to prepare, transform, process, and enrich data to generate key insights

Book Description

Azure Data Factory (ADF) is a modern data integration tool available on Microsoft Azure. This Azure Data Factory Cookbook helps you get up and running by showing you how to create and execute your first job in ADF. You'll learn how to branch and chain activities, create custom activities, and schedule pipelines. This book will help you discover the benefits of cloud data warehousing, Azure Synapse Analytics, and Azure Data Lake Storage Gen2, which are frequently used for big data analytics. With practical recipes, you'll learn how to actively engage with analytical tools from Azure Data Services and leverage your on-premises infrastructure with cloud-native tools to get relevant business insights. As you advance, you'll be able to integrate the most commonly used Azure services into ADF and understand how they can be useful in designing ETL pipelines. The book will take you through the common errors that you may encounter while working with ADF and show you how to use the Azure portal to monitor pipelines. You'll also learn to interpret error messages and resolve problems in connectors and data flows with the debugging capabilities of ADF.

By the end of this book, you'll be able to use ADF as the main ETL and orchestration tool for your data warehouse or data platform projects.

What you will learn

  • Create an orchestration and transformation job in ADF
  • Develop, execute, and monitor data flows using Azure Synapse
  • Create big data pipelines using Azure Data Lake and ADF
  • Build a machine learning app with Apache Spark and ADF
  • Migrate on-premises SSIS jobs to ADF
  • Integrate ADF with commonly used Azure services such as Azure ML, Azure Logic Apps, and Azure Functions
  • Run big data compute jobs within HDInsight and Azure Databricks
  • Copy data from AWS S3 and Google Cloud Storage to Azure Storage using ADF's built-in connectors

Who this book is for

This book is for ETL developers, data warehouse and ETL architects, software professionals, and anyone who wants to learn about the common and not-so-common challenges faced while developing traditional and hybrid ETL solutions using Microsoft's Azure Data Factory. You'll also find this book useful if you are looking for recipes to improve or enhance your existing ETL pipelines. Basic knowledge of data warehousing is expected.

Table of contents

  1. Azure Data Factory Cookbook
  2. Why subscribe?
  3. Contributors
  4. About the authors
  5. About the reviewers
  6. Packt is searching for authors like you
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
    4. Download the color images
    5. Conventions used
    6. Sections
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
    7. Get in touch
    8. Reviews
  8. Chapter 1: Getting Started with ADF
    1. Introduction to the Azure data platform
      1. Getting ready
      2. How to do it...
      3. How it works...
    2. Creating and executing our first job in ADF
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Creating an ADF pipeline by using the Copy Data tool
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Creating an ADF pipeline using Python
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Creating a data factory using PowerShell
      1. Getting ready
      2. How to do it…
      3. How it works...
      4. There's more...
      5. See also
    6. Using templates to create ADF pipelines
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  9. Chapter 2: Orchestration and Control Flow
    1. Technical requirements
    2. Using parameters and built-in functions
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
    3. Using Metadata and Stored Procedure activities
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    4. Using the ForEach and Filter activities
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Chaining and branching activities within a pipeline
      1. Getting ready
      2. How to do it…
      3. There's more…
    6. Using the Lookup, Web, and Execute Pipeline activities
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
    7. Creating event-based triggers
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
  10. Chapter 3: Setting Up a Cloud Data Warehouse
    1. Technical requirements
    2. Connecting to Azure Synapse Analytics
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    3. Loading data to Azure Synapse Analytics using SSMS
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    4. Loading data to Azure Synapse Analytics using Azure Data Factory
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    5. Pausing/resuming an Azure SQL pool from Azure Data Factory
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    6. Creating an Azure Synapse workspace
      1. Getting ready
      2. How to do it…
      3. There's more…
    7. Loading data to Azure Synapse Analytics using bulk load
      1. Getting ready
      2. How to do it…
      3. How it works…
    8. Copying data in Azure Synapse Orchestrate
      1. Getting ready
      2. How to do it…
      3. How it works…
    9. Using SQL on-demand
      1. Getting ready
      2. How to do it…
      3. How it works…
  11. Chapter 4: Working with Azure Data Lake
    1. Technical requirements
    2. Setting up Azure Data Lake Storage Gen2
      1. Getting ready
      2. How to do it...
    3. Connecting Azure Data Lake to Azure Data Factory and loading data
      1. Getting ready
      2. How to do it...
      3. How it works...
    4. Creating big data pipelines using Azure Data Lake and Azure Data Factory
      1. Getting ready
      2. How to do it...
      3. How it works...
  12. Chapter 5: Working with Big Data – HDInsight and Databricks
    1. Technical requirements
    2. Setting up an HDInsight cluster
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Processing data from Azure Data Lake with HDInsight and Hive
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Processing big data with Apache Spark
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Building a machine learning app with Databricks and Azure Data Lake Storage
      1. Getting ready
      2. How to do it…
      3. How it works…
  13. Chapter 6: Integration with MS SSIS
    1. Technical requirements
    2. Creating a SQL Server database
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Building an SSIS package
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Running SSIS packages from ADF
      1. Getting ready
      2. How to do it…
      3. How it works…
  14. Chapter 7: Data Migration – Azure Data Factory and Other Cloud Services
    1. Technical requirements
    2. Copying data from Amazon S3 to Azure Blob storage
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Copying large datasets from S3 to ADLS
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. See also
    4. Copying data from Google Cloud Storage to Azure Data Lake
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. See also
    5. Copying data from Google BigQuery to Azure Data Lake Store
      1. Getting ready
      2. How to do it…
    6. Migrating data from Google BigQuery to Azure Synapse
      1. Getting ready
      2. How to do it…
      3. See also
    7. Moving data to Dropbox
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
  15. Chapter 8: Working with Azure Services Integration
    1. Technical requirements
    2. Triggering your data processing with Logic Apps
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    3. Using the web activity to call an Azure logic app
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    4. Adding flexibility to your pipelines with Azure Functions
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    5. Automatically building ML models with speed and scale
      1. Getting ready
      2. How to do it...
      3. How it works…
      4. There's more...
    6. Transforming and preparing your data via Azure Databricks
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
  16. Chapter 9: Managing Deployment Processes with Azure DevOps
    1. Technical requirements
    2. Setting up Azure DevOps
      1. Getting ready
      2. How to do it...
      3. How it works...
    3. Publishing changes to ADF
      1. Getting ready
      2. How to do it...
      3. How it works...
    4. Deploying your features into the master branch
      1. Getting ready
      2. How to do it...
      3. How it works...
    5. Getting ready for the CI/CD of ADF
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Creating an Azure pipeline for CD
      1. Getting ready
      2. How to do it...
      3. How to do it...
      4. There's more...
  17. Chapter 10: Monitoring and Troubleshooting Data Pipelines
    1. Technical requirements
    2. Monitoring pipeline runs and integration runtimes
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Investigating failures – running in debug mode
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
    4. Rerunning activities
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Configuring alerts for your Data Factory runs
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
  18. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Azure Data Factory Cookbook
  • Author(s): Dmitry Anoshin, Dmitry Foshin, Roman Storchak, Xenia Ireton
  • Release date: December 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781800565296