Azure Data Factory for Beginners - Build Data Ingestion

Video description

Building frameworks is now an industry norm, and knowing how to visualize, design, plan, and implement data frameworks has become an important skill. The framework we are going to build together is a Metadata-Driven Ingestion Framework. Metadata-driven frameworks allow a company to develop the system just once and have it adopted and reused by various business clusters without additional development, saving the business time and cost. Think of it as a plug-and-play system.
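To make the metadata-driven idea concrete, here is a minimal sketch (not taken from the course material) of ingestion driven by configuration records rather than hard-coded logic; the names source_config and copy_to_raw are illustrative assumptions, and in the course the copy step is handled by an Azure Data Factory Copy activity.

    # Minimal illustration of metadata-driven ingestion (all names are hypothetical).
    # The ingestion routine is written once; new sources are added as metadata
    # records, not as new code.

    source_config = [
        {"source": "sales.csv",   "container": "landing", "sink_path": "raw/sales"},
        {"source": "finance.csv", "container": "landing", "sink_path": "raw/finance"},
    ]

    def copy_to_raw(source, container, sink_path):
        # In Azure Data Factory this step would be a parameterized Copy activity;
        # here it is only a placeholder to show the plug-and-play pattern.
        print(f"Copying {container}/{source} -> {sink_path}")

    for record in source_config:
        copy_to_raw(**record)

Onboarding a new source then means inserting a new metadata record, which is the time and cost saving the framework is designed around.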

The first objective of the course is to onboard you onto the Azure Data Factory platform and help you assemble your first Azure Data Factory pipeline. Once you get a good grip on the Azure Data Factory development pattern, it becomes easier to adopt the same pattern to onboard other data sources and sinks.

Once you are comfortable building a basic Azure Data Factory pipeline, the second objective is to build a fully fledged, working metadata-driven framework that makes ingestion more dynamic. Furthermore, we will build the framework in such a way that every batch orchestration and every individual pipeline run can be audited for business intelligence and operational monitoring.
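As a rough illustration of the auditing idea (the course implements this with metadata tables and stored procedures), a pipeline-run log needs little more than a batch ID, a run ID, a status, and timestamps; the record shape below is an assumption for illustration, not the course's actual schema.

    # Hypothetical shape of a pipeline-run audit record; the course keeps this
    # information in metadata tables and updates it via stored procedures.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from typing import Optional

    @dataclass
    class PipelineRunLog:
        batch_id: int
        run_id: str
        pipeline_name: str
        status: str = "InProgress"   # e.g. InProgress / Succeeded / Failed
        started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
        finished_at: Optional[datetime] = None

        def close(self, status):
            # Mark the run as finished so batch-level reporting can roll it up.
            self.status = status
            self.finished_at = datetime.now(timezone.utc)

    run = PipelineRunLog(batch_id=1, run_id="run-001", pipeline_name="Ingest_Finance")
    run.close("Succeeded")
    print(run)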

By the end of this course, you will be able to design and implement a production-ready data ingestion framework in Azure.

What You Will Learn

  • Learn about Azure Data Factory and Azure Blob Storage
  • Understand data engineering, data lake, and metadata-driven frameworks concepts
  • Look at an industry-based example of how to build ingestion frameworks
  • Learn to build dynamic Azure Data Factory pipelines and email notifications with Logic Apps
  • Study tracking of pipelines and batch runs
  • Look at version management with Azure DevOps

Audience

This course is ideal for aspiring data engineers and developers who are curious about Azure Data Factory as an ETL alternative.

You will need a basic PC/laptop; no prior knowledge of Microsoft Azure is required.

About The Author

David Mngadi: David Mngadi is a data management professional who is inspired by the power of data in our lives and has helped several companies become more data-driven to gain a competitive edge as well as to meet regulatory requirements. Over the last 15 years, he has had the pleasure of designing and implementing data warehousing solutions in the retail, telco, and banking industries, and more recently in big data lake implementations. He is passionate about technology and teaching programming online.

Table of contents

  1. Chapter 1: Introduction – Build Your First Azure Data Pipeline
    1. Introduction to the Course
    2. Introduction to ADF (Azure Data Factory)
    3. Requirements Discussion and Technical Architecture
    4. Register a Free Azure Account
    5. Create a Data Factory Resource
    6. Create a Storage Account and Upload Data
    7. Create Data Lake Gen 2 Storage Account
    8. Download Storage Explorer
    9. Create Your First Azure Pipeline
    10. Closing Remarks
  2. Chapter 2: Metadata-Driven Ingestion
    1. Introduction to Metadata-Driven Ingestion
    2. High-Level Plan
    3. Create Active Directory User
    4. Assign the Contributor Role to the User
    5. Disable Security Defaults
    6. Creating the Metadata Database
    7. Install Azure Data Studio
    8. Create Metadata Tables and Stored Procedures
    9. Reconfigure Existing Data Factory Artifacts
    10. Set Up Logic App to Handle Email Notifications
    11. Modify the Data Factory Pipeline to Send an Email Notification
    12. Create Linked Service for Metadata Database and Email Dataset
    13. Create Utility Pipeline to Send Email Notifications
    14. Explaining the Email Recipients Table
    15. Explaining the Get Email Addresses Stored Procedure
    16. Modify Ingestion Pipeline to Use the Email Utility Pipeline
    17. Tracking the Triggered Pipeline
    18. Making the Email Notifications Dynamic
    19. Making Logging of Pipeline Information Dynamic
    20. Add a New Way to Log the Main Ingestion Pipeline
    21. Change the Logging of Pipelines to Send Fail Message Only
    22. Creating Dynamic Datasets
    23. Reading from Source to Target - Part 1
    24. Reading from Source to Target - Part 2
    25. Explaining the Source to Target Stored Procedure
    26. Add Orchestration Pipeline - Part 1
    27. Add Orchestration Pipeline - Part 2
    28. Fixing the Duplicating Batch Ingestions
    29. Understanding the Pipeline Log and Related Tables
    30. Understanding the GetBatch Stored Procedure
    31. Understanding the Set Batch Status and GetRunID
    32. Setting Up an Azure DevOps Git Repository
    33. Publishing the Data Factory to Azure DevOps
    34. Closing Remarks
  3. Chapter 3: Event-Driven Ingestion
    1. Introduction
    2. Read from Azure Storage Plan
    3. Create Finance Container and Upload Files
    4. Create Source Dataset
    5. Write to Data Lake - Raw Plan
    6. Create Finance Container and Directories
    7. Create Sink Dataset
    8. Data Factory Pipeline Plan
    9. Create Data Factory and Read Metadata
    10. Add Filter by CSV
    11. Add Dataset to Read Files
    12. Add the For Each CSV File Activity and Test Ingestion
    13. Adding the Event-Based Trigger Plan
    14. Enable the Event Grid Provider
    15. Delete File and Add Event-Based Trigger
    16. Create Event-Based Trigger
    17. Publish Code to Main Branch and Start Trigger
    18. Trigger Event-Based Ingestion
    19. Closing Remarks

Product information

  • Title: Azure Data Factory for Beginners - Build Data Ingestion
  • Author(s): David Mngadi
  • Release date: June 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781804610329