
Databricks Data Engineer Associate Certification Prep in 2 Weeks

Published by O'Reilly Media, Inc.

Content level: Beginner to intermediate

Course outcomes

  • Understand how to use Databricks Lakehouse Platform and its tools
  • Learn how to build ETL pipelines and process data incrementally
  • Discover how to put data pipelines and dashboards into production
  • Understand and follow best security practices in Databricks

Course description

Databricks Lakehouse is a modern data platform that combines the best aspects of data lakes and data warehouses. The Databricks Data Engineer Associate certification demonstrates that you have a thorough understanding of the platform and the skills to perform essential data engineering tasks on it.

Join expert Derar Alhussein to build a strong foundation in all topics covered on the certification exam, including the Databricks Lakehouse Platform and its tools and benefits. You’ll learn to build ETL pipelines using Apache Spark SQL and Python in both batch and streaming modes, and you’ll discover how to orchestrate production pipelines and design dashboards while managing entity permissions.

NOTE: With today’s registration, you’ll be signed up for all four sessions. Although you can attend any of the sessions individually, we recommend participating in all.

What you’ll learn and how you can apply it

  • Use the Databricks Lakehouse Platform and its tools
  • Build ETL pipelines using Apache Spark SQL and Python
  • Incrementally process data using Apache Spark Structured Streaming, Auto Loader, and multihop architecture
  • Build production pipelines and dashboards using Delta Live Tables, Jobs, and Databricks SQL
  • Manage security permissions in Databricks, including data object privileges and Unity Catalog

This live event is for you because...

  • You want to become a Databricks Certified Data Engineer Associate.
  • You’re new to Databricks and want to save time by learning Databricks fundamentals.
  • You’re a data engineer who wants to apply your skills to Databricks.

Prerequisites

  • Have or create a cloud account on Azure, AWS, or GCP (without an account, you’ll use a limited Community Edition of Databricks)
  • Basic SQL knowledge
  • Python programming experience
  • Familiarity with cloud fundamentals

Recommended preparation:

  • Bookmark the course GitHub repository (instructions for cloning the repo in your Databricks workspace will be given in the course)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Day 1: Databricks Lakehouse Platform

Introduction to Databricks (20 minutes)

  • Presentation: Course overview; What is Databricks Lakehouse?
  • Hands-on exercise: Answer knowledge-check question
  • Q&A

Setting up Databricks workspace (60 minutes)

  • Group discussion: What is your cloud provider?
  • Presentation: Getting started with Databricks Community Edition; creating free trials on Azure, AWS, and GCP
  • Hands-on exercise: Create your Databricks workspace
  • Q&A
  • Break

Exploring Databricks workspace (20 minutes)

  • Presentation: Navigating workspace; importing course materials
  • Hands-on exercise: Import course materials from GitHub into your workspace
  • Q&A

Working with notebooks (45 minutes)

  • Presentation: Creating clusters; notebooks fundamentals
  • Hands-on exercises: Create a cluster; run a notebook
  • Q&A
  • Break

Databricks Repos (15 minutes)

  • Presentation: Configuring Git integration in the Databricks workspace; creating branches; pushing and pulling changes
  • Hands-on exercise: Answer knowledge-check question
  • Q&A

Delta Lake (50 minutes)

  • Presentation: Delta Lake; working with Delta Lake tables
  • Hands-on exercise: Create Delta Lake tables
  • Q&A
  • Break
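
To preview the kind of hands-on work in this session, here is a minimal Spark SQL sketch of creating and modifying a Delta Lake table in a Databricks notebook (the table and column names are illustrative, not the course's actual exercise):

```sql
-- Delta Lake is the default table format on Databricks
CREATE TABLE IF NOT EXISTS employees (
  id INT,
  name STRING,
  salary DOUBLE
);

INSERT INTO employees VALUES
  (1, 'Adam', 3500.0),
  (2, 'Sarah', 4020.5);

-- Every write produces a new table version in the transaction log
UPDATE employees SET salary = salary + 100 WHERE id = 1;

-- Inspect the version history recorded by Delta Lake
DESCRIBE HISTORY employees;
```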

Advanced Delta Lake features (30 minutes)

  • Presentation: Time travel; compacting small files; indexing; vacuum
  • Hands-on exercise: Answer knowledge-check question
  • Q&A
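
The features above look roughly like this in Spark SQL (a sketch continuing the illustrative `employees` table; version numbers and retention behavior depend on your table's history):

```sql
-- Time travel: query an earlier version of the table
SELECT * FROM employees VERSION AS OF 1;

-- Compact small files and co-locate rows by a column
OPTIMIZE employees ZORDER BY (id);

-- Remove data files no longer referenced by the table
-- (default retention threshold: 7 days)
VACUUM employees;
```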

Day 2: ETL with Spark SQL and Python

Relational entities on Databricks (80 minutes)

  • Presentation: Relational entities; working with databases and tables on Databricks; setting up tables; working with views
  • Hands-on exercises: Create and query relational entities
  • Q&A
  • Break
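
A short Spark SQL sketch of the entities covered in this session (schema, table, and view names are placeholders):

```sql
-- A schema (database) groups tables and views
CREATE SCHEMA IF NOT EXISTS hr_db;
USE hr_db;

-- Managed table: Databricks controls both the metadata and the data files
CREATE TABLE staff (id INT, name STRING);

-- A view stores a query definition, not data
CREATE VIEW senior_staff AS
  SELECT id, name FROM staff WHERE id > 0;

-- A temporary view lives only for the current Spark session
CREATE TEMP VIEW staff_temp AS SELECT * FROM staff;
```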

Processing data files (80 minutes)

  • Presentation: Querying data files; writing data files to tables
  • Hands-on exercise: Process data files with Spark SQL
  • Q&A
  • Break
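
Spark SQL can query files in place before loading them into tables; a sketch with a placeholder path:

```sql
-- Query files directly by prefixing the path with the file format
SELECT * FROM json.`/mnt/raw/customers/`;

-- CTAS: load the query result into a Delta table
CREATE TABLE customers AS
  SELECT * FROM json.`/mnt/raw/customers/`;
```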

Advanced ETL (80 minutes)

  • Presentation: Advanced transformations; higher-order functions; SQL UDFs
  • Hands-on exercises: Apply advanced transformations; answer knowledge-check question
  • Q&A
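
A taste of the transformations covered here, as a Spark SQL sketch (the UDF name and tax rate are invented for illustration):

```sql
-- Higher-order function: apply a lambda to each array element
SELECT transform(array(1, 2, 3), x -> x * 10) AS scaled;

-- filter keeps only the elements matching a predicate
SELECT filter(array(1, 2, 3, 4), x -> x % 2 = 0) AS evens;

-- A SQL UDF encapsulates reusable logic
CREATE FUNCTION add_tax(price DOUBLE)
  RETURNS DOUBLE
  RETURN price * 1.2;

SELECT add_tax(100.0) AS price_with_tax;
```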

Day 3: Incremental Data Processing

Spark Structured Streaming (80 minutes)

  • Presentation: Structured streaming; incremental data ingestion; Auto Loader
  • Hands-on exercises: Process data incrementally with Spark Structured Streaming; answer knowledge-check question
  • Q&A
  • Break
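
As a preview, incremental file ingestion with Auto Loader looks roughly like this in a Databricks Python notebook (paths and table names are placeholders; `spark` is the notebook's preconfigured SparkSession):

```python
# Auto Loader: incrementally ingest files as they arrive in cloud storage
df = (spark.readStream
        .format("cloudFiles")                              # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders_schema")
        .load("/mnt/raw/orders/"))

# The checkpoint location lets the stream resume exactly where it left off
(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/orders")
   .outputMode("append")
   .table("orders_bronze"))
```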

Multihop architecture (80 minutes)

  • Presentation: Building a multihop architecture
  • Hands-on exercises: Build a multihop architecture; answer knowledge-check question
  • Q&A
  • Break
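
The bronze/silver/gold layers of a multihop (medallion) architecture can be sketched as chained streaming reads and writes (table names, columns, and paths below are illustrative):

```python
# Bronze -> Silver: clean and validate the raw ingested data
bronze_df = spark.readStream.table("orders_bronze")

(bronze_df
   .filter("order_id IS NOT NULL")
   .writeStream
   .option("checkpointLocation", "/mnt/checkpoints/orders_silver")
   .table("orders_silver"))

# Silver -> Gold: business-level aggregate, recomputed as data arrives
silver_df = spark.readStream.table("orders_silver")

(silver_df
   .groupBy("customer_id")
   .count()
   .writeStream
   .outputMode("complete")
   .option("checkpointLocation", "/mnt/checkpoints/orders_gold")
   .table("orders_gold"))
```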

Delta Live Tables (50 minutes)

  • Presentation: Delta Live Tables
  • Hands-on exercises: Create and run a DLT pipeline; answer knowledge-check question
  • Q&A
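
In Delta Live Tables, each table is declared as a decorated function and DLT manages the dependencies between them; a minimal Python sketch (dataset names, path, and the quality constraint are invented for illustration):

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested with Auto Loader")
def orders_raw():
    return (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "json")
              .load("/mnt/raw/orders/"))

# An expectation drops rows that violate the data quality constraint
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_order", "order_id IS NOT NULL")
def orders_cleaned():
    return dlt.read_stream("orders_raw").select(col("order_id"),
                                                col("customer_id"))
```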

Change data capture (30 minutes)

  • Presentation: Change data capture; processing CDC feed with Delta Live Tables
  • Hands-on exercise: Answer knowledge-check question
  • Q&A
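
Processing a CDC feed in Delta Live Tables centers on the `APPLY CHANGES INTO` statement; a SQL sketch with illustrative table, key, and column names:

```sql
-- Target streaming table that receives the applied changes
CREATE OR REFRESH STREAMING LIVE TABLE customers_silver;

-- Upsert/delete rows from the CDC feed, ordered by a sequencing column
APPLY CHANGES INTO LIVE.customers_silver
  FROM STREAM(LIVE.customers_cdc_feed)
  KEYS (customer_id)
  APPLY AS DELETE WHEN operation = "DELETE"
  SEQUENCE BY sequence_num
  COLUMNS * EXCEPT (operation, sequence_num);
```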

Day 4: Production Pipelines and Data Governance

Databricks Jobs (80 minutes)

  • Presentation: Task orchestration with Databricks Jobs
  • Hands-on exercises: Create and run a Databricks job; answer knowledge-check question
  • Q&A
  • Break

Databricks SQL (80 minutes)

  • Presentation: Running DBSQL queries; designing dashboards
  • Hands-on exercises: Design a dashboard with DBSQL; answer knowledge-check question
  • Q&A
  • Break

Data governance (60 minutes)

  • Presentation: Data object privileges; managing permissions; Unity Catalog
  • Hands-on exercises: Apply data object privileges; answer knowledge-check question
  • Q&A
  • Break
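
Managing data object privileges boils down to GRANT/REVOKE statements like these (the table and group names are placeholders; available privileges differ slightly between the Hive metastore and Unity Catalog):

```sql
-- Grant privileges on a data object to a group
GRANT SELECT, MODIFY ON TABLE hr_db.staff TO `data_engineers`;

-- Review the current grants on the object
SHOW GRANTS ON TABLE hr_db.staff;

-- Revoke a privilege when it is no longer needed
REVOKE MODIFY ON TABLE hr_db.staff FROM `data_engineers`;
```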

Certification overview (20 minutes)

  • Presentation: Certification overview
  • Q&A

Your Instructor

  • Derar Alhussein

    Derar Alhussein is a senior data engineer with a master's degree in data mining. He is the author of the O’Reilly book Databricks Certified Data Engineer Associate Study Guide. He has over a decade of hands-on experience in software and data projects, and currently holds eight certifications from Databricks, showcasing his proficiency in the field.

    Derar is also an experienced instructor, with a proven track record of success in training thousands of data engineers, helping them to develop their skills and obtain professional certifications.

    In 2024, Databricks recognized Derar as a Databricks Beacon, acknowledging his outstanding technical skills and contributions to the data and AI community.