Databricks Data Engineer Associate Certification Prep in 2 Weeks
Published by O'Reilly Media, Inc.
Course outcomes
- Understand how to use Databricks Lakehouse Platform and its tools
- Learn how to build ETL pipelines and process data incrementally
- Discover how to put data pipelines and dashboards into production
- Understand and follow best security practices in Databricks
Course description
Databricks Lakehouse is a modern data platform that combines the best aspects of data lakes and data warehouses. The Databricks Data Engineer Associate certification demonstrates that you have a thorough understanding of the platform and its capabilities, along with the skills to complete essential data engineering tasks on it.
Join expert Derar Alhussein to build a strong foundation in all topics covered on the certification exam, including the Databricks Lakehouse Platform and its tools and benefits. You’ll learn to build ETL pipelines using Apache Spark SQL and Python in both batch and streaming modes, and you’ll discover how to orchestrate production pipelines and design dashboards while managing entity permissions.
NOTE: With today’s registration, you’ll be signed up for all four sessions. Although you can attend any of the sessions individually, we recommend participating in all.
What you’ll learn and how you can apply it
- Use the Databricks Lakehouse Platform and its tools
- Build ETL pipelines using Apache Spark SQL and Python
- Incrementally process data using Apache Spark Structured Streaming, Auto Loader, and multihop architecture
- Build production pipelines and dashboards using Delta Live Tables, Jobs, and Databricks SQL
- Manage security permissions in Databricks, including data object privileges and Unity Catalog
This live event is for you because...
- You want to become a Databricks Certified Data Engineer Associate.
- You’re new to Databricks and want to save time by learning Databricks fundamentals.
- You’re a data engineer who wants to apply your skills to Databricks.
Prerequisites
- Have or create a cloud account on Azure, AWS, or GCP (without one, you can use the free but limited Community Edition of Databricks)
- Basic SQL knowledge
- Python programming experience
- Familiarity with cloud fundamentals
Recommended preparation:
- Bookmark the course GitHub repository (instructions for cloning the repo in your Databricks workspace will be given in the course)
Recommended follow-up:
- Read Databricks Certified Data Engineer Associate Study Guide (book)
- Watch Getting Started with Databricks (video)
- Read The Azure Data Lakehouse Toolkit (book)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Day 1: Databricks Lakehouse Platform
Introduction to Databricks (20 minutes)
- Presentation: Course overview; What is Databricks Lakehouse?
- Hands-on exercise: Answer knowledge-check question
- Q&A
Setting up Databricks workspace (60 minutes)
- Group discussion: What is your cloud provider?
- Presentation: Getting started with Databricks Community Edition; creating free trials on Azure, AWS, and GCP
- Hands-on exercise: Create your Databricks workspace
- Q&A
- Break
Exploring Databricks workspace (20 minutes)
- Presentation: Navigating workspace; importing course materials
- Hands-on exercise: Import course materials from GitHub into your workspace
- Q&A
Working with notebooks (45 minutes)
- Presentation: Creating clusters; notebooks fundamentals
- Hands-on exercises: Create a cluster; run a notebook
- Q&A
- Break
Databricks Repos (15 minutes)
- Presentation: Configuring Git integration in the Databricks workspace; creating branches; pushing and pulling changes
- Hands-on exercise: Answer knowledge-check question
- Q&A
Delta Lake (50 minutes)
- Presentation: Delta Lake; working with Delta Lake tables
- Hands-on exercise: Create Delta Lake tables
- Q&A
- Break
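As a preview of this exercise, creating and inspecting a Delta Lake table might look like the following sketch (the table and column names are illustrative, not the course's actual materials):

```sql
-- Create a managed table; Delta Lake is the default format on Databricks
CREATE TABLE IF NOT EXISTS employees (
  id     INT,
  name   STRING,
  salary DOUBLE
);

-- Each write operation produces a new version in the transaction log
INSERT INTO employees
VALUES (1, 'Anna', 2500.0), (2, 'Omar', 3100.0);

-- Review the table's version history
DESCRIBE HISTORY employees;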
Advanced Delta Lake features (30 minutes)
- Presentation: Time travel; compacting small files; indexing; vacuum
- Hands-on exercise: Answer knowledge-check question
- Q&A
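The advanced features above can be sketched in Spark SQL as follows (assuming the illustrative `employees` table; retention periods and version numbers are placeholders):

```sql
-- Time travel: query an earlier version of the table...
SELECT * FROM employees VERSION AS OF 1;

-- ...or query it as of a timestamp
SELECT * FROM employees TIMESTAMP AS OF '2024-01-01';

-- Compact small files and co-locate related data on a column
OPTIMIZE employees ZORDER BY (id);

-- Remove data files no longer referenced by the current table version
-- (the default retention threshold is 7 days)
VACUUM employees;
```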
Day 2: ETL with Spark SQL and Python
Relational entities on Databricks (80 minutes)
- Presentation: Relational entities; working with databases and tables on Databricks; setting up tables; working with views
- Hands-on exercises: Create and query relational entities
- Q&A
- Break
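A minimal sketch of the entities covered in this section, with illustrative names:

```sql
-- A schema (database) groups tables and views
CREATE SCHEMA IF NOT EXISTS hr_db;

CREATE TABLE hr_db.employees (id INT, name STRING, dept STRING);

-- A view stores the query definition, not the data
CREATE VIEW hr_db.engineers AS
SELECT id, name FROM hr_db.employees WHERE dept = 'Engineering';

-- A temporary view is session-scoped and not registered in the schema
CREATE TEMP VIEW recent_hires AS
SELECT * FROM hr_db.employees;
```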
Processing data files (80 minutes)
- Presentation: Querying data files; writing data files to tables
- Hands-on exercise: Process data files with Spark SQL
- Q&A
- Break
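Querying files directly and loading them into tables might look like this sketch (the path and table names are placeholders):

```sql
-- Query files in place using the file-format prefix
SELECT * FROM json.`/path/to/landing/`;

-- Materialize the result as a Delta table (CTAS)
CREATE TABLE customers AS
SELECT * FROM json.`/path/to/landing/`;

-- COPY INTO loads each source file at most once, so reruns are idempotent
COPY INTO customers
FROM '/path/to/landing/'
FILEFORMAT = JSON;
```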
Advanced ETL (80 minutes)
- Presentation: Advanced transformations; higher order functions; SQL UDFs
- Hands-on exercises: Apply advanced transformations; answer knowledge-check question
- Q&A
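The higher-order functions and SQL UDFs covered here can be previewed with a small sketch (column names and the curve logic are invented for illustration):

```sql
-- Higher-order functions apply a lambda to each element of an array column
SELECT FILTER(scores, s -> s >= 60)    AS passing,
       TRANSFORM(scores, s -> s * 1.1) AS curved
FROM   (SELECT array(55, 72, 90) AS scores);

-- A SQL UDF encapsulates reusable expression logic
CREATE OR REPLACE FUNCTION curve(score DOUBLE)
RETURNS DOUBLE
RETURN score * 1.1;

SELECT curve(72.0);
```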
Day 3: Incremental Data Processing
Spark Structured Streaming (80 minutes)
- Presentation: Structured streaming; incremental data ingestion; Auto Loader
- Hands-on exercises: Process data incrementally with Spark Structured Streaming; answer knowledge-check question
- Q&A
- Break
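As a rough preview of incremental ingestion with Auto Loader, the streaming-table form in Databricks SQL looks something like the sketch below (the path is a placeholder, and exact syntax varies by runtime version):

```sql
-- Incrementally ingest newly arriving files as a streaming table;
-- read_files with STREAM is the SQL entry point to Auto Loader
CREATE OR REFRESH STREAMING TABLE orders_raw AS
SELECT * FROM STREAM read_files(
  '/path/to/landing/',
  format => 'json'
);
```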
Multihop architecture (80 minutes)
- Presentation: Building a multihop architecture
- Hands-on exercises: Build a multihop architecture; answer knowledge-check question
- Q&A
- Break
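The bronze/silver/gold layers of a multihop (medallion) architecture can be sketched as follows (table and column names are illustrative; in practice each hop would typically be a streaming read):

```sql
-- Bronze: raw data ingested as-is
CREATE TABLE orders_bronze AS
SELECT * FROM json.`/path/to/landing/`;

-- Silver: cleaned and conformed records
CREATE TABLE orders_silver AS
SELECT order_id,
       CAST(order_ts AS TIMESTAMP) AS order_ts,
       amount
FROM   orders_bronze
WHERE  order_id IS NOT NULL;

-- Gold: business-level aggregates ready for reporting
CREATE TABLE daily_revenue_gold AS
SELECT DATE(order_ts) AS order_date,
       SUM(amount)    AS revenue
FROM   orders_silver
GROUP  BY DATE(order_ts);
```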
Delta Live Tables (50 minutes)
- Presentation: Delta Live Tables
- Hands-on exercises: Create and run a DLT pipeline; answer knowledge-check question
- Q&A
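A Delta Live Tables pipeline is declared, not scripted; a minimal sketch in the classic DLT SQL syntax (table names and the expectation are illustrative) might be:

```sql
-- DLT resolves dependencies between declared tables automatically
CREATE OR REFRESH STREAMING LIVE TABLE orders_raw AS
SELECT * FROM cloud_files('/path/to/landing/', 'json');

-- An expectation enforces data quality; here, violating rows are dropped
CREATE OR REFRESH LIVE TABLE orders_clean (
  CONSTRAINT valid_order EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
) AS
SELECT * FROM LIVE.orders_raw;
```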
Change data capture (30 minutes)
- Presentation: Change data capture; processing CDC feed with Delta Live Tables
- Hands-on exercise: Answer knowledge-check question
- Q&A
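Processing a CDC feed in DLT centers on the APPLY CHANGES INTO statement; a hedged sketch (source, keys, and column names are placeholders):

```sql
-- Merge inserts, updates, and deletes from a CDC feed into a target table
APPLY CHANGES INTO LIVE.customers_silver
FROM STREAM(LIVE.customers_cdc_feed)
KEYS (customer_id)
APPLY AS DELETE WHEN operation = 'DELETE'
SEQUENCE BY sequence_num
COLUMNS * EXCEPT (operation, sequence_num);
```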
Day 4: Production Pipelines and Data Governance
Databricks Jobs (80 minutes)
- Presentation: Task orchestration with Databricks Jobs
- Hands-on exercises: Create and run a Databricks job; answer knowledge-check question
- Q&A
- Break
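A multi-task job wires notebooks into a dependency graph; expressed as a Jobs API-style JSON fragment, a two-task job might look like this sketch (job name, notebook paths, and schedule are invented placeholders):

```json
{
  "name": "daily_etl",
  "tasks": [
    {
      "task_key": "ingest",
      "notebook_task": { "notebook_path": "/Repos/course/notebooks/01-ingest" }
    },
    {
      "task_key": "transform",
      "depends_on": [ { "task_key": "ingest" } ],
      "notebook_task": { "notebook_path": "/Repos/course/notebooks/02-transform" }
    }
  ],
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC"
  }
}
```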
Databricks SQL (80 minutes)
- Presentation: Running DBSQL queries; designing dashboards
- Hands-on exercises: Design a dashboard with DBSQL; answer knowledge-check question
- Q&A
- Break
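Dashboards in Databricks SQL are built on queries against gold-layer tables; a typical visualization query might resemble this sketch (the table comes from the earlier illustrative examples):

```sql
-- Aggregate a gold table into a series suitable for a dashboard chart
SELECT DATE_TRUNC('week', order_date) AS week,
       SUM(revenue)                   AS weekly_revenue
FROM   daily_revenue_gold
GROUP  BY DATE_TRUNC('week', order_date)
ORDER  BY week;
```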
Data governance (60 minutes)
- Presentation: Data object privileges; managing permissions; Unity Catalog
- Hands-on exercises: Apply data object privileges; answer knowledge-check question
- Q&A
- Break
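Managing data object privileges is done with GRANT and REVOKE statements; a sketch using the illustrative objects from earlier examples (note that privilege names differ slightly between the legacy Hive metastore model shown here and Unity Catalog, where `USAGE` becomes `USE SCHEMA`):

```sql
-- Grant privileges on data objects to a group
GRANT USAGE ON SCHEMA hr_db TO `data_engineers`;
GRANT SELECT, MODIFY ON TABLE hr_db.employees TO `data_engineers`;

-- Review existing grants, then narrow them
SHOW GRANTS ON TABLE hr_db.employees;
REVOKE MODIFY ON TABLE hr_db.employees FROM `data_engineers`;
```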
Certification overview (20 minutes)
- Presentation: Certification overview
- Q&A
Your Instructor
Derar Alhussein
Derar Alhussein is a senior data engineer with a master's degree in data mining. He is the author of the O’Reilly book Databricks Certified Data Engineer Associate Study Guide. He has over a decade of hands-on experience in software and data projects, and currently holds eight certifications from Databricks, showcasing his proficiency in the field.
Derar is also an experienced instructor, with a proven track record of success in training thousands of data engineers, helping them to develop their skills and obtain professional certifications.
In 2024, Databricks recognized Derar as a Databricks Beacon, acknowledging his outstanding technical skills and contributions to the data and AI community.