Data Engineering

Data Engineering Fundamentals in 2 Weeks

Published by O'Reilly Media, Inc.

Content level: Beginner

Managing the Lifecycle of Data Projects

Course Outcomes

  • Discern and organize roles of data engineering, and how those roles impact others in an organization
  • Manage the data engineering life cycle and use it to guide decisions on what technologies to use
  • Use the right tool to solve the right type of problem, and offer practical solutions to an organization’s immediate needs

Join expert Thomas Nield to discover how to serve data effectively to customers, analysts, data scientists, software engineers, and other stakeholders who need data to be available and usable.

Data has become a lucrative and strategic asset to organizations large and small. The data science, analytics, and machine learning booms have allowed many to claim that “data is the new oil.” However, any data scientist will tell you making data usable and available is a lot of work! For years, the importance of data engineering has been overlooked… until now.

Without data engineering, there is no data science, machine learning, analytics, or data-driven applications. Data has to be created, stored, ingested, transformed, and served to its end users. Security, data administration, and even software engineering also cut across the data engineering lifecycle. Decisions have to be made on pairing the right solutions to the right problems, in a transient landscape full of commercial and open-source tools promising one silver bullet after another.

This live training series covers how to serve data effectively to customers, analysts, data scientists, software engineers, and other stakeholders who need data to be available and usable. We will not cover specific tools, but rather categories of tools that can support strategic decisions. Along the way we will cover best practices and how to scale effectively while minimizing technical debt with data technologies.

What you’ll learn and how you can apply it

  • Provide an overview of data engineering and its functions
  • Define the data engineering lifecycle
  • Categorize and manage tools for different phases of a project

This live event is for you because...

  • You’re a data science or analytics professional seeking to make data usable and accessible
  • You work with a data science or software engineering team that needs data served to it more effectively
  • You want to become a data engineering expert, creating data and moving it effectively across an organization

Prerequisites

  • Hands-on experience working with databases, data pipelines, or data tools.
  • Familiarity with Python, Java, SQL, NoSQL, or other data-related scripting.
  • While not required for the class, it helps to have some hands-on experience with basic data platforms.

Course Preparation

Course Follow-Up

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

DAY 1:

What is Data? (50 minutes)

  • Discussion:
      • What is Data?
      • What is Data to the Customer?
      • Where Does Data Come From?
  • Presentation:
      • Operating Domain and Scope
      • Bias and Data Engineering
  • Discussion exercise: Assumptions and Biases
  • Q&A
  • BREAK: 10 min

What is Data Engineering? (60 minutes)

  • Presentation: A Quick History of Data Engineering
  • Discussion: Data Science versus Data Engineering
  • Presentation: The Noisy Tool Landscape
  • Discussion: Big Data versus Small Data
  • Presentation: The Data Engineering Lifecycle
  • Exercise/Lab: When All You Have is a Hammer…
  • Q&A and Wrap Up
  • BREAK: 10 min

Organizations and Data Engineering (30 min)

  • Discussion: Who Are Customers of Data Engineering?
  • Presentation:
      • The downstream customers
      • The upstream suppliers
  • Presentation: Other stakeholders
  • Exercise/Lab: “Let’s productionize a model!”
  • Q&A

Data Creation (30 min)

  • Discussion: Where Does Data Come From?
  • Presentation: Data Sources
  • Exercise/Lab: Finding a data source
  • Q&A

DAY 2:

Data Storage (30 min)

  • Discussion: Where Have You Stored Data?
  • Presentation:
      • Storage Solutions
      • Separating Storage from Computing
  • Exercise/Lab: Choosing a storage solution
  • Q&A

Data Ingestion (20 min)

  • Discussion: Why Do You Move Data Between Systems?
  • Presentation:
      • Data Ingestion
      • Data Ingestion Interfaces
  • Exercise/Lab: Choosing An Ingestion Solution
  • Q&A and Wrap Up
  • BREAK: 10 min

Data Queries and Transformation (20 min)

  • Discussion: What Query Platforms Have You Used? What did you like/not like?
  • Presentation: Writing Queries
  • Exercise/Lab: A querying conundrum
  • Q&A (5 min)

Data Serving (30 min)

  • Discussion: What does it mean to “serve” data?
  • Presentation:
      • Serving data to customers
      • Analytics and Machine Learning
      • Reverse ETL
  • Exercise/Lab: “Self-Service” Dilemma?
  • Q&A
  • BREAK: 10 min

Security and Privacy (35 min)

  • Discussion: Why don’t we give access to data?
  • Presentation:
      • A Cybersecurity Story
      • Fundamental security practices
      • Ethics and Privacy
  • Exercise: Data Sharing with Colleagues
  • Q&A

Putting it All Together (25 min)

  • Discussion: What Are Your Takeaways?
  • Presentation:
      • The Data Engineering Lifecycle Recapped
      • Where Are Things Going?
  • Final Q&A and Course Wrap Up

Your Instructor

  • Thomas Nield

    Thomas Nield is the founder of Nield Consulting Group and an instructor at O’Reilly Media and the University of Southern California, teaching classes on data analysis, machine learning, mathematical optimization, AI system safety, and practical artificial intelligence. He’s authored multiple books including Getting Started with SQL and Essential Math for Data Science, both for O’Reilly. He’s also the founder and inventor of Yawman Flight, a company that develops universal handheld controls for flight simulation and unmanned aerial vehicles. Thomas enjoys making technical content relatable and relevant to those unfamiliar with or intimidated by it.
