Chapter 69. The End of ETL as We Know It

Paul Singman

If you’re as sick of this three-letter term as I am, you’ll be happy to know there is another way.

If you work in data in 2021, the acronym ETL is everywhere.

Ask certain people what they do, and their whole response will be “ETL.” On LinkedIn, thousands of people have the title “ETL developer.” It can be a noun, a verb, an adjective, and even a preposition. (Yes, a mouse can ETL a house.)

Standing for extract, transform, load, ETL refers to the general process of taking batches of data out of one database or application and loading them into another. Data teams are the masters of ETL, as they often have to stick their grubby fingers into the tools and databases of other teams—the software engineers, marketers, and operations folk—to prep a company’s data for deeper, custom analyses.

The good news is that with a bit of foresight, data teams can take most of the ETL burden off their plates. How is this possible?

Replacing ETL with Intentional Data Transfer

The path forward is with intentional transfer of data (ITD). The need for ETL arises because no one builds their user database or content management system (CMS) with downstream analytics in mind. Instead of making the data team select * from purchases_table where event_date > now() - 1hr every hour, you can add logic in the application code that first processes ...
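
To make the contrast with hourly polling concrete, here is a minimal Python sketch of what application-side logic of this kind might look like. It is one possible shape and is not taken from the text: the publish/subscribe framing, the InMemoryTopic stand-in for a real message queue, the PurchaseEvent fields, and the record_purchase function are all hypothetical names chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class PurchaseEvent:
    """Hypothetical event shape; the field names are assumptions."""
    purchase_id: str
    user_id: str
    amount_cents: int
    event_date: str  # ISO 8601 timestamp in UTC


class InMemoryTopic:
    """Stand-in for a real message queue or pub/sub topic (an assumption)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        # Deliver the message to every subscriber, in subscription order.
        for handler in self._subscribers:
            handler(message)


def record_purchase(topic: InMemoryTopic, purchase_id: str,
                    user_id: str, amount_cents: int) -> PurchaseEvent:
    """Handle a purchase in the application and emit it for analytics."""
    # A real application would also write to its own purchases table here.
    event = PurchaseEvent(
        purchase_id=purchase_id,
        user_id=user_id,
        amount_cents=amount_cents,
        event_date=datetime.now(timezone.utc).isoformat(),
    )
    # Emit the event at the moment it happens, so nobody downstream
    # has to query the application database on a schedule.
    topic.publish(json.dumps(asdict(event)))
    return event
```

In this sketch the data team would subscribe a handler to the topic (for example, one that appends each decoded message to a warehouse staging table) rather than running a scheduled query against the application's database.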
