Chapter 5. ETL Subsystems

As surprising as it may sound, until a few years ago there was no book available that was solely dedicated to the challenges involved with ETL. Sure, ETL was covered as part of delivering a BI solution, but many people needed more in-depth guidance to help them successfully implement an ETL solution, independent of the tools used. The book The Data Warehouse ETL Toolkit by Ralph Kimball and Joe Caserta (Wiley Publishing, 2004) filled that gap. A bit later, the ideas of that book found their way into an article, "The 38 Subsystems of ETL," which added more structure to the various tasks that are part of an ETL project.

Note

The original article can still be found online at http://intelligent-enterprise.informationweek.com/showArticle.jhtml?articleID=54200319. The most recent version can be found in The Kimball Group Reader, article 11.2, "The 34 Subsystems of ETL," pp. 430–434 (Wiley 2010). The names of the subsystems in this book are taken from the latter reference because they differ slightly from the names used in earlier publications.

In 2008, Wiley published the second edition of one of the best-selling BI books ever: The Data Warehouse Lifecycle Toolkit, also by Ralph Kimball and his colleagues in the Kimball Group. In that book, the subsystems were restructured a second time, resulting in a slightly condensed list consisting of 34 ETL subsystems. We were fortunate to get Ralph's permission to use this list as the foundation for Part II of this book, ...
