Chapter 10. Building a Lakehouse on Delta Lake

Chapter 1 introduced the concept of a data lakehouse, which combines the best elements of a traditional data warehouse and a data lake. Throughout this book you have learned about the five key capabilities that help enable the lakehouse architecture: the storage layer, data management, SQL analytics, data science and machine learning, and the medallion architecture.

Before diving into building a lakehouse on Delta Lake, let’s quickly review the industry’s data management and analytics evolution:

Data warehouse

From the 1970s through the early 2000s, data warehouses were designed to collect and consolidate data into a business context, providing support for business intelligence and analytics. As data volumes grew, so did the velocity, variety, and veracity of that data. Data warehouses struggled to address these requirements in a flexible, unified, and cost-effective manner.

Data lake

In the early 2000s, increased volumes of data drove the development of data lakes (initially on premises with Hadoop and later in the cloud), which offered a cost-effective central repository to store data in any format at any scale. These benefits came with challenges of their own. Data lakes had no transactional support, were not designed for business intelligence, offered limited data governance support, and still required other technologies (e.g., data warehouses) to fully support the data ecosystem. This led to overly complex environments ...
