Chapter 5. Compute Engines for Lakehouse Architectures
If storage is the heart of the lakehouse, then the compute engine is the brain that performs all of its computational activities. You need a performant compute engine to ingest, process, and consume data from a data platform. A compute engine enables platform users like data engineers, data analysts, data scientists, business users, and others to access and use data per their needs.
In this chapter, we will first discuss the data computation benefits you get when implementing a lakehouse architecture. We will explore how lakehouses enable unified batch and real-time processing, enhance the performance of BI workloads, and offer the freedom to choose any processing engine.
We will also discuss the various compute engines that are available via open source tools, cloud platforms, or other third-party platforms. Every cloud service provider (CSP) offers data and analytics services that provide compute resources. We will discuss some of the most widely adopted services that data engineers and analysts use in modern data platforms, and their varying levels of support for open table formats. The examples we discuss will help you understand the integration challenges between cloud services and open table formats.
Finally, we will examine the key considerations when choosing a compute engine, and when designing and implementing your platform’s data ingestion, processing, and consumption processes.
Data Computation Benefits of Lakehouse Architecture
As we ...