Chapter 7. Dremio’s SQL Query Engine

Dremio’s SQL Query Engine, part of the Dremio Lakehouse Platform, is widely used to support analytical workloads such as ad hoc SQL and low-latency business intelligence (BI) queries run directly on data stored in a data lake. Dremio can query data across multiple data sources. This enables query federation and provides a unified view of the data without moving or copying it. All of this runs on a vectorized query engine, which lets Dremio return results quickly even on very large datasets. Combined with the capabilities of the Apache Iceberg table format, this makes Dremio a powerful way to manage and query datasets, with improved performance and an easy-to-use UI.
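A vectorized engine is fast because it processes data column by column in batches rather than one row at a time. The toy Python sketch below contrasts the two styles on a hypothetical two-column dataset (`region`, `amount`). The function names and the `batch_size` parameter are illustrative assumptions, not Dremio APIs. A real engine runs optimized native code over columnar in-memory buffers (Dremio builds on Apache Arrow) rather than Python loops.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def row_at_a_time_sum(rows: List[Row], region: str) -> int:
    """Row-at-a-time style: each iteration handles one whole record."""
    total = 0
    for row in rows:
        if row["region"] == region:
            total += row["amount"]
    return total


def to_columns(rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per column (a columnar layout)."""
    return {
        "region": [r["region"] for r in rows],
        "amount": [r["amount"] for r in rows],
    }


def vectorized_sum(columns: Dict[str, List[Any]], region: str, batch_size: int = 4) -> int:
    """Batch-at-a-time style over columns (hypothetical helper, for illustration only).

    For each fixed-size batch, a filter pass over the region column builds a
    selection vector. An aggregate pass then reads only the selected amounts.
    """
    regions = columns["region"]
    amounts = columns["amount"]
    total = 0
    for start in range(0, len(regions), batch_size):
        region_batch = regions[start:start + batch_size]
        amount_batch = amounts[start:start + batch_size]
        # Filter step: one tight loop over a single column.
        selection = [i for i, r in enumerate(region_batch) if r == region]
        # Aggregate step: one tight loop over the selected positions.
        total += sum(amount_batch[i] for i in selection)
    return total


if __name__ == "__main__":
    # Hypothetical sample data: even amounts in "EU", odd amounts in "US".
    rows = [{"region": "EU" if i % 2 == 0 else "US", "amount": i} for i in range(10)]
    print(row_at_a_time_sum(rows, "EU"), vectorized_sum(to_columns(rows), "EU"))  # 20 20
```

Both functions return the same answer. The difference lies in the access pattern. The batched version sweeps a single column at a time in tight loops, and this is the pattern that lets production engines make efficient use of CPU caches and SIMD instructions.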

This chapter provides a hands-on overview of using Dremio with Iceberg.

Configuration

Dremio’s Lakehouse Platform is available both as self-managed software and as a cloud service. The examples in this chapter use Dremio Cloud. As discussed in Chapter 6, the first step in working with Iceberg tables is to configure a catalog. To configure an Iceberg catalog in the Dremio Lakehouse Platform, add a new source: go to the Sources section of the Dremio interface and select Add Data Source, as shown in Figure 7-1.

Figure 7-1. Sources available to connect to in Dremio ...
