Book description
Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool, a cost-prohibitive process if you want warehouse features available for all of your data. Because these patterns are inflexible, they force you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way.
Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg.
With this book, you'll learn:
- The architecture of Apache Iceberg tables
- What happens under the hood when you perform operations on Iceberg tables
- How to further optimize Iceberg tables for maximum performance
- How to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio
Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.
Table of contents
- Foreword by Gerrit Kazmaier
- Foreword by Raghu Ramakrishnan
- Foreword by Rick Sears
- Preface
- I. Fundamentals of Apache Iceberg
- 1. Introduction to Apache Iceberg
- 2. The Architecture of Apache Iceberg
- 3. Lifecycle of Write and Read Queries
- 4. Optimizing the Performance of Iceberg Tables
- 5. Iceberg Catalogs
- II. Hands-on with Apache Iceberg
- 6. Apache Spark
- 7. Dremio’s SQL Query Engine
- 8. AWS Glue
- 9. Apache Flink
- III. Apache Iceberg in Practice
- 10. Apache Iceberg in Production
  - Apache Iceberg Metadata Tables
    - The history Metadata Table
    - The metadata_log_entries Metadata Table
    - The snapshots Metadata Table
    - The files Metadata Table
    - The manifests Metadata Table
    - The partitions Metadata Table
    - The all_data_files Metadata Table
    - The all_manifests Metadata Table
    - The refs Metadata Table
    - The entries Metadata Table
    - Using the Metadata Tables in Conjunction
  - Isolation of Changes with Branches
  - Multitable Transactions
  - Rolling Back Changes
  - Conclusion
- 11. Streaming with Apache Iceberg
- 12. Governance and Security
- 13. Migrating to Apache Iceberg
- 14. Real-World Use Cases of Apache Iceberg
- Index
- About the Authors
Product information
- Title: Apache Iceberg: The Definitive Guide
- Author(s): Tomer Shiran, Jason Hughes, Alex Merced
- Release date: May 2024
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098148621