Book description
More organizations than ever understand the importance of data lake architectures for deriving value from their data. Building a robust, scalable, and performant data lake remains a complex proposition, however, with a buffet of tools and options that need to work together to provide a seamless end-to-end pipeline from data to insights.
This book provides a concise yet comprehensive overview of the setup, management, and governance of a cloud data lake. Author Rukmani Gopalan, a product management leader and data enthusiast, guides data architects and engineers through the major aspects of working with a cloud data lake, from design considerations and best practices to data format optimizations, performance optimization, cost management, and governance.
- Learn the benefits of a cloud-based big data strategy for your organization
- Get guidance and best practices for designing performant and scalable data lakes
- Examine architecture and design choices, and data governance principles and strategies
- Build a data strategy that scales as your organizational and business needs increase
- Implement a scalable data lake in the cloud
- Use cloud-based advanced analytics to gain more value from your data
Table of contents
- Preface
- 1. Big Data: Beyond the Buzz
- 2. Big Data Architectures on the Cloud
- 3. Design Considerations for Your Data Lake
- 4. Scalable Data Lakes
- 5. Optimizing Cloud Data Lake Architectures for Performance
- 6. Deep Dive on Data Formats
- 7. Decision Framework for Your Architecture
- 8. Six Lessons for a Data Informed Future
  - Lesson 1: Focus on the How and When, Not the If and Why, When It Comes to Cloud Data Lakes
  - Lesson 2: With Great Power Comes Great Responsibility: Data Is No Exception
  - Lesson 3: Customers Lead Technology, Not the Other Way Around
  - Lesson 4: Change Is Inevitable, so Be Prepared
  - Lesson 5: Build Empathy and Prioritize Ruthlessly
  - Lesson 6: Big Impact Does Not Happen Overnight
  - Summary
- A. Cloud Data Lake Decision Framework Template
- Index
- About the Author
Product information
- Title: The Cloud Data Lake
- Author(s): Rukmani Gopalan
- Release date: December 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098116583