Book description
Big data and advanced analytics have increasingly moved to the cloud as organizations pursue actionable insights and data-driven products using the growing amounts of information they collect. But few companies have truly operationalized their data so that it's usable across the entire organization. With this pragmatic ebook, engineers, architects, and data managers will learn how to build a data lake in the cloud, extract value from it, and leverage the compute power and scalability of a cloud-native data platform to put their company's vast data trove into action.
Holden Ackerman and Jon King of Qubole take you through the basics of building a data lake operation, from people to technology, employing multiple technologies and frameworks in a cloud-native data platform. You'll dive into the tools and processes you need for the entire lifecycle of a data lake, from data preparation, storage, and management to distributed computing and analytics. You’ll also explore the unique role that each member of your data team needs to play as you migrate to your cloud-native data platform.
- Leverage your data effectively through a single source of truth
- Understand the importance of building a self-service culture for your data lake
- Define the structure you need to build a data lake in the cloud
- Implement financial governance and data security policies for your data lake through a cloud-native data platform
- Identify the tools you need to manage your data infrastructure
- Delineate the scope, usage rights, and best tools for each team working with a data lake—analysts, data scientists, data engineers, and security professionals, among others
Table of contents
- Acknowledgments
- Foreword
- Introduction
  - Overview: Big Data’s Big Journey to the Cloud
  - My Journey to a Data Lake
  - A Quick History Lesson on Big Data
  - The Second Phase of Big Data Development
  - Weather Update: Clouds Ahead
  - Bringing Big Data and Cloud Together
  - Commercial Cloud Distributions: The Formative Years
  - Big Data and AI Move Decisively to the Cloud, but Operationalizing Initiatives Lag
  - We Believe in the Cloud for Big Data and AI
- 1. The Data Lake: A Central Repository
- 2. The Importance of Building a Self-Service Culture
- 3. Getting Started Building Your Data Lake
- 4. Setting the Foundation for Your Data Lake
- 5. Governing Your Data Lake
- 6. Tools for Making the Data Lake Platform
- 7. Securing Your Data Lake
- 8. Considerations for the Data Engineer
- 9. Considerations for the Data Scientist
- 10. Considerations for the Data Analyst
- 11. Case Study: Ibotta Builds a Cost-Efficient, Self-Service Data Lake
- 12. Conclusion
Product information
- Title: Operationalizing the Data Lake
- Author(s): Holden Ackerman, Jon King
- Release date: July 2019
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492049500
You might also like
Data Lakes
The concept of a data lake is less than 10 years old, but they are already …
Data Lake for Enterprises
A practical guide to implementing your enterprise data lake using Lambda Architecture as the base Key …
Data Lake Development with Big Data
Explore architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of …
Data Lake Maturity Model
Data is changing everything. Many industries today are being fundamentally transformed through the accumulation and analysis …