Book description
The wish lists of many data-driven organizations seem reasonable enough. They’d like to capitalize on real-time data analysis, move beyond batch processing for time-critical insights, allow multiple users to share cluster resources, and provide predictable service levels. However, fundamental performance limitations of complex distributed systems such as Hadoop prevent much of this from happening.
In this report, Courtney Webster examines the root cause of these performance problems and explains why the best practices for mitigating them (cluster tuning, provisioning, and even cluster isolation for mission-critical jobs) don’t provide viable, scalable, or long-term solutions.
Organizations have been pushing Hadoop and other distributed systems to their performance breaking points as they seek to use clusters as shared resources across multiple business units and individual users. Once they hit this performance wall, companies will find it difficult to deliver on the big data promise at scale.
Read this report to find out what the implications are for your organization.
Product information
- Title: The Hadoop Performance Myth
- Author(s): Courtney Webster
- Release date: April 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491955437