Book description
Big Data Simplified blends technology with strategy and delves into applications of big data in specialized areas, such as recommendation engines, data science and the Internet of Things (IoT), helping practitioners make the right technology choice. The steps to strategize a big data implementation are also discussed in detail. The book presents a holistic approach to the topic, covering a wide landscape of big data technologies such as Hadoop 2.0 and package implementations such as Cloudera. It also includes in-depth discussion of associated technologies, such as MapReduce, Hive, Pig, Oozie, Apache ZooKeeper, Flume, Kafka, Spark and Python, and of NoSQL databases such as Cassandra, MongoDB and GraphDB.
Table of contents
- Cover
- About Pearson
- Title
- Copyright
- Dedication
- Brief Contents
- Contents
- Preface
- Acknowledgements
- About the Authors
- Model Syllabus for Big Data
- Lesson Plan
- Chapter 1 A Closer Look at Data
- Chapter 2 Introducing Big Data
- 2.1 Introduction
- 2.2 The Transition to Big Data
- 2.3 The Definition of Big Data
- 2.4 The V’s
- 2.5 Sources of Big Data
- 2.6 Common Applications of Big Data
- 2.7 An Introduction to Big Data Technologies
- 2.8 An Overview of Popular Vendors
- Summary
- Multiple-choice Questions (1 Mark Questions)
- Short-answer Type Questions (5 Marks Questions)
- Long-answer Type Questions (10 Marks Questions)
- Chapter 3 Introducing Hadoop
- 3.1 Introduction
- 3.2 An Overview of Hadoop
- 3.3 Configuring a Hadoop Cluster
- 3.4 Storing Data with HDFS
- 3.5 HDFS Technical Commands
- 3.6 Hadoop Distributions
- 3.7 Hadoop in the Cloud
- Summary
- Multiple-choice Questions (1 Mark Questions)
- Short-answer Type Questions (5 Marks Questions)
- Long-answer Type Questions (10 Marks Questions)
- Chapter 4 Introducing MapReduce
- 4.1 Introduction
- 4.2 Processing Data with MapReduce
- 4.3 Parallelism in Map and Reduce Phases
- 4.4 Optimize the Map Phase Using a Combiner
- 4.5 What is YARN?
- 4.6 Example Use Case on MapReduce: Development and Execution Step-by-step
- Summary
- Multiple-choice Questions (1 Mark Questions)
- Short-answer Type Questions (5 Marks Questions)
- Long-answer Type Questions (10 Marks Questions)
- Chapter 5 Introducing NoSQL
- 5.1 Introduction
- 5.2 NoSQL Databases in the Light of CAP Theorem
- 5.3 NoSQL Product Categories
- 5.4 NoSQL Database: Cassandra
- 5.4.1 Characteristics of Cassandra
- 5.4.2 Cassandra Architecture
- 5.4.3 Components of Cassandra
- 5.4.4 Cassandra Write Operations at a Node Level
- 5.4.5 Cassandra Node Level Read Operation
- 5.4.6 KEYSPACE in Cassandra
- 5.4.7 Starting Cassandra Server and Cqlsh Query Editor
- 5.4.8 DataStax Distribution Package
- 5.5 NoSQL Databases in the Cloud
- 5.6 NoSQL – Do’s and Don’ts
- 5.7 Business Intelligence and NoSQL
- 5.8 Big Data and NoSQL
- Summary
- Multiple-choice Questions (1 Mark Questions)
- Short-answer Type Questions (5 Marks Questions)
- Long-answer Type Questions (10 Marks Questions)
- Chapter 6 Introducing Spark and Kafka
- 6.1 Introducing Spark
- 6.2 Working with Kafka
- 6.2.1 What is Apache Kafka
- 6.2.2 Kafka Architecture
- 6.2.3 Need of Apache Kafka in Big Data
- 6.2.4 Kafka Use Cases
- 6.2.5 Why is Kafka so Fast?
- 6.2.6 Kafka Needs ZooKeeper
- 6.2.7 Different Components in Kafka
- 6.2.8 Difference between Apache Kafka and Apache Flume
- 6.2.9 Kafka Demonstration—How Messages are Passing from Publisher to Consumer through a Topic
- Summary
- Multiple-choice Questions (1 Mark Questions)
- Short-answer Type Questions (5 Marks Questions)
- Long-answer Type Questions (10 Marks Questions)
- Chapter 7 Other BigData Tools and Technologies
- Chapter 8 Working with Big Data in R
- 8.1 Prerequisites
- 8.2 Exploratory Data Analysis
- 8.3 R Libraries for Dealing with Large Data Sets
- 8.4 Integrating Hadoop with R
- 8.5 Simple R Program with Hadoop
- Summary
- Multiple-choice Questions (1 Mark Questions)
- Short-answer Type Questions (5 Marks Questions)
- Long-answer Type Questions (10 Marks Questions)
- Chapter 9 Working with Big Data in Python
- Chapter 10 Big Data Applied
- Chapter 11 Big Data Strategy
- 11.1 Introduction
- 11.2 Two Typical Big Data Use Cases
- 11.3 Data Warehouses vs. Data Lakes—What is Your Strategy?
- 11.4 Key Questions to Ask
- 11.5 Getting Ready for a Big Data Program
- 11.6 Making Technology Choices
- 11.7 Making Tooling Choices
- Summary
- Short-answer Type Questions (5 Marks Questions)
- Long-answer Type Questions (10 Marks Questions)
- Chapter 12 Case Study: Retail Near Real-time Analytics
- 12.1 Introduction to Retail Domain
- 12.2 Near Real-time Analytics: Problem Statement
- 12.3 NRT Analytics: Solution Approach
- 12.4 NRT Analytics: Details of Solution Implemented
- Summary
- Multiple-choice Questions (1 Mark Questions)
- Short-answer Type Questions (5 Marks Questions)
- Appendix
- Index
Product information
- Title: Big Data Simplified
- Author(s):
- Release date: June 2019
- Publisher(s): Pearson Education India
- ISBN: None