Book description
A comprehensive end-to-end guide that gives hands-on practice in big data and artificial intelligence.
About This Book
- Learn to build and run a big data application with sample code
- Explore examples to implement activities that a big data architect performs
- Use Machine Learning and AI for structured and unstructured data
Big Data Architect's Handbook is for you if you are an aspiring data professional, developer, or IT enthusiast who aims to be an all-round architect in big data. This book is your one-stop solution to enhance your knowledge and carry out easy to complex activities required to become a big data architect.
What You Will Learn
- Learn the Hadoop ecosystem and Apache projects
- Understand and compare NoSQL databases and essential software architecture
- Cloud infrastructure design considerations for big data
- Explore application scenarios of big data tools for daily activities
- Learn to analyze and visualize results to uncover valuable insights
- Build and run a big data application with sample code from end to end
- Apply Machine Learning and AI to perform big data intelligence
- Practice the daily activities performed by big data architects
Big data architects are the “masters” of data and hold high value in today's market. Handling big data, whether of good or bad quality, is not an easy task. The prime job of any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights.
Big Data Architect's Handbook takes you through developing a complete, end-to-end big data pipeline, laying the foundation and providing the knowledge required to be an architect in big data. From understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how you can bring together big data tools such as Apache Hadoop and Elasticsearch to build an efficient big data solution.
By the end of this book, you will be able to build your own system design that integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.
Style and approach
Comprehensive guide with a perfect blend of theory, examples and implementation of real-world use-cases
Table of contents
- Title Page
- Copyright and Credits
- Packt Upsell
- Contributors
- Preface
- Why Big Data?
- What is big data?
- Characteristics of big data
- Volume
- Velocity
- Variety
- Veracity
- Variability
- Value
- Solution-based approach for data
- Data – the most valuable asset
- Traditional approaches to data storage
- Clustered computing
- High availability
- Resource pooling
- Easy scalability
- Big data – how does it make a difference?
- Big data solutions – cloud versus on-premises infrastructure
- Cost
- Security
- Current capabilities
- Scalability
- Big data glossary
- Big data
- Batch processing
- Cluster computing
- Data warehouse
- Data lake
- Data mining
- ETL
- Hadoop
- In-memory computing
- Machine learning
- MapReduce
- NoSQL
- Stream processing
- Summary
- Big Data Environment Setup
- Hadoop Ecosystem
- Apache Hadoop
- Hadoop Distributed File System
- HDFS hands-on
- Creating a directory in HDFS
- Copying files from a local file system to HDFS
- Copying files from HDFS to a local file system
- Deleting files and folders in HDFS
- Hadoop MapReduce
- Job Tracker and Task Tracker
- The execution flow of MapReduce 
- Mapper
- Shuffle and Sort
- Reducer
- Example program
- Preparing the data file for analysis
- Program code
- Driver program
- Mapper program
- Reducer program
- Observations and results
- YARN
- Resource Manager
- Node Manager
- Container
- Application Master
- Apache Projects related to big data
- Apache Zookeeper
- Apache Kafka
- Apache Flume
- Apache Cassandra
- Apache HBase
- Apache Spark
- Summary
- NoSQL Database
- What is NoSQL?
- Benefits of NoSQL databases
- NoSQL versus RDBMS
- The CAP theorem
- The ACID properties
- Data models in NoSQL
- Key-value data stores
- Document store
- Column stores
- Graph stores
- Apache Cassandra
- Installation
- Starting Cassandra
- The Cassandra Query Language – CQL
- The help command
- Basic commands
- Data manipulation
- Creating, altering, and deleting a keyspace
- Creating, altering, and deleting tables
- Inserting, updating, and deleting data
- The MongoDB database
- Installing MongoDB
- Starting MongoDB
- Working on MongoDB
- The help command
- Basic commands
- Data manipulation
- Creating and deleting databases
- Creating and deleting collections
- The create, retrieve, update, delete operations
- Neo4j database
- Installing Neo4j
- Starting Neo4j
- The Cypher query language
- Help
- Basic operations in Cypher
- Creating nodes, relationships, and properties
- Updating nodes, relationships, and properties
- Deleting nodes, relationships, and properties
- Reading nodes, relationships, and properties
- Summary
- Off-the-Shelf Commercial Tools
- Containerization
- Virtualization
- Hypervisors
- Hardware-based hypervisors
- Software-based hypervisors
- What is containerization?
- Benefits of containers
- Docker
- Docker workflow
- Installation
- Basic commands
- Docker images
- Building a Docker image
- Running and verifying Docker images
- Importing and exporting Docker images
- Docker Swarm
- Setting up Docker Swarm
- Creating service containers
- Replicating containers
- Removing container services
- Kubernetes
- Key components
- Pods
- ReplicaSets
- Deployments
- PetSets
- Installation
- Deployment
- Kubernetes Dashboard
- Summary
- Network Infrastructure
- Cloud Infrastructure
- Security and Monitoring
- Frontend Architecture
- React JS
- Key concepts 
- Node.js
- JSX
- Unidirectional dataflow
- Getting started with ReactJS
- Single page application
- React application project
- React app directory structure
- Components
- Properties
- Event handling
- State
- Redux
- Architecture of Redux
- Key concepts
- Single store
- Action
- Reducers
- Guestbook application
- Installation
- Create a store
- Setting up Reducer
- Setting up Dispatcher
- Connect function
- Setting up Subscribers
- Final output
- Summary
- Backend Architecture
- API
- RESTful API
- HTTP request methods
- GET
- POST
- PUT
- DELETE
- Authentication
- Basic authentication
- JSON Web Token
- Header
- Payload
- Signature
- Practical
- RESTful web service
- Java client
- Redis
- Installation
- Redis server
- Redis client
- Working with Redis
- Redis data types and structures
- String
- HashMap
- List
- Set
- Redis Publish/Subscribe
- Common key operations
- Summary
- Machine Learning
- Machine learning
- Types of algorithms
- Parametric algorithms
- Non-parametric algorithms
- Supervised learning
- The classification model
- Binary classification 
- Multi-class classification
- The regression model
- Linear regression
- Polynomial regression
- Unsupervised learning
- Clustering, k-means
- Neural networks
- Feedforward neural network
- Recurrent neural network
- Symmetrically connected neural network
- Deep neural networks
- Decision tree classifiers
- Summary
- Artificial Intelligence
- Artificial intelligence
- Convolutional neural networks
- Deep learning using TensorFlow
- TensorFlow
- Installation
- TensorFlow program
- Uninstalling TensorFlow
- TensorBoard
- Program
- Launching TensorBoard
- TensorBoard graph
- Object detection using YOLO
- Installation
- Compiling YOLO library
- Trained weights
- Detecting objects in an image
- Summary
- Elasticsearch
- Installing Elasticsearch
- Starting the Elasticsearch server
- Auto starting the Elasticsearch service
- Stopping the Elasticsearch server
- Uninstalling Elasticsearch
- Kibana
- Installation
- Starting Kibana
- Uninstalling Kibana
- Security
- Securing Elasticsearch
- Securing Kibana
- Understanding queries – CRUD commands
- Creating
- Reading
- Updating
- Deleting
- Summary
- Structured Data
- Unstructured Data
- Data Visualization
- Financial Trading System
- What is algorithmic trading?
- Benefits of algorithmic trading
- Big data in the financial market
- Algorithmic trading strategies
- Building an Expert Advisor
- MetaTrader
- Downloading and setting up MetaTrader
- MetaQuotes language
- Trading bot objective
- Practical
- Trading pattern – moving average
- Decision time: buy or sell
- Complete program
- Backtesting in MetaTrader 4
- Summary
- Retail Recommendation System
- Types of recommendation system
- Collaborative filtering
- Content-based filtering
- Demographic-based system
- Utility-based system
- Knowledge-based system
- Hybrid model
- Commercial tools
- Barilliance
- Softcube
- Strands
- Monetate
- Nosto
- Book recommendation system
- Dataset
- Directory structure
- Code
- Reading the dataset
- Verifying the dataset
- Data analysis
- Age group
- Commutative rating
- Algorithms
- Top-rated books
- Popular books
- Demographic-based recommendation
- Useful resources
- Summary
- Other Books You May Enjoy
Product information
- Title: Big Data Architect's Handbook
- Author(s):
- Release date: June 2018
- Publisher(s): Packt Publishing
- ISBN: 9781788835824