Book description
One of the best ways of getting to grips with the world's most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals.
In Detail
Storm is the most popular framework for real-time stream processing. Storm provides the fundamental primitives and guarantees required for fault-tolerant distributed computing in high-volume, mission-critical applications. It is both an integration technology and a data flow and control mechanism, making it the core of many big data platforms. Storm is essential if you want to deploy, operate, and develop data processing flows capable of processing billions of transactions.
"Storm: Distributed Real-time Computation Blueprints" covers a broad range of distributed computing topics, including not only design and integration patterns, but also domains and applications to which the technology is immediately useful and commonly applied. This book introduces you to Storm using real-world examples, beginning with simple Storm topologies. The examples increase in complexity, introducing advanced Storm concepts as well as more sophisticated approaches to deployment and operational concerns.
This book covers the domains of real-time log processing, sensor data analysis, collective and artificial intelligence, financial market analysis, Natural Language Processing (NLP), graph analysis, polyglot persistence, and online advertising. While exploring distributed computing applications in each of those domains, the book covers advanced Storm topics such as Trident and Distributed State, as well as integration patterns for Druid and Titan. The book also describes the deployment of Storm to YARN and the Amazon infrastructure, as well as other key operational concerns such as centralized logging.
By the end of the book, you will have gained an understanding of the fundamentals of Storm and Trident and be able to identify and apply those fundamentals to any suitable problem.
What You Will Learn
- Learn the fundamentals of Storm
- Install and configure Storm in pseudo-distributed and fully-distributed mode
- Familiarize yourself with the fundamentals of Trident and distributed state
- Design patterns for data flows in a distributed system
- Create integration patterns for persistence mechanisms such as Titan
- Deploy and run Storm clusters by leveraging YARN
- Achieve continuous availability and fault tolerance through distributed storage
- Recognize centralized logging mechanisms and processing
- Implement polyglot persistence and distributed transactions
- Calculate the effectiveness of a campaign using click-through analysis
Table of contents
- Storm Blueprints: Patterns for Distributed Real-time Computation
- Table of Contents
- Storm Blueprints: Patterns for Distributed Real-time Computation
- Credits
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. Distributed Word Count
- 2. Configuring Storm Clusters
- Introducing the anatomy of a Storm cluster
- Introducing the Storm technology stack
- Installing Storm on Linux
- Installing the base operating system
- Installing Java
- ZooKeeper installation
- Storm installation
- Running the Storm daemons
- Configuring Storm
- Mandatory settings
- Optional settings
- The Storm executable
- Setting up the Storm executable on a workstation
- The daemon commands
- The management commands
- Local debug/development commands
- Submitting topologies to a Storm cluster
- Automating the cluster configuration
- A rapid introduction to Puppet
- Summary
- 3. Trident Topologies and Sensor Data
- 4. Real-time Trend Analysis
- 5. Real-time Graph Analysis
- Use case
- Architecture
- A brief introduction to graph databases
- Software installation
- Setting up Titan to use the Cassandra storage backend
- Graph data model
- Connecting to the Twitter stream
- Twitter graph topology
- Implementing GraphState
- Implementing GraphFactory
- Implementing GraphTupleProcessor
- Putting it all together – the TwitterGraphTopology class
- Querying the graph with Gremlin
- Summary
- 6. Artificial Intelligence
- 7. Integrating Druid for Financial Analytics
- 8. Natural Language Processing
- 9. Deploying Storm on Hadoop for Advertising Analysis
- 10. Storm in the Cloud
- Index
Product information
- Title: Storm Blueprints: Patterns for Distributed Real-time Computation
- Author(s):
- Release date: March 2014
- Publisher(s): Packt Publishing
- ISBN: 9781782168294