Chapter 5. Streaming Analytics

Analytics is an end goal of many streaming integration use cases. You can perform analytics in a cloud data warehouse or by using machine learning on a large-scale data store. You can do it using an on-premises Hadoop infrastructure or in a cloud storage environment. You could use a document store or another store such as Azure Cosmos DB or Google Cloud Spanner. It could even be done by writing into a database.

The most important point is that people want their data to be always up to date. So, when you’re analyzing data, you should always be working with the most recent data. A primary driver of streaming analytics is that people want to do analytics on much more current data than was previously possible.

With ETL systems, people were satisfied with data that was a few hours or even a day old because they were running end-of-day reports, and that was the data that they wanted to see. With streaming systems, they want insight into the most current data. This is true whether the data is being analyzed in memory or is landing somewhere else.
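To make the in-memory case concrete, here is a minimal sketch of an aggregate that is updated the moment each event arrives, rather than after the data has landed in a store. The class name `SlidingWindowAverage`, its parameters, and the sample values are hypothetical and chosen for illustration only. The sketch assumes timestamps are in seconds, events arrive in timestamp order, and the window length is positive.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the values seen in the last `window_seconds` seconds.

    Hypothetical helper for illustration; assumes in-order timestamps
    (in seconds) and a positive window length.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        # Fold the new event into the aggregate as soon as it arrives.
        self.events.append((timestamp, value))
        self.total += value

        # Evict events that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value

        # The event just added is always inside the window,
        # so the deque is never empty here.
        return self.total / len(self.events)


window = SlidingWindowAverage(window_seconds=10)
print(window.add(0, 100.0))   # 100.0
print(window.add(4, 200.0))   # 150.0
print(window.add(12, 300.0))  # 250.0 (the event at t=0 has expired)
```

Each call to `add` returns a result that already reflects the newest event, so there is no separate step in which the analytics has to be triggered against stored data.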

However, getting real-time insights from data is typically not possible if the data needs to land somewhere first (Figure 5-1). With that approach, it’s not possible to get to within a few seconds, much less subsecond latency, from a change happening in the source system to its delivery into a target system. And it’s still going to be necessary to trigger the analytics in that target platform somehow. Perhaps you’re pulling it, or maybe you’re running ...

