Chapter 2. Stream Processing Platforms

In Chapter 1, we introduced a simple use case of getting real-time data to consumers. We also introduced connectors and how they can convert data at rest into data in motion (or event streams) and then publish them into topics in streaming platforms.

The event streams can now be read, but they most likely aren’t yet in a format consumers can use. Events tend to need cleansing and preparation before they undergo analytical processing. Events also need to be enriched with context for them to be useful enough to derive insights. Analytical processing heavily relies on the accuracy and reliability of the data. By addressing issues such as missing values, inconsistencies, duplicates, and outliers, data quality is improved, leading to more reliable and accurate analytical results.
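The cleansing steps above can be sketched in plain Python. This is a minimal illustration, not a real stream processor; the event shape (`id` and `value` fields) and the outlier threshold are hypothetical assumptions for the example.

```python
# Minimal sketch of event cleansing: deduplicate, fill missing values,
# and drop outliers before the events reach analytical processing.
# The event schema (id/value) and threshold are assumptions for illustration.

def cleanse(events, max_value=1000):
    """Yield cleaned events: deduplicated by id, defaults filled, outliers dropped."""
    seen_ids = set()
    for event in events:
        event_id = event.get("id")
        if event_id is None or event_id in seen_ids:
            continue  # skip events with no id or with a duplicate id
        seen_ids.add(event_id)
        value = event.get("value")
        if value is None:
            value = 0  # fill a missing value with a default
        if value > max_value:
            continue  # drop outliers beyond a plausible range
        yield {"id": event_id, "value": value}

raw = [
    {"id": 1, "value": 42},
    {"id": 1, "value": 42},      # duplicate
    {"id": 2},                   # missing value
    {"id": 3, "value": 999999},  # outlier
]
clean = list(cleanse(raw))
# → [{"id": 1, "value": 42}, {"id": 2, "value": 0}]
```

In a real pipeline these transformations would run inside a stream processor rather than over an in-memory list, but the logic is the same: improving data quality at this stage leads to more reliable analytical results downstream.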

Event data preparation, illustrated in Figure 2-1, can also significantly impact the performance of analytical queries. Optimizing the data layout, indexing, and partitioning improves the efficiency of data retrieval and processing. This includes techniques such as data denormalization, columnar storage, and indexing strategies tailored to the analytical workload. Well-prepared data reduces processing time and enables faster insights. We will cover denormalization, columnar storage, and indexing strategies in Chapter 4, where we discuss how to serve analytical data to consumers.

Figure 2-1. Cleanse, prepare, and enrich event data prior to reaching the destination ...
