Chapter 9. Streaming Joins

When I first began learning about joins, it was an intimidating topic; LEFT, OUTER, SEMI, INNER, CROSS: the language of joins is expressive and expansive. Add on top of that the dimension of time that streaming brings to the table, and you’re left with what appears to be a challengingly complex topic. The good news is that joins really aren’t the frightening beast with nasty, pointy teeth that they might initially appear to be. As is the case with so many other complex topics, after you understand the central ideas and themes of joins, the broader landscape that’s built on top of these basics suddenly becomes so much more accessible. So please join me now as we explore the fascinating topic of...well, joins.

All Your Joins Are Belong to Streaming

What does it mean to join two datasets? We understand intuitively that joins are just a specific type of grouping operation: by joining together data that share some property (i.e., key), we collect together some number of previously unrelated individual data elements into a group of related elements. And as we learned in Chapter 6, grouping operations always consume a stream and yield a table. Knowing these two things, it’s only a small leap to then arrive at the conclusion that forms the basis for this entire chapter: at their hearts, all joins are streaming joins.
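To make the "joins are grouping" idea concrete, here is a minimal Python sketch. It is not code from this book; the record shape `(side, key, value)` and the helper names `group_by_key` and `inner_join` are assumptions made purely for illustration. The first function consumes a stream of records from two inputs and groups them by key into a table, and the second reads an inner join out of that table.

```python
from collections import defaultdict


def group_by_key(stream):
    """Consume a stream of (side, key, value) records and yield a table.

    The table maps each join key to the values seen so far on each side.
    The record shape is a hypothetical one chosen for this sketch.
    """
    table = defaultdict(lambda: {"left": [], "right": []})
    for side, key, value in stream:
        table[key][side].append(value)
    return dict(table)


def inner_join(table):
    """Read an inner join out of the grouped table.

    A key contributes rows only when both sides have at least one value;
    each left value is paired with each right value for that key.
    """
    return sorted(
        (key, left_value, right_value)
        for key, sides in table.items()
        for left_value in sides["left"]
        for right_value in sides["right"]
    )


# Records from both inputs arrive interleaved, as they would in a stream.
stream = [
    ("left", 1, "L1"),
    ("right", 2, "R2"),
    ("left", 2, "L2"),
    ("right", 3, "R3"),
    ("right", 2, "R2b"),
]

table = group_by_key(stream)
joined = inner_join(table)
print(joined)  # [(2, 'L2', 'R2'), (2, 'L2', 'R2b')]
```

Only key 2 appears in the result, because it is the only key with values on both sides. Keys 1 and 3 are still present in the grouped table; a different join type would simply read that same table differently.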

What’s great about this fact is that it actually makes the topic of streaming joins that much more tractable. All of the tools we’ve learned for ...
