Chapter 6. Advanced State Management

In the past two chapters, we discussed stateful processing in Kafka Streams. As we learned how to perform aggregations, joins, and windowed operations, it became apparent that stateful processing is pretty easy to get started with.

However, as I alluded to previously, state stores come with additional operational complexity. As you scale your application, experience failures, and perform routine maintenance, you will learn that stateful processing requires a deeper understanding of the underlying mechanics to ensure your application continues to operate smoothly over time.

The goal of this chapter is to dig deeper into state stores so that you can achieve a higher level of reliability when building stateful stream processing applications. A large portion of this chapter is dedicated to the topic of rebalancing, which occurs when work needs to be redistributed across your consumer group. Rebalancing can be especially impactful for stateful applications, so we’ll build up our understanding of it to equip you to deal with it in your own applications.

Some of the questions we will answer include:

  • How are persistent state stores represented on disk?

  • How do stateful applications achieve fault tolerance?

  • How can we configure built-in state stores?

  • What kinds of events are the most impactful for stateful applications?

  • What measures can be taken to minimize recovery time of stateful tasks?

  • How do you ensure that state stores don’t grow ...
