Chapter 9. Monitoring Kafka Connect

As an administrator or SRE of a Kafka Connect pipeline, it is your job to make sure that it is running correctly. To do this, you need to set up monitoring so you can easily check the state of the system and quickly diagnose problems. Along with identifying and fixing existing problems, monitoring can allow you to spot potential future problems and make changes before they have an impact.

If other users or systems rely on your pipeline, you should agree with them on the guarantees you can offer in terms of uptime, availability, latency, and so on. These guarantees are referred to as service-level objectives (SLOs) or service-level agreements (SLAs). A good monitoring setup not only lets you spot and resolve problems more quickly, but also makes it easier to provide an accurate SLO or SLA.

In this chapter, we look at the different mechanisms you can use to monitor Kafka Connect and give some guidance on how best to use them. There are three ways for you to get insights into Kafka Connect clusters:

  • Analyzing metrics

  • Using the REST API

  • Processing log messages

Each of these resources provides a slightly different view of the system, and together they allow you to fully monitor Kafka Connect.
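As a rough illustration of the REST API view, the following Python sketch reads a connector's status and lists the parts that are not running. It assumes a Connect worker reachable at a base URL you supply and uses the GET /connectors/{name}/status endpoint. The function names, the connector name, and the sample payload are hypothetical and only for illustration. Treating every state other than RUNNING (including PAUSED) as a problem is a simplification, not a recommendation.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url, connector):
    """Fetch GET /connectors/{name}/status from a Connect worker.

    base_url is an assumption, for example "http://localhost:8083".
    """
    with urlopen(f"{base_url}/connectors/{connector}/status") as resp:
        return json.load(resp)


def unhealthy_parts(status):
    """Return labels for the connector and tasks whose state is not RUNNING."""
    problems = []
    connector_state = status["connector"]["state"]
    if connector_state != "RUNNING":
        problems.append(f"connector:{connector_state}")
    for task in status.get("tasks", []):
        if task["state"] != "RUNNING":
            problems.append(f"task-{task['id']}:{task['state']}")
    return problems


# Hypothetical payload in the shape the status endpoint returns,
# so the check can be tried without a running cluster.
sample = {
    "name": "example-sink",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
        {"id": 1, "state": "FAILED", "worker_id": "10.0.0.2:8083"},
    ],
    "type": "sink",
}

print(unhealthy_parts(sample))
```

Against a live cluster, you would pass the result of fetch_status to unhealthy_parts instead of the sample dictionary. A check like this complements metrics and logs rather than replacing them.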

The most reliable way to quickly identify problems in your system is by tracking metrics. Even a small Kafka Connect pipeline produces thousands of metrics, so you should use monitoring tools to do this. These tools ...
