Chapter 3. Components in a Kafka Connect Data Pipeline

A Kafka Connect pipeline involves one or more plug-ins and a Kafka Connect runtime that is responsible for executing them. Kafka Connect streams data between a Kafka cluster and one or more external systems. A Kafka Connect pipeline usually interacts with a single Kafka cluster, but a single Kafka cluster can be part of any number of Kafka Connect pipelines.

In this chapter, we take a closer look at the runtime and each of the Kafka Connect plug-ins: connectors, converters, transformations, and predicates. For each component, we explain its role in pipelines and how to use it. People often use the term “Connect” to refer to either a single component or the whole pipeline, so we introduce the correct term for each component so that you can tell them apart. By the end of this chapter, you will know how to build, configure, and run a basic Kafka Connect pipeline using the official Kafka distribution.

Kafka Connect Runtime

At its core, Kafka Connect is a runtime that runs and manages data pipelines. You can easily run Kafka Connect on a laptop using the scripts, JAR files, and configuration files provided in the Kafka distribution. For example, Kafka 3.5.0 includes the following script in the bin directory for Unix-like operating systems:

connect-distributed.sh

The equivalent script for Windows operating systems is under bin/windows in the Kafka distribution:

connect-distributed.bat
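As a rough illustration of how the Unix script is typically invoked, the following sketch wraps it in a small helper. It assumes the distribution was unpacked to a directory referenced by KAFKA_HOME (the default path shown is hypothetical) and that the sample worker configuration file config/connect-distributed.properties shipped with the distribution is used. The names KAFKA_HOME, CONNECT_SCRIPT, CONNECT_CONFIG, and start_connect are illustrative and not part of Kafka.

```shell
#!/bin/sh
# Assumption: KAFKA_HOME points at an unpacked Kafka distribution.
# The default below is a hypothetical location; adjust it for your machine.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.13-3.5.0}"

# Script and sample worker configuration from the Kafka distribution.
CONNECT_SCRIPT="$KAFKA_HOME/bin/connect-distributed.sh"
CONNECT_CONFIG="$KAFKA_HOME/config/connect-distributed.properties"

# Hypothetical helper: starts the Kafka Connect runtime in the foreground,
# passing the worker configuration file as the only argument.
start_connect() {
  if [ ! -f "$CONNECT_SCRIPT" ] || [ ! -f "$CONNECT_CONFIG" ]; then
    echo "Kafka distribution not found under $KAFKA_HOME" >&2
    return 1
  fi
  "$CONNECT_SCRIPT" "$CONNECT_CONFIG"
}
```

Calling start_connect launches the runtime with that configuration. The runtime needs a reachable Kafka cluster, so this sketch presumes one is already running at the address set by bootstrap.servers in the configuration file.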

The libs ...
