Chapter 9. Setting Up Flink for Streaming Applications
Today's data infrastructures are diverse. Distributed data processing frameworks like Apache Flink need to be set up to interact with several components such as resource managers, filesystems, and services for distributed coordination.
In this chapter, we discuss the different ways to deploy Flink clusters and how to configure them securely and make them highly available. We explain Flink setups for different Hadoop versions and filesystems and discuss the most important configuration parameters of Flink's master and worker processes. After reading this chapter, you will know how to set up and configure a Flink cluster.
Deployment Modes
Flink can be deployed in different environments, such as a local machine, a bare-metal cluster, a Hadoop YARN cluster, or a Kubernetes cluster. In "Components of a Flink Setup", we introduced the different components of a Flink setup: the JobManager, TaskManager, ResourceManager, and Dispatcher. In this section, we explain how to configure and start Flink in different environments, including standalone clusters, Docker, Apache Hadoop YARN, and Kubernetes, and how Flink's components are assembled in each setup.
Standalone Cluster
A standalone Flink cluster consists of at least one master process and at least one TaskManager process that run on one or more machines. All processes run as regular Java JVM processes. Figure 9-1 shows a standalone Flink setup.
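As a sketch, the core settings of a standalone setup live in Flink's `conf/flink-conf.yaml` file. The keys below are standard Flink configuration options; the specific values and the `master-host` hostname are illustrative assumptions:

```yaml
# Hostname of the machine running the master (JobManager) process.
# "master-host" is a placeholder for your actual host.
jobmanager.rpc.address: master-host
# Port for RPC communication with the JobManager (6123 is Flink's default).
jobmanager.rpc.port: 6123
# Number of processing slots each TaskManager process offers.
taskmanager.numberOfTaskSlots: 4
# Default parallelism for jobs that do not specify one.
parallelism.default: 4
```

With the configuration in place, the cluster is typically started with the `./bin/start-cluster.sh` script shipped in the Flink distribution, which launches the master process locally and, via SSH, a TaskManager process on each host listed in the workers file; `./bin/stop-cluster.sh` shuts the cluster down again.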