
Getting Started with Kafka

Published by Pearson

Content level: Beginner to intermediate

Building Effective Data Pipelines

  • Get hands-on experience with Kafka in just four hours
  • Use Python with Kafka to create end-to-end data flows
  • Inspect Kafka data flow examples with a GUI
  • Take away a complete copy of the instructor's notes, example code, virtual machine, and class slides to refer to after class

The generation and movement of big data are never constant. In many cases, organizational data flows start with a simple, direct end-to-end connection. While this basic connection model seems manageable, adding more data sources and destinations can easily create an unmaintainable morass of applications and data flows.

Apache Kafka is designed to manage data flow by decoupling data sources from destinations. Placed in the middle of organizational data flows, Kafka acts as a robust data buffer, or broker, that helps create and manage data pipelines.

In this training, the basic Kafka data broker design and operation are explained and illustrated using both the command line and a GUI. More advanced examples, including streaming weather and image data for analysis and storage, are demonstrated using a downloadable virtual machine.

What you’ll learn and how you can apply it

By the end of the live online course, you’ll understand:

  • The design and components of the Apache Kafka data broker
  • How Kafka manages data flows using brokers
  • How to configure Kafka, create topics, and use Kafka as a data broker
  • How to write and use Kafka consumers and producers in Python
  • How to use Python and Kafka to stream open weather data
  • How to use Python and Kafka to stream, store, and analyze images in real time

And you’ll be able to:

  • Understand the benefits of Kafka and how to use it
  • Create basic Kafka producers and consumers
  • Write Python applications to work directly with Kafka
  • Inspect Kafka data flows in real time with a GUI

This live event is for you because...

  • You want to understand and visualize Apache Kafka and data streaming
  • You want to learn the basics of building data pipelines with Kafka
  • Hands-on experience is important to you when learning a new technology
  • You want a working development environment for use after the training

Prerequisites

  • The hands-on portion of the course uses the Linux command line and assumes familiarity with the command line on a modern Linux server.
  • Please be aware that if you have no experience with the Linux command line, you may find this course difficult to follow at times. See Recommended Preparation if you need a refresher.

Course Set-up

To run the class examples, a Linux Hadoop Minimal Virtual Machine (VM) is available. The VM is a full Linux installation that can run on your laptop/desktop using VirtualBox (freely available). The VM provides a functional Kafka and Python environment to continue learning after the class (in addition to Hadoop, HBase, Hive, and Spark).

Further information on the class, access to the class notes, and the Linux Hadoop Minimal VM can be found here.

Recommended Preparation

Recommended Follow-up

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Total workshop time is 4 hours. There will be time for questions between segments. Emphasis will be placed on making sure all questions are addressed.

Segment 1: Introduction and Course Goals (15 mins)

  • Class Resources and web page
  • How to get the most out of this course
  • Required prerequisite skills
  • Using the Linux Hadoop Minimal virtual machine

Segment 2: Why Do I Need a Message Broker? (20 mins)

  • Managing data growth
  • Decoupling acquisition from use (see the sketch after this list)
  • Reliability and scalability
  • Kafka use cases
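
The decoupling idea is easy to see in code. The sketch below is a minimal illustration rather than the course's official example: it assumes the kafka-python package, a broker at localhost:9092, and an existing topic named "events". Because each consumer group receives its own copy of every message, a new destination can be added without changing the producer.

    from kafka import KafkaConsumer

    # Each consumer group independently receives every message on the topic,
    # so new destinations can be added without touching the producer.
    # Assumes a broker at localhost:9092 and an existing topic named "events".
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        group_id="analytics",          # a second consumer with group_id="archive"
        auto_offset_reset="earliest",  # would see the same stream independently
    )

    for message in consumer:
        print(message.topic, message.offset, message.value)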

Segment 3: Kafka Components (20 mins)

  • Producers and consumers
  • Brokers, partitions, and clusters (see the sketch after this list)
  • Questions and answers (5 mins)
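
As a concrete illustration of these components, the sketch below creates a topic with several partitions using kafka-python's admin client. The broker address, topic name, and counts are illustrative assumptions, not the course's exact settings.

    from kafka.admin import KafkaAdminClient, NewTopic

    # Connect to the cluster; assumes a single broker at localhost:9092.
    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

    # A topic is split into partitions that are distributed across brokers;
    # replication_factor=1 suits a single-broker teaching VM.
    admin.create_topics(
        [NewTopic(name="demo", num_partitions=3, replication_factor=1)]
    )
    admin.close()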

Break (10 mins)

Segment 4: Basic Examples (35 mins)

  • Sending messages with producers (see the sketch after this list)
  • Reading messages with consumers
  • Questions and answers (5 mins)
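
A minimal producer/consumer pair might look like the sketch below, which assumes the kafka-python package and a broker at localhost:9092; the class examples may use different client code and topic names.

    import json

    from kafka import KafkaConsumer, KafkaProducer

    # Producer: serialize a Python dict to JSON and send it to a topic.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("test-topic", {"greeting": "hello kafka"})
    producer.flush()  # block until the message is actually delivered

    # Consumer: read messages back, starting from the earliest offset.
    consumer = KafkaConsumer(
        "test-topic",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
        consumer_timeout_ms=5000,  # stop iterating after 5 s of silence
    )
    for message in consumer:
        print(message.offset, message.value)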

Segment 5: Using a Kafka UI (25 mins)

  • Using KafkaEsque features
  • Replaying the basic examples with KafkaEsque
  • Questions and answers (5 mins)

Segment 6: Example One: Streaming Weather Data (35 mins)

  • Component Background: Kafka, Python, noaa-sdk
  • Using the NOAA data source with Python
  • Python Producer (NOAA data acquisition; see the sketch after this list)
  • Python Consumer (data storage and analysis)
  • Real-time demonstration
  • Questions and answers (5 mins)
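
For a flavor of this segment, here is a sketch of the producer side only. It assumes the noaa-sdk package (whose NOAA client exposes a get_observations call; verify the API against the version you install) together with kafka-python; the postal code and topic name are illustrative, and the instructor's actual code will differ.

    import json

    from kafka import KafkaProducer
    from noaa_sdk import NOAA  # assumed import; check your noaa-sdk version

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Pull recent observations for a US postal code and publish each one.
    # get_observations() yields observation dicts; field names come from the
    # NOAA API, so inspect a record before building the consumer/analysis side.
    n = NOAA()
    for obs in n.get_observations("20001", "US"):
        producer.send("weather", obs)
    producer.flush()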

Break (10 mins)

Segment 7: Example Two: Image Streaming with Kafka (40 mins)

  • Component Background: Kafka, Python, Bash
  • Configuring image streaming to and from Kafka
  • Python Producer (image capture; see the sketch after this list)
  • Python Consumer (image analysis)
  • Real-time demonstration
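
As a rough sketch of the pattern, rather than the instructor's actual pipeline, a producer can publish an image file's raw bytes to a topic and a consumer can write them back to disk for analysis. The file paths and topic name below are illustrative, and images larger than Kafka's default message-size limit (about 1 MB) require broker and producer configuration changes.

    from kafka import KafkaConsumer, KafkaProducer

    # Producer: send the raw bytes of an image file (path is illustrative).
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    with open("capture.jpg", "rb") as f:
        producer.send("images", f.read())
    producer.flush()

    # Consumer: write each received payload back to disk for later analysis.
    consumer = KafkaConsumer(
        "images",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop iterating after 5 s of silence
    )
    for i, message in enumerate(consumer):
        with open(f"received_{i}.jpg", "wb") as out:
            out.write(message.value)  # value is raw bytes (no deserializer)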

Segment 8: Course Wrap-up, Questions, and Additional Resources (10 mins)

Your Instructor

  • Douglas Eadline

    Douglas Eadline began his career as an Analytical Chemist with an interest in computer methods. Starting with the first Beowulf how-to document, Doug has written instructional documents covering many aspects of Linux HPC (High Performance Computing) and Scalable Data Analytics (Hadoop/Spark) computing. Currently, Doug serves as editor of the ClusterMonkey.net website and was previously editor of ClusterWorld Magazine, and senior HPC editor for Linux Magazine. He is also an active writer and consultant to the HPC/Analytics industry. His recent video tutorials and books include Hadoop and Spark Fundamentals LiveLessons (Addison Wesley), Hadoop 2 Quick Start Guide (Addison Wesley), High Performance Computing for Dummies (Wiley) and Practical Data Science with Hadoop and Spark (coauthor, Addison Wesley).
