Tutorial 13: Apache Kafka for Real-Time ML Pipelines
Table of Contents
Introduction
In modern machine learning systems, the ability to process data and generate predictions in real time is a critical differentiator. Batch inference pipelines that run on hourly or daily schedules simply cannot meet the requirements of fraud detection, recommendation engines, dynamic pricing, or anomaly detection systems that demand sub-second response times.
Apache Kafka is a distributed event streaming platform that enables you to build robust, scalable, real-time data pipelines. When integrated with ML models, Kafka allows you to ingest streaming data, compute features on the fly, serve predictions in real time, and feed results back into downstream systems — all with high throughput and fault tolerance.
This tutorial walks you through the complete journey: from Kafka fundamentals to building a production-grade real-time ML inference pipeline using Python.
Prerequisites
- Python 3.9 or higher
- Docker and Docker Compose installed
- Basic understanding of machine learning concepts
- Familiarity with REST APIs and JSON
- Install required Python packages:
pip install confluent-kafka fastavro requests scikit-learn numpy pandas fastapi uvicorn
Understanding Apache Kafka
Core Concepts
Apache Kafka operates around several fundamental abstractions:
Topics are named feeds or categories to which records are published. Think of a topic as a database table or a folder in a filesystem. Each topic is split into partitions, which are ordered, immutable sequences of records. Partitions enable parallelism — multiple consumers can read from different partitions simultaneously. Producers are client applications that publish (write) events to Kafka topics. Consumers are applications that subscribe to (read and process) events from topics. Consumers are organized into consumer groups, where each partition is consumed by exactly one consumer within a group, enabling load balancing. Brokers are the Kafka servers that store data and serve clients. A Kafka cluster consists of multiple brokers for redundancy and scalability. Offsets are unique sequential IDs assigned to each record within a partition. Consumers track their position using offsets, enabling exactly-once or at-least-once processing guarantees.Why Kafka for ML?
| Feature | Benefit for ML |
|---------|---------------|
| High throughput | Handle millions of prediction requests per second |
| Durability | Never lose input data even if ML service goes down |
| Decoupling | Separate data ingestion from model inference |
| Replayability | Reprocess historical data with updated models |
| Scalability | Add more consumers/partitions as load increases |
Setting Up Kafka Locally
Create a docker-compose.yml file for a complete Kafka stack:
version: '3.8'
services:
zookeeper:
image: confluentinc/cp-zookeeper:7.5.0
environment:
ZOOKEEPERCLIENTPORT: 2181
ZOOKEEPERTICKTIME: 2000
ports:
- "2181:2181"
kafka:
image: confluentinc/cp-kafka:7.5.0
dependson:
- zookeeper
ports:
- "9092:9092"
environment:
KAFKABROKERID: 1
KAFKAZOOKEEPERCONNECT: zookeeper:2181
KAFKAADVERTISEDLISTENERS: PLAINTEXT://localhost:9092
KAFKAOFFSETSTOPICREPLICATIONFACTOR: 1