Tutorial 16: Celery + Redis for ML Task Queues
Introduction
Machine learning inference can range from milliseconds for a simple scikit-learn model to minutes for a complex deep learning pipeline involving image processing, feature extraction, and ensemble predictions. When inference takes more than a few hundred milliseconds, handling it synchronously within a web request creates a poor user experience and risks request timeouts.
Task queues solve this by offloading long-running work to background workers. The client submits a task, receives an ID immediately, and polls for results later. This pattern enables your API to remain responsive, supports horizontal scaling of compute-intensive workloads, and provides built-in retry logic for failed tasks.
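The submit-then-poll pattern described above can be sketched without any queueing library. The example below is only an in-process stand-in, not Celery: a thread pool plays the role of the workers and a dictionary plays the role of the result store. The names `slow_inference`, `submit`, and `poll` are hypothetical, and the status strings are chosen to mirror Celery's `PENDING`, `SUCCESS`, and `FAILURE` states.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2)  # stands in for the worker pool
_tasks: dict[str, Future] = {}                 # stands in for the result store


def slow_inference(features):
    """Placeholder for a slow model call (hypothetical)."""
    time.sleep(0.2)
    return sum(features) / len(features)


def submit(features) -> str:
    """Queue the work and return a task ID immediately."""
    task_id = str(uuid.uuid4())
    _tasks[task_id] = _executor.submit(slow_inference, features)
    return task_id


def poll(task_id: str) -> dict:
    """Report the task state without blocking the caller."""
    future = _tasks[task_id]
    if not future.done():
        return {"status": "PENDING", "result": None}
    if future.exception() is not None:
        return {"status": "FAILURE", "result": str(future.exception())}
    return {"status": "SUCCESS", "result": future.result()}


if __name__ == "__main__":
    task_id = submit([1.0, 2.0, 3.0])
    print(task_id, poll(task_id))  # returns at once, usually still PENDING
    while poll(task_id)["status"] == "PENDING":
        time.sleep(0.05)
    print(poll(task_id))
```

A real task queue replaces the thread pool with separate worker processes and the dictionary with a shared store, which is what allows the API and the workers to scale independently.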
Celery is the most widely used distributed task queue for Python, and Redis is one of its most popular message brokers. Together, they form a battle-tested foundation for asynchronous ML inference, batch processing, model training jobs, and data pipeline orchestration.
This tutorial walks you through building a complete async ML inference system with Celery, Redis, and FastAPI, from basic setup to production-ready deployment.
Prerequisites
- Python 3.9 or higher
- Redis server (local or Docker)
- Basic understanding of REST APIs and async programming
- Install required packages:
pip install "celery[redis]" redis fastapi uvicorn scikit-learn joblib numpy pandas flower pydantic
Start Redis with Docker:
docker run -d --name redis -p 6379:6379 redis:7-alpine
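Before going further, it can help to confirm that the container is actually accepting connections. The following is a minimal sketch using only the standard library. It assumes the default `localhost:6379` mapping from the Docker command above and a Redis instance without a password; the helper name `redis_is_up` is hypothetical.

```python
import socket


def redis_is_up(host="localhost", port=6379, timeout=1.0):
    """Return True if the server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # Redis accepts simple inline commands
            return conn.recv(16).startswith(b"+PONG")
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Redis reachable:", redis_is_up())
```

If the server requires authentication, it replies with an error instead of `+PONG`, so this check reports `False` in that case.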
Understanding Task Queues for ML
Architecture Overview
The Celery architecture consists of three main components:
- Producer (Client): Your web application (e.g., FastAPI) that submits tasks. It sends task messages to the broker and returns immediately, giving the caller a task ID for later retrieval.
- Broker (Redis): The message queue that holds tasks until workers pick them up. Redis acts as both the broker (task queue) and the result backend (where completed task results are stored).
- Worker: A separate process (or set of processes) that consumes tasks from the broker, executes them, and stores the results. For ML, each worker loads the model into memory and processes inference requests.

    Client (FastAPI) --> Redis Broker --> Worker 1 (ML Model)
                                      --> Worker 2 (ML Model)
                                      --> Worker 3 (ML Model)
                                                 |
                                       Redis Result Backend
                                                 |
                                      Client polls for result
When to Use Task Queues for ML
| Use Case | Sync API | Task Queue |
|----------|----------|------------|
| Simple model, < 100ms inference | Preferred | Overkill |
| Complex pipeline, 1-30s inference | Risky (timeouts) | Preferred |
| Batch predictions (1000+ items) | Not feasible | Required |
| GPU inference with queuing | Impractical | Ideal |
| Model training triggers | Not possible | Required |
Setting Up Celery with Redis
Project Structure
ml-celery-project/
    celeryapp.py            # Celery application configuration
    tasks.py                # Task definitions
    api.py                  # FastAPI application
    ml/
        model.py            # Model loading and inference
        preprocessing.py    # Feature preprocessing
    models/