Tutorial 16: Celery + Redis for ML Task Queues

Table of Contents

Introduction

Prerequisites

Understanding Task Queues for ML

Setting Up Celery with Redis

Creating Async ML Inference Tasks

Task Chains, Groups, and Chords

Integration with FastAPI

Monitoring with Flower

Error Handling and Retries

Scaling Workers

Production Deployment

Best Practices

Conclusion

Introduction

Machine learning inference can range from milliseconds for a simple scikit-learn model to minutes for a complex deep learning pipeline involving image processing, feature extraction, and ensemble predictions. When inference takes more than a few hundred milliseconds, handling it synchronously within a web request creates a poor user experience and risks request timeouts.

Task queues solve this by offloading long-running work to background workers. The client submits a task, receives an ID immediately, and polls for results later. This pattern enables your API to remain responsive, supports horizontal scaling of compute-intensive workloads, and provides built-in retry logic for failed tasks.
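The submit-then-poll flow can be illustrated without any broker at all. The following is a minimal in-process sketch that uses only the standard library: a thread pool stands in for the background workers and a dictionary stands in for the result store. The names `submit_task`, `get_status`, and `slow_inference` are hypothetical and chosen for illustration; this is not Celery's API.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# Thread pool standing in for background workers (illustration only).
_executor = ThreadPoolExecutor(max_workers=2)
# In-memory stand-in for a result store: task ID -> Future.
_tasks: dict[str, Future] = {}


def slow_inference(features: list[float]) -> float:
    """Hypothetical long-running model call."""
    time.sleep(0.2)  # simulate slow inference
    return sum(features) / len(features)


def submit_task(features: list[float]) -> str:
    """Hand the work to the pool and return a task ID immediately."""
    task_id = str(uuid.uuid4())
    _tasks[task_id] = _executor.submit(slow_inference, features)
    return task_id


def get_status(task_id: str) -> dict:
    """Report the task state, including the result once it is ready."""
    future = _tasks[task_id]
    if not future.done():
        return {"task_id": task_id, "state": "PENDING"}
    if future.exception() is not None:
        return {"task_id": task_id, "state": "FAILURE", "error": str(future.exception())}
    return {"task_id": task_id, "state": "SUCCESS", "result": future.result()}
```

A caller would invoke `submit_task(...)` once, keep the returned ID, and call `get_status(task_id)` until the state is no longer `PENDING`. A real task queue adds what this sketch lacks: the queue survives process restarts, and the workers can run on other machines.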

Celery is the most widely used distributed task queue for Python, and Redis is one of its most commonly used message brokers. Together, they form a battle-tested foundation for asynchronous ML inference, batch processing, model training jobs, and data pipeline orchestration.

This tutorial walks you through building a complete async ML inference system with Celery, Redis, and FastAPI — from basic setup to production-ready deployment.

Prerequisites

Python 3.9 or higher
Redis server (local or Docker)
Basic understanding of REST APIs and async programming
Install required packages:

pip install "celery[redis]" redis fastapi uvicorn scikit-learn joblib numpy pandas flower pydantic

Start Redis with Docker:

docker run -d --name redis -p 6379:6379 redis:7-alpine
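Before moving on, it can help to confirm that the container is actually accepting connections. The sketch below is one way to do that using only the standard library: it sends a Redis inline `PING` command over a plain socket and checks for the `+PONG` reply. The function name `redis_is_reachable` is hypothetical, and the check assumes a Redis instance without authentication, which matches the `docker run` command above.

```python
import socket


def redis_is_reachable(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if a server at host:port answers a Redis PING with PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # Redis inline command
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
    return reply.startswith(b"+PONG")
```

With the container running on the default port, `redis_is_reachable()` should return `True`; a `False` result usually means the container is not running or the port mapping differs.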

Understanding Task Queues for ML

Architecture Overview

The Celery architecture consists of three main components:

Producer (Client): Your web application (e.g., FastAPI) that submits tasks. It sends task messages to the broker and returns immediately, giving the caller a task ID for later retrieval.

Broker (Redis): The message queue that holds tasks until workers pick them up. Redis acts as both the broker (task queue) and the result backend (where completed task results are stored).

Worker: A separate process (or set of processes) that consumes tasks from the broker, executes them, and stores the results. For ML, each worker loads the model into memory and processes inference requests.

Client (FastAPI) --> Redis Broker --> Worker 1 (ML Model)
                                 --> Worker 2 (ML Model)
                                 --> Worker 3 (ML Model)
                                       |
                                 Redis Result Backend
                                       |
                                 Client polls for result

When to Use Task Queues for ML

| Use Case | Sync API | Task Queue |
|----------|----------|------------|
| Simple model, < 100ms inference | Preferred | Overkill |
| Complex pipeline, 1-30s inference | Risky (timeouts) | Preferred |
| Batch predictions (1000+ items) | Not feasible | Required |
| GPU inference with queuing | Impractical | Ideal |
| Model training triggers | Not possible | Required |
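The table can be read as a rough decision rule. The sketch below encodes it as a small helper; the function name `choose_execution_mode` and its parameters are hypothetical, and the thresholds come straight from the table rows. The table does not cover latencies between 100 ms and 1 s, so the helper returns "either" for that range as an assumption.

```python
def choose_execution_mode(
    expected_latency_s: float,
    batch_size: int = 1,
    is_training: bool = False,
) -> str:
    """Return "sync", "queue", or "either" following the table above."""
    # Training triggers and large batches always go to the queue.
    if is_training or batch_size >= 1000:
        return "queue"
    # Fast single predictions are fine inside the request.
    if expected_latency_s < 0.1:
        return "sync"
    # Multi-second pipelines risk request timeouts when run synchronously.
    if expected_latency_s >= 1.0:
        return "queue"
    # Assumption: the table is silent here, so leave the choice open.
    return "either"
```

For instance, `choose_execution_mode(0.05)` returns `"sync"`, while `choose_execution_mode(5.0)` returns `"queue"`.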

Setting Up Celery with Redis

Project Structure

ml-celery-project/
    celeryapp.py           # Celery application configuration
    tasks.py               # Task definitions
    api.py                 # FastAPI application
    ml/
        model.py           # Model loading and inference
        preprocessing.py   # Feature preprocessing
    models/
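As a starting point, `celeryapp.py` might look like the sketch below. It assumes Redis is running locally on the default port (as started in the Prerequisites section) and uses separate Redis databases for the broker and the result backend. The app name `ml_tasks` and the specific setting values are illustrative assumptions, not recommendations from this tutorial.

```python
# celeryapp.py (sketch)
from celery import Celery

app = Celery(
    "ml_tasks",                          # hypothetical app name
    broker="redis://localhost:6379/0",   # Redis database 0 as the broker
    backend="redis://localhost:6379/1",  # Redis database 1 as the result backend
    include=["tasks"],                   # import tasks.py when the worker starts
)

app.conf.update(
    task_serializer="json",        # send task arguments as JSON
    result_serializer="json",      # store results as JSON
    accept_content=["json"],       # reject messages in any other format
    result_expires=3600,           # drop stored results after one hour
    task_track_started=True,       # report a STARTED state while a task runs
    worker_prefetch_multiplier=1,  # reserve one task at a time per worker process
    task_acks_late=True,           # acknowledge a task only after it finishes
)
```

With this file in place, a worker could be started from the project root with `celery -A celeryapp worker --loglevel=info`. A prefetch multiplier of 1 together with late acknowledgement is a common pairing for long-running tasks, since it avoids one worker reserving a backlog of slow jobs while others sit idle.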
