Celery + Redis for ML Task Queue Tutorial: Async Inference


By Ruby Abdullah · tutorial
Tags: Celery, Redis, Task Queue, Async, ML Inference, FastAPI

Tutorial 16: Celery + Redis for ML Task Queues

Table of Contents

  • Introduction
  • Prerequisites
  • Understanding Task Queues for ML
  • Setting Up Celery with Redis
  • Creating Async ML Inference Tasks
  • Task Chains, Groups, and Chords
  • Integration with FastAPI
  • Monitoring with Flower
  • Error Handling and Retries
  • Scaling Workers
  • Production Deployment
  • Best Practices
  • Conclusion

    Introduction

    Machine learning inference can take anywhere from milliseconds for a simple scikit-learn model to minutes for a complex deep learning pipeline involving image processing, feature extraction, and ensemble predictions. When inference takes more than a few hundred milliseconds, handling it synchronously within a web request creates a poor user experience and risks request timeouts.

    Task queues solve this by offloading long-running work to background workers. The client submits a task, receives an ID immediately, and polls for results later. This pattern enables your API to remain responsive, supports horizontal scaling of compute-intensive workloads, and provides built-in retry logic for failed tasks.

    Celery is the most widely used distributed task queue for Python, and Redis is one of its most commonly used message brokers. Together, they form a battle-tested foundation for asynchronous ML inference, batch processing, model training jobs, and data pipeline orchestration.

    This tutorial walks you through building a complete async ML inference system with Celery, Redis, and FastAPI — from basic setup to production-ready deployment.


    Prerequisites

    • Python 3.9 or higher
    • Redis server (local or Docker)
    • Basic understanding of REST APIs and async programming
    • Install required packages:

    pip install "celery[redis]" redis fastapi uvicorn scikit-learn joblib numpy pandas flower pydantic
    

    Start Redis with Docker:

    docker run -d --name redis -p 6379:6379 redis:7-alpine
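
Before wiring up Celery, it can help to confirm that the container is actually accepting connections. The following is a minimal sketch using only the standard library; the helper name `redis_is_up` is hypothetical, and it assumes an unauthenticated Redis on the default port. Running `redis-cli ping`, or calling `ping()` on a redis-py client, should give the same answer.

```python
import socket


def redis_is_up(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server answers PING on host:port.

    Hypothetical helper for this tutorial; assumes no password is set.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts plain-text "inline" commands terminated by CRLF.
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timeout, unreachable host, and similar failures.
        return False
    # A healthy, unauthenticated server replies with the simple string +PONG.
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_is_up())
```

If this reports that Redis is not reachable, check that the container is running (`docker ps`) and that port 6379 is published.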
    


    Understanding Task Queues for ML

    Architecture Overview

    The Celery architecture consists of three main components:

    • Producer (Client): Your web application (e.g., FastAPI) that submits tasks. It sends task messages to the broker and returns immediately, giving the caller a task ID for later retrieval.
    • Broker (Redis): The message queue that holds tasks until workers pick them up. Redis acts as both the broker (task queue) and the result backend (where completed task results are stored).
    • Worker: A separate process (or set of processes) that consumes tasks from the broker, executes them, and stores the results. For ML, each worker loads the model into memory and processes inference requests.

    Client (FastAPI) --> Redis Broker --> Worker 1 (ML Model)
                                      --> Worker 2 (ML Model)
                                      --> Worker 3 (ML Model)
                                               |
                                      Redis Result Backend
                                               |
                                      Client polls for result
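
The flow above can be imitated with the standard library alone, which makes the roles easier to see before Celery is involved. This is only an in-process sketch, not Celery itself: a `queue.Queue` stands in for the Redis broker, a dictionary stands in for the result backend, and `fake_model_predict` is a hypothetical placeholder for real inference. The status strings mirror Celery's state names.

```python
import queue
import threading
import uuid

broker = queue.Queue()           # stands in for the Redis broker
result_backend = {}              # stands in for the Redis result backend
backend_lock = threading.Lock()  # protects the shared dictionary


def fake_model_predict(features):
    """Hypothetical placeholder for real model inference."""
    return sum(features) / len(features)


def submit(features):
    """Producer: enqueue a task and return its ID immediately."""
    task_id = str(uuid.uuid4())
    with backend_lock:
        result_backend[task_id] = {"status": "PENDING", "result": None}
    broker.put((task_id, features))
    return task_id


def worker():
    """Worker: consume tasks, run inference, store the outcome."""
    while True:
        item = broker.get()
        if item is None:  # sentinel value: shut this worker down
            broker.task_done()
            break
        task_id, features = item
        try:
            outcome = {"status": "SUCCESS", "result": fake_model_predict(features)}
        except Exception as exc:
            outcome = {"status": "FAILURE", "result": repr(exc)}
        with backend_lock:
            result_backend[task_id] = outcome
        broker.task_done()


def get_result(task_id):
    """Client: look up a task's current state by ID."""
    with backend_lock:
        return dict(result_backend[task_id])


if __name__ == "__main__":
    workers = [threading.Thread(target=worker, daemon=True) for _ in range(3)]
    for w in workers:
        w.start()
    task_id = submit([1.0, 2.0, 3.0])
    broker.join()  # a real client would poll instead of blocking
    print(task_id, get_result(task_id))
    for _ in workers:
        broker.put(None)
```

With Celery and Redis the same three roles run in separate processes, possibly on separate machines, which is what lets workers scale independently of the API.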

    When to Use Task Queues for ML

    | Use Case | Sync API | Task Queue |
    |----------|----------|------------|
    | Simple model, < 100ms inference | Preferred | Overkill |
    | Complex pipeline, 1-30s inference | Risky (timeouts) | Preferred |
    | Batch predictions (1000+ items) | Not feasible | Required |
    | GPU inference with queuing | Impractical | Ideal |
    | Model training triggers | Not possible | Required |


    Setting Up Celery with Redis

    Project Structure

    ml-celery-project/
        celeryapp.py            # Celery application configuration
        tasks.py                # Task definitions
        api.py                  # FastAPI application
        ml/
            model.py            # Model loading and inference
            preprocessing.py    # Feature preprocessing
        models/
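
As a rough idea of what `celeryapp.py` might configure, the sketch below builds a settings dictionary for a Redis broker and result backend. The function name `build_celery_settings`, the use of Redis databases 0 and 1, and the specific values are assumptions for illustration; the keys themselves are standard Celery setting names.

```python
def build_celery_settings(host="localhost", port=6379):
    """Return Celery settings for a Redis broker and result backend.

    Hypothetical helper; the values are illustrative defaults, not tuned.
    """
    base = f"redis://{host}:{port}"
    return {
        "broker_url": f"{base}/0",        # task queue in Redis database 0
        "result_backend": f"{base}/1",    # task results in Redis database 1
        "task_serializer": "json",
        "result_serializer": "json",
        "accept_content": ["json"],       # reject non-JSON messages such as pickle
        "result_expires": 3600,           # drop stored results after one hour
        "task_track_started": True,       # report STARTED while a task is running
        "task_acks_late": True,           # acknowledge a task only after it finishes
        "worker_prefetch_multiplier": 1,  # reserve one task per worker process at a time
    }


# In celeryapp.py (requires the celery package) this could be applied as:
#   from celery import Celery
#   app = Celery("ml_tasks", include=["tasks"])
#   app.conf.update(build_celery_settings())

if __name__ == "__main__":
    for key, value in build_celery_settings().items():
        print(f"{key} = {value!r}")
```

Late acknowledgement combined with a prefetch multiplier of 1 is a common choice for long-running inference tasks, since it avoids one worker reserving a backlog while others sit idle; whether it suits a given workload is a judgment call.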
