How to Set Up Gunicorn with FastAPI for Concurrent Invocation

Introduction

FastAPI is a modern, fast Python framework for building APIs. By default, however, Uvicorn (the ASGI server commonly used to run FastAPI) starts a single worker process. That process can interleave many async requests on its event loop, but it runs on only one CPU core, and any blocking call stalls every request it is serving. For production workloads with many concurrent requests, Gunicorn is a common choice of process manager.

Gunicorn (Green Unicorn) is a WSGI HTTP server for Python that can run multiple worker processes, allowing our FastAPI application to handle many requests in parallel.

Why Gunicorn?

Advantages of using Gunicorn:

Multiple worker processes for concurrent request handling
Automatic load balancing between workers
Graceful restart without downtime
Production-ready and battle-tested
Compatible with uvicorn workers for async support

Installation

First, install the required dependencies:

pip install fastapi uvicorn gunicorn

Or add to requirements.txt:

fastapi==0.109.0
uvicorn[standard]==0.27.0
gunicorn==21.2.0

Then install:

pip install -r requirements.txt

Creating a Simple FastAPI Application

Create a main.py file:

from fastapi import FastAPI
import time
import os

app = FastAPI()

@app.get("/")
async def root():
    # Report the PID so we can see which worker served the request
    return {
        "message": "Hello World",
        "workerpid": os.getpid()
    }

@app.get("/slow")
async def slow_endpoint():
    # Simulate a time-consuming operation.
    # time.sleep() blocks this worker's event loop on purpose, so the
    # worker cannot serve anything else for the full 5 seconds.
    time.sleep(5)
    return {
        "message": "This took 5 seconds",
        "workerpid": os.getpid()
    }

@app.get("/health")
async def health_check():
    return {"status": "healthy"}
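The /slow endpoint calls time.sleep() inside an async function, which keeps the whole worker busy while it waits. The following is a minimal standard-library sketch of that effect, not part of the application itself: it runs several copies of a blocking task and of a cooperative task on one event loop and compares the elapsed time. The function names and the short delays are assumptions chosen for illustration.

```python
import asyncio
import time


async def blocking_task(delay: float) -> None:
    # time.sleep() never yields, so the event loop is stuck until it returns
    time.sleep(delay)


async def cooperative_task(delay: float) -> None:
    # asyncio.sleep() hands control back to the event loop while waiting
    await asyncio.sleep(delay)


async def timed(task, delay: float, count: int) -> float:
    """Run `count` copies of `task` concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(task(delay) for _ in range(count)))
    return time.perf_counter() - start


def compare(delay: float = 0.2, count: int = 3) -> tuple[float, float]:
    """Return (blocking_seconds, cooperative_seconds) for the same workload."""
    blocking = asyncio.run(timed(blocking_task, delay, count))
    cooperative = asyncio.run(timed(cooperative_task, delay, count))
    return blocking, cooperative


if __name__ == "__main__":
    blocking, cooperative = compare()
    print(f"time.sleep:    {blocking:.2f}s")
    print(f"asyncio.sleep: {cooperative:.2f}s")
```

The blocking variant takes roughly count times the delay, while the cooperative variant takes roughly one delay. With a blocking handler, the only way to serve several such requests at once is to have several worker processes, which is what the Gunicorn setup provides.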

Setting Up Gunicorn with Uvicorn Workers

Gunicorn's default workers are synchronous and speak WSGI, while FastAPI is an ASGI application. To run FastAPI under Gunicorn and keep its async benefits, we use Uvicorn workers.

Running from Command Line

gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Parameter explanation:

main:app - module:application (main.py with app variable)
--workers 4 - number of worker processes
--worker-class uvicorn.workers.UvicornWorker - use async uvicorn workers
--bind 0.0.0.0:8000 - host and port
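If the server is started from a script rather than typed by hand, the same parameters can be assembled in Python. This is only a sketch: build_gunicorn_command is a hypothetical helper, not part of Gunicorn, and it uses only the flags explained above.

```python
import shlex


def build_gunicorn_command(app: str = "main:app", workers: int = 4,
                           host: str = "0.0.0.0", port: int = 8000) -> list[str]:
    """Return the gunicorn invocation as an argument list (hypothetical helper)."""
    return [
        "gunicorn", app,
        "--workers", str(workers),
        "--worker-class", "uvicorn.workers.UvicornWorker",
        "--bind", f"{host}:{port}",
    ]


if __name__ == "__main__":
    # Print the command in shell form for inspection
    print(shlex.join(build_gunicorn_command()))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues.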

Creating a Configuration File

Create a gunicornconf.py file for more structured configuration:

import multiprocessing
import os

# Server Socket
bind = "0.0.0.0:8000"
backlog = 2048

# Worker Processes
# Default to (2 * CPU cores) + 1 unless the WORKERS environment variable is set
workers = int(os.getenv("WORKERS", multiprocessing.cpu_count() * 2 + 1))
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
# Recycle each worker after this many requests; jitter staggers the restarts
max_requests = 10000
max_requests_jitter = 1000
timeout = 120
keepalive = 5

# Logging
accesslog = "-"  # "-" sends the access log to stdout
errorlog = "-"   # "-" sends the error log to stderr
loglevel = "info"
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Process Naming
proc_name = "fastapiapp"

# Server Mechanics
daemon = False
pidfile = None
user = None
group = None
tmp_upload_dir = None

# Graceful Timeout
graceful_timeout = 30

Run with config file:

gunicorn main:app -c gunicornconf.py
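Gunicorn's max_requests and max_requests_jitter settings work together: each worker restarts after max_requests plus a random extra between 0 and max_requests_jitter, so the workers do not all restart at the same moment. The sketch below only illustrates that idea with the values from the configuration; restart_thresholds is a hypothetical function and is not Gunicorn's own code.

```python
import random


def restart_thresholds(workers: int, max_requests: int,
                       max_requests_jitter: int, seed=None) -> list[int]:
    """Illustrative only: one randomized restart threshold per worker."""
    rng = random.Random(seed)
    return [
        max_requests + rng.randint(0, max_requests_jitter)
        for _ in range(workers)
    ]


if __name__ == "__main__":
    # Values taken from the configuration file: 10000 requests, jitter 1000
    for index, limit in enumerate(restart_thresholds(4, 10000, 1000)):
        print(f"worker {index} restarts after {limit} requests")
```

Every threshold falls between 10000 and 11000, and because each is drawn independently, the restarts are spread out instead of hitting all workers together.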

Determining Optimal Number of Workers

General formula for determining number of workers:

workers = (2 * CPU_CORES) + 1

Examples:

2 CPU cores = 5 workers
4 CPU cores = 9 workers
8 CPU cores = 17 workers

Tips:

For CPU-bound tasks: use the formula above
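For I/O-bound tasks: more workers than the formula suggests can help
Monitor memory usage: every worker is a separate process with its own copy of the application, so make sure the total fits in RAM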
Start with the standard formula, then adjust based on monitoring
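The formula and the memory tip can be combined in a few lines of Python. This is a sketch under assumptions: cap_by_memory and its per-worker memory estimate are hypothetical, and the real footprint of a worker has to be measured for each application.

```python
import multiprocessing


def recommended_workers(cpu_cores: int) -> int:
    """Apply the (2 * CPU cores) + 1 starting formula."""
    return 2 * cpu_cores + 1


def cap_by_memory(workers: int, available_mb: int, per_worker_mb: int) -> int:
    """Reduce the worker count so the estimated total fits in memory (minimum 1)."""
    return max(1, min(workers, available_mb // per_worker_mb))


if __name__ == "__main__":
    for cores in (2, 4, 8):
        print(f"{cores} CPU cores = {recommended_workers(cores)} workers")

    # Assumed figures for illustration: 2048 MB free, about 300 MB per worker
    local = recommended_workers(multiprocessing.cpu_count())
    print("this machine, capped by memory:", cap_by_memory(local, 2048, 300))
```

The first loop reproduces the examples above (5, 9 and 17 workers). The cap only lowers the count when the estimated memory use would exceed what is available.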

Testing Concurrent Requests

Test whether concurrent invocation works correctly.

Using Python

Create a testconcurrent.py file:

import asyncio
import aiohttp
import time
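As a minimal sketch of such a test, the script below sends several requests to /slow at the same time and reports how many distinct worker PIDs answered and how long the batch took. It is not the article's aiohttp script: to run without extra dependencies it uses the standard library's urllib in threads. BASE_URL, the request count and the helper names are assumptions; the "workerpid" key is the one returned by main.py.

```python
import asyncio
import json
import time
import urllib.request
from collections import Counter

BASE_URL = "http://localhost:8000"  # assumption: server bound to port 8000
PID_KEY = "workerpid"               # JSON key returned by main.py


def fetch_json(url: str) -> dict:
    """Blocking GET that returns the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read())


def summarize(results: list[dict], elapsed: float) -> dict:
    """Count distinct worker PIDs and record the total wall-clock time."""
    per_worker = Counter(result[PID_KEY] for result in results)
    return {
        "requests": len(results),
        "workers_used": len(per_worker),
        "elapsed_s": round(elapsed, 2),
    }


async def run_test(path: str = "/slow", count: int = 4) -> dict:
    start = time.perf_counter()
    # Each blocking request runs in its own thread so all are in flight at once
    results = await asyncio.gather(
        *(asyncio.to_thread(fetch_json, BASE_URL + path) for _ in range(count))
    )
    return summarize(list(results), time.perf_counter() - start)


if __name__ == "__main__":
    print(asyncio.run(run_test()))
```

If the requests are spread over four workers, the batch should finish in roughly the time of one /slow call. If the elapsed time grows with the number of requests, some of them waited for a busy worker.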
