Introduction
FastAPI is a modern, fast Python framework for building APIs. By default, however, the Uvicorn development server runs a single process. That process can serve many requests concurrently through its async event loop, but it is limited to one CPU core, and any blocking call stalls every request it is handling. For production workloads, a common approach is to run several worker processes under Gunicorn as a process manager.
Gunicorn (Green Unicorn) is a WSGI HTTP server for Python that can run multiple worker processes, allowing our FastAPI application to spread requests across processes and CPU cores.
Why Gunicorn?
Advantages of using Gunicorn:
- Multiple worker processes for concurrent request handling
- Incoming connections shared between workers through a common listening socket
- Graceful reloads (for example via the HUP signal) with little or no downtime
- Production-ready and battle-tested
- Compatible with uvicorn workers for async support
Installation
First, install the required dependencies:
pip install fastapi uvicorn gunicorn
Or add to requirements.txt:
fastapi==0.109.0
uvicorn[standard]==0.27.0
gunicorn==21.2.0
Then install:
pip install -r requirements.txt
Creating a Simple FastAPI Application
Create a main.py file:
from fastapi import FastAPI
import time
import os

app = FastAPI()


@app.get("/")
async def root():
    # Include the worker PID so responses show which process handled them
    return {
        "message": "Hello World",
        "workerpid": os.getpid()
    }


@app.get("/slow")
async def slowendpoint():
    # Simulate a time-consuming operation. time.sleep() blocks this
    # worker's event loop, so the worker stays busy for the full 5 seconds.
    time.sleep(5)
    return {
        "message": "This took 5 seconds",
        "workerpid": os.getpid()
    }


@app.get("/health")
async def healthcheck():
    return {"status": "healthy"}
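The /slow endpoint calls time.sleep inside an async handler, which blocks the worker's event loop instead of yielding to other requests. That makes each worker visibly busy, which suits this demonstration, but it helps to see the difference in isolation. The sketch below uses only asyncio (no FastAPI), with hypothetical helper names and a delay shortened to 0.2 seconds as an assumption for quick runs:

```python
import asyncio
import time

DELAY = 0.2  # shortened stand-in for the 5-second sleep in /slow


async def blocking_handler():
    # time.sleep blocks the whole event loop, like the /slow endpoint
    time.sleep(DELAY)


async def cooperative_handler():
    # asyncio.sleep yields control so other coroutines can run meanwhile
    await asyncio.sleep(DELAY)


async def run_three(handler):
    # Run three "requests" concurrently on one event loop and time them
    start = time.perf_counter()
    await asyncio.gather(handler(), handler(), handler())
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(run_three(blocking_handler))
cooperative_elapsed = asyncio.run(run_three(cooperative_handler))

print(f"blocking:    {blocking_elapsed:.2f}s")
print(f"cooperative: {cooperative_elapsed:.2f}s")
```

With the blocking version the three calls run one after another on a single loop. Extra worker processes compensate for exactly this behaviour.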
Setting Up Gunicorn with Uvicorn Workers
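Gunicorn itself is a WSGI server whose default workers are synchronous, while FastAPI is an ASGI application. To serve FastAPI and keep its async benefits, we run Gunicorn with Uvicorn's worker class.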
Running from Command Line
gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
Parameter explanation:
- main:app - module:application (main.py with the app variable)
- --workers 4 - number of worker processes
- --worker-class uvicorn.workers.UvicornWorker - use async Uvicorn workers
- --bind 0.0.0.0:8000 - host and port
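If the server is started from a launcher script rather than typed by hand, the same arguments can be assembled in Python. This is only a sketch: build_gunicorn_command is a hypothetical helper, not part of Gunicorn, and its defaults simply mirror the command above.

```python
import shlex


def build_gunicorn_command(app="main:app", workers=4, host="0.0.0.0", port=8000):
    """Assemble the gunicorn argv (hypothetical helper, not part of Gunicorn)."""
    return [
        "gunicorn",
        app,
        "--workers", str(workers),
        "--worker-class", "uvicorn.workers.UvicornWorker",
        "--bind", f"{host}:{port}",
    ]


command = build_gunicorn_command()
# Print the command as a single shell-ready string
print(shlex.join(command))
```

To actually start the server, the list could be passed to subprocess.run(command, check=True).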
Creating a Configuration File
Create a gunicornconf.py file for more structured configuration:
import multiprocessing
import os

# Server socket
bind = "0.0.0.0:8000"
backlog = 2048

# Worker processes
workers = int(os.getenv("WORKERS", multiprocessing.cpu_count() * 2 + 1))
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
max_requests = 10000
max_requests_jitter = 1000
timeout = 120
keepalive = 5

# Logging
accesslog = "-"
errorlog = "-"
loglevel = "info"
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Process naming
proc_name = "fastapiapp"

# Server mechanics
daemon = False
pidfile = None
user = None
group = None
tmp_upload_dir = None

# Graceful timeout
graceful_timeout = 30
Run with config file:
gunicorn main:app -c gunicornconf.py
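The workers setting in the config file reads an optional WORKERS environment variable and falls back to a CPU-based default. A minimal sketch of that lookup in isolation follows. The helper resolve_workers is hypothetical, and it takes the environment and core count as arguments so it can be tried without Gunicorn:

```python
import multiprocessing
import os


def resolve_workers(environ=os.environ, cpu_cores=None):
    """Return the worker count: WORKERS if set, else 2 * cores + 1."""
    if cpu_cores is None:
        cpu_cores = multiprocessing.cpu_count()
    default = cpu_cores * 2 + 1
    # Environment values are strings, so convert explicitly
    return int(environ.get("WORKERS", default))


print(resolve_workers({"WORKERS": "3"}, cpu_cores=4))  # 3
print(resolve_workers({}, cpu_cores=4))                # 9
```

Setting WORKERS in the environment therefore overrides the computed default without editing the config file.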
Determining Optimal Number of Workers
General formula for determining number of workers:
workers = (2 * CPU_CORES) + 1
Examples:
- 2 CPU cores = 5 workers
- 4 CPU cores = 9 workers
- 8 CPU cores = 17 workers
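Tuning notes:
- For CPU-bound tasks: use the formula above
- For I/O-bound tasks: more workers can help
- Monitor memory usage: each worker is a separate process with its own copy of the application, so make sure the total fits in RAM
- Start with the standard formula, then adjust based on monitoring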
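The formula and the memory caveat can be combined in a small helper. This is a sketch under stated assumptions: recommended_workers and both memory parameters are hypothetical, and the per-worker memory figure is something you would measure for your own application.

```python
def recommended_workers(cpu_cores, available_mb=None, per_worker_mb=None):
    """Apply (2 * cores) + 1, optionally capped by a memory budget."""
    workers = 2 * cpu_cores + 1
    if available_mb is not None and per_worker_mb:
        # Never go below one worker, never exceed what fits in memory
        workers = max(1, min(workers, available_mb // per_worker_mb))
    return workers


for cores in (2, 4, 8):
    print(cores, "cores ->", recommended_workers(cores), "workers")

# Hypothetical budget: 2048 MB available, about 300 MB per worker
print(recommended_workers(8, available_mb=2048, per_worker_mb=300))  # 6
```

Without a memory budget the helper reproduces the examples above. With one, the count is reduced to whatever fits.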
Testing Concurrent Requests
Next, verify that the server really handles requests concurrently.
Using Python
Create a testconcurrent.py file:
import asyncio
import aiohttp
import time