Temporal with Python: Durable Execution for Reliable Workflows
Temporal is a platform for durable execution: it lets you write long-running, stateful business logic as ordinary code that survives process crashes, deployments, and infrastructure failures. Instead of stitching together queues, cron jobs, and a database to track "where did this order get to," you write a function, and Temporal ensures it resumes from where it left off and runs to completion, even if the machine running it dies halfway through.
What Durable Execution Means
Most workflow tools you may have seen (Airflow, Prefect, Dagster) are schedulers for data pipelines. They are excellent at running a DAG of tasks on a cadence and showing you the results. Celery is a task queue for offloading background jobs. Temporal solves a different problem: keeping a single long-lived process correct across failures.
The core idea is the event-sourced replay model:
- A Workflow is your business logic written as deterministic code. Temporal does not keep the workflow's variables in memory forever. Instead, every meaningful step (an activity result, a timer firing, a signal arriving) is appended to a durable event history.
- If the worker process crashes, the Temporal server hands the workflow to another available worker, which replays the event history. Your workflow code re-executes from the top, but instead of calling activities again, the SDK feeds back the recorded results. When replay catches up to the point of the crash, execution continues normally.
- Because of replay, workflow code must be deterministic: given the same history, it must take the same path every time. Anything non-deterministic (network calls, random numbers, reading the clock, querying a database) must happen inside an Activity or, for time and randomness, go through the SDK's replay-safe helpers such as workflow.now() and workflow.random() in the Python SDK.
An Activity is a plain function for side effects and non-deterministic work. Activities are not replayed; their results are recorded once and reused. They are the place where your code talks to the outside world.
This split is what makes replay safe: the workflow is the durable, replayable "brain," and activities are the disposable "hands" that touch external systems.
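The replay mechanism described above can be illustrated with a toy model in plain Python. This is a minimal sketch and not the Temporal SDK: `ReplayContext`, `SimulatedCrash`, `charge`, and `ship` are hypothetical names invented for illustration, and the "event history" is reduced to a list of activity results.

```python
class SimulatedCrash(Exception):
    """Stands in for the worker process dying mid-workflow."""


class ReplayContext:
    """Toy stand-in for a workflow context (hypothetical, not Temporal's API)."""

    def __init__(self, history, crash_before=None):
        self.history = history            # recorded activity results, in order
        self.position = 0                 # index of the next activity call
        self.crash_before = crash_before  # simulate a crash before this new call

    def execute_activity(self, fn, *args):
        if self.position < len(self.history):
            # Replay: reuse the recorded result instead of calling fn again.
            result = self.history[self.position]
        else:
            if self.crash_before == self.position:
                raise SimulatedCrash(f"crashed before activity #{self.position}")
            # First execution: run the side effect and record its result.
            result = fn(*args)
            self.history.append(result)
        self.position += 1
        return result


side_effects = []  # lets us observe how often each activity really ran


def charge(order_id):
    side_effects.append(("charge", order_id))
    return f"charged:{order_id}"


def ship(order_id):
    side_effects.append(("ship", order_id))
    return f"shipped:{order_id}"


def order_workflow(ctx, order_id):
    # Deterministic: the same history always yields the same activity calls.
    receipt = ctx.execute_activity(charge, order_id)
    tracking = ctx.execute_activity(ship, order_id)
    return receipt, tracking


history = []  # survives the "crash", like the server-side event history

try:
    order_workflow(ReplayContext(history, crash_before=1), "o-1")
except SimulatedCrash:
    pass  # the first worker died after charging but before shipping

# A second "worker" replays the same history and carries on.
result = order_workflow(ReplayContext(history), "o-1")
print(result)
print(side_effects)
```

Although `order_workflow` runs twice, `charge` appears only once in `side_effects`: the second run receives the recorded result from the history rather than charging the customer again. The real event history records far more than activity results (timers, signals, and so on), so treat this only as a mental model.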
Core Architecture
A Temporal deployment has a few moving parts:
- Temporal Server (Cluster): the backend that stores event histories, schedules tasks, and enforces timeouts and retries. It is the source of truth, backed by a database (PostgreSQL, MySQL, or Cassandra).
- Task Queues: named queues the server uses to hand work to your code. A client names the task queue when it starts a workflow, the server places workflow and activity tasks on that queue, and workers poll it for work.
- Workers: processes you run that host your workflow and activity code. A worker polls a task queue, executes tasks, and reports results back to the server. Your code lives here, not on the server.
- Client: how application code starts workflows, sends signals, and queries state.
- Web UI: a dashboard to inspect running and completed workflows, their event histories, inputs, outputs, and failures.
The server never runs your code; it only orchestrates. This separation means you can deploy new worker versions and scale workers horizontally while the server keeps the histories safe.
Getting a Development Server
For local development, the Temporal CLI ships a self-contained dev server with an in-memory database and the Web UI.
# Install the Temporal CLI (macOS / Linux)
curl -sSf https://temporal.download/cli.sh | sh
# Or with Homebrew
brew install temporal
# Start a local dev server with the Web UI on http://localhost:8233
temporal server start-dev
The dev server listens for SDK connections on localhost:7233 and serves the Web UI on localhost:8233. It resets state on restart, which is fine for development. For production you run a real cluster or use Temporal Cloud (covered later).
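If you are unsure whether the dev server is up, a quick reachability check on the two ports mentioned above can help. This is a convenience sketch using only the standard library. `port_is_open` is a hypothetical helper, and an open port only shows that something is listening there, not that it is Temporal.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Ports taken from the dev server defaults described above.
for label, port in (("SDK endpoint", 7233), ("Web UI", 8233)):
    status = "reachable" if port_is_open("localhost", port) else "not reachable"
    print(f"{label} on localhost:{port}: {status}")
```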
Install the Python SDK:
pip install temporalio
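To confirm the package landed in the environment you expect, you can ask Python's packaging metadata for its version. This is a small sketch using only the standard library, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


sdk_version = installed_version("temporalio")
if sdk_version is None:
    print("temporalio is not installed in this environment")
else:
    print(f"temporalio {sdk_version} is installed")
```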
A Coherent Example: Order Fulfillment
Throughout this tutorial we build one workflow: processing a customer order. The steps are: charge the payment, reserve inventory, ship the package, and notify the customer. Each step can fail and should be retried. We will later add a durable timer, a signal to cancel, and a query to check status.