Hydra: Elegant Configuration Management for ML and Python Applications
Most machine learning projects start with a single training script and a handful of hyperparameters passed through argparse. A few months later that same script has thirty flags, hardcoded paths scattered across functions, and nobody can reproduce the run that produced last quarter's best model. Hydra, an open-source framework from Meta built on top of OmegaConf, is designed to solve exactly this class of problem. This tutorial walks through Hydra from first principles using a realistic model training workflow as the running example.
What Hydra Is and What It Solves
Hydra is a Python framework for composing and overriding hierarchical configuration. Instead of one giant config file or a wall of command-line flags, you describe your configuration as a tree of small YAML files that Hydra composes at runtime. You can then override any value from the command line, swap entire config groups (for example switch from a ResNet to a Vision Transformer), and run sweeps across combinations of parameters.
The concrete problems it addresses in ML work:
- Scattered argparse definitions. Flags accumulate across files and become hard to discover or document.
- Hardcoded hyperparameters. Learning rates and paths buried in source code make experiments brittle.
- Hard-to-reproduce runs. Without a saved snapshot of the exact configuration, reproducing a result months later is guesswork.
- Combinatorial experiment configs. Trying three models against four datasets with two optimizers is twenty-four runs; managing those by hand is error-prone.
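To make the arithmetic in the last bullet concrete, here is a minimal sketch in plain Python that enumerates such a grid. The model, dataset, and optimizer names are placeholders invented for illustration, not taken from any real project.

```python
from itertools import product

# Hypothetical option lists; the names are illustrative only.
models = ["resnet", "vit", "mlp"]
datasets = ["cifar10", "cifar100", "svhn", "imagenet"]
optimizers = ["adam", "sgd"]

# Every (model, dataset, optimizer) combination is one training run.
runs = list(product(models, datasets, optimizers))

print(len(runs))  # 3 * 4 * 2 = 24
print(runs[0])    # ('resnet', 'cifar10', 'adam')
```

Keeping 24 hand-written configs or launch commands in sync is the kind of bookkeeping that config groups and multirun are meant to replace.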
Hydra sits on OmegaConf, which provides the underlying typed configuration object (a DictConfig), variable interpolation, and merging logic. Hydra adds composition, command-line overrides, automatic output directories, and multirun.
Installation
Hydra is published as hydra-core on PyPI. Install it into your project environment:
pip install hydra-core
Optional plugins for sweepers and launchers are separate packages, installed when you need them:
pip install hydra-optuna-sweeper
pip install hydra-joblib-launcher
pip install hydra-submitit-launcher
Verify the installation:
python -c "import hydra; print(hydra.__version__)"
This tutorial targets Hydra 1.3.x, which is the current stable line.
The Basics: Entry Point, Config File, and cfg
A Hydra application has three pieces: a config directory, a YAML config file, and a decorated entry-point function. Create a project layout like this:
project/
├── train.py
└── conf/
    └── config.yaml
Start with a minimal conf/config.yaml:
epochs: 10
batchsize: 32
optimizer:
  name: adam
  lr: 0.001
Now the entry point in train.py:
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Print the fully composed configuration as YAML.
    print(OmegaConf.to_yaml(cfg))
    print(f"Training for {cfg.epochs} epochs")
    print(f"Optimizer: {cfg.optimizer.name} @ lr={cfg.optimizer.lr}")

if __name__ == "__main__":
    main()
A few notes on the decorator. version_base=None uses the defaults of the installed Hydra version and silences a compatibility warning; set it explicitly, and pin it to a version string such as "1.3" if you need behavior to stay fixed across Hydra upgrades. config_path is relative to the file that defines main, and config_name is the YAML filename without its extension.
Run it:
python train.py
The cfg object is an OmegaConf DictConfig. You access values with dot notation (cfg.optimizer.lr) or dictionary syntax (cfg["optimizer"]["lr"]). It behaves like a nested namespace, and OmegaConf keeps the type that YAML parsing assigns to each value, so cfg.epochs is an int and cfg.optimizer.lr is a float.
Command-Line Overrides
The single most useful Hydra feature day to day is overriding configuration without touching files. Any leaf value can be set from the command line using dotted paths: