Hydra Tutorial: Elegant Configuration Management for ML


By Ruby Abdullah · tutorial
Hydra · Configuration · MLOps · Experiment Management · OmegaConf · Python

Hydra: Elegant Configuration Management for ML and Python Applications

Most machine learning projects start with a single training script and a handful of hyperparameters passed through argparse. A few months later that same script has thirty flags, hardcoded paths scattered across functions, and nobody can reproduce the run that produced last quarter's best model. Hydra, an open-source framework from Meta built on top of OmegaConf, is designed to solve exactly this class of problem. This tutorial walks through Hydra from first principles using a realistic model training workflow as the running example.

What Hydra Is and What It Solves

Hydra is a Python framework for composing and overriding hierarchical configuration. Instead of one giant config file or a wall of command-line flags, you describe your configuration as a tree of small YAML files that Hydra composes at runtime. You can then override any value from the command line, swap entire config groups (for example, switch from a ResNet to a Vision Transformer), and run sweeps across combinations of parameters.

The concrete problems it addresses in ML work:

  • Scattered argparse definitions. Flags accumulate across files and become hard to discover or document.
  • Hardcoded hyperparameters. Learning rates and paths buried in source code make experiments brittle.
  • Hard-to-reproduce runs. Without a saved snapshot of the exact configuration, reproducing a result months later is guesswork.
  • Combinatorial experiment configs. Trying three models against four datasets with two optimizers is twenty-four runs; managing those by hand is error-prone.

Hydra sits on OmegaConf, which provides the underlying typed configuration object (a DictConfig), variable interpolation, and merging logic. Hydra adds composition, command-line overrides, automatic output directories, and multirun.

Installation

Hydra is published as hydra-core on PyPI. Install it into your project environment:

```bash
pip install hydra-core
```

Optional plugins for sweepers and launchers are separate packages, installed when you need them:

```bash
pip install hydra-optuna-sweeper
pip install hydra-joblib-launcher
pip install hydra-submitit-launcher
```

Verify the installation:

```bash
python -c "import hydra; print(hydra.__version__)"
```

This tutorial targets Hydra 1.3.x, the current stable line at the time of writing.

The Basics: Entry Point, Config File, and cfg

A Hydra application has three pieces: a config directory, a YAML config file, and a decorated entry-point function. Create a project layout like this:

```
project/
├── train.py
└── conf/
    └── config.yaml
```

Start with a minimal conf/config.yaml:

```yaml
epochs: 10
batch_size: 32
optimizer:
  name: adam
  lr: 0.001
```

Now the entry point in train.py:

```python
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Dump the fully composed configuration as YAML.
    print(OmegaConf.to_yaml(cfg))

    # Values are reachable with dot notation.
    print(f"Training for {cfg.epochs} epochs")
    print(f"Optimizer: {cfg.optimizer.name} @ lr={cfg.optimizer.lr}")


if __name__ == "__main__":
    main()
```

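A few notes on the decorator. version_base=None tells Hydra to use the defaults of the installed version and silences the compatibility warning it emits when the argument is omitted. Always pass version_base explicitly; if you want behavior to stay fixed across Hydra upgrades, pin a specific version such as version_base="1.3" instead of None. config_path is relative to the file that defines main, and config_name is the YAML filename without its extension.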

Run it:

```bash
python train.py
```

The cfg object is an OmegaConf DictConfig. You access values with dot notation (cfg.optimizer.lr) or dictionary syntax (cfg["optimizer"]["lr"]). It behaves like a nested namespace, and OmegaConf preserves the types declared in YAML, so cfg.epochs is an int and cfg.optimizer.lr is a float.

Command-Line Overrides

The single most useful Hydra feature day to day is overriding configuration without touching files. Any leaf value can be set from the command line using dotted paths:
