Axolotl Tutorial: Configuration-Driven LLM Fine-Tuning


By Ruby Abdullah · tutorial
Axolotl · LLM · Fine-Tuning · QLoRA · Multi-GPU · Python

Configuration-Driven LLM Fine-Tuning with Axolotl

Most fine-tuning projects start the same way: someone copies a training script, edits a dozen hard-coded values, and hopes the next person can reproduce the run. Axolotl takes a different stance. It treats the entire fine-tuning job as a single YAML file, so the model, the dataset format, the LoRA settings, and the multi-GPU strategy all live in one versionable artifact. This tutorial walks through Axolotl end to end: what it actually wraps, how to install it, how to read and write the config field by field, and how to run a realistic QLoRA fine-tune of an 8B instruct model on one or two GPUs.

What Axolotl Is

Axolotl is not a new training framework. It is an opinionated wrapper that sits on top of the Hugging Face ecosystem and orchestrates the pieces you would otherwise wire together by hand:

  • Transformers for model and tokenizer loading.
  • PEFT for LoRA and QLoRA adapters.
  • TRL for the supervised fine-tuning (SFT) and preference-optimization trainers.
  • bitsandbytes for 4-bit and 8-bit quantization.
  • Accelerate, DeepSpeed, and FSDP for distributed and multi-GPU training.

The central idea is that you describe what you want rather than writing the glue code that does it. A single YAML config declares the base model, how your dataset should be parsed, which adapter to attach, the optimizer schedule, and the distributed backend. Axolotl reads that file, builds the right objects, and launches the run. The same file is the thing you commit to git, hand to a colleague, or attach to an experiment in Weights & Biases.
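To make the "one file describes the run" idea concrete, here is a minimal sketch that writes such a config from the shell. The key names follow Axolotl's config schema as commonly documented; the model ID, dataset, file name, and every numeric value are illustrative assumptions rather than tuned settings or the configuration used later in this tutorial.

```shell
# Write a minimal, illustrative Axolotl config.
# The file name qlora-sketch.yml is hypothetical; all values are placeholders.
cat > qlora-sketch.yml <<'EOF'
# What to fine-tune (assumed example model ID; substitute your own)
base_model: meta-llama/Meta-Llama-3-8B-Instruct

# Load the base weights in 4-bit and attach a QLoRA adapter
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

# How the dataset should be parsed (assumed example dataset)
datasets:
  - path: tatsu-lab/alpaca
    type: alpaca

# Batch size, optimizer, and schedule
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 0.0002
optimizer: paged_adamw_8bit
lr_scheduler: cosine
bf16: auto

# Distributed backend: uncomment for multi-GPU runs with DeepSpeed ZeRO
# deepspeed: deepspeed_configs/zero2.json

output_dir: ./outputs/qlora-sketch
EOF

echo "wrote qlora-sketch.yml"
```

Each block in the file maps to one of the concerns listed above: base model, dataset parsing, adapter, optimizer schedule, and distributed backend. Committing this one file is what makes the run reviewable and repeatable.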

When to Choose Axolotl

Axolotl is a good fit when one or more of these is true:

  • You have more than one GPU. Axolotl's integration with DeepSpeed ZeRO and FSDP is first-class, and switching from one GPU to eight is mostly a launcher change, not a code rewrite.
  • You work across many model families. Llama, Mistral, Mixtral, Qwen, Gemma, Phi, and others are supported through the same config surface, so you are not relearning a new API per model.
  • Reproducibility matters. Because the run is fully described by a config file, "rerun exactly what we did last month" becomes a git checkout plus one command.
  • You want recipes, not scripts. Axolotl ships dozens of example configs you can copy and adjust.

If your situation is the opposite, with a single consumer GPU where raw training speed and minimal VRAM are the priority, a kernel-optimized library such as Unsloth may train faster. Axolotl's strength is breadth, distributed scaling, and reproducibility rather than squeezing the last token-per-second out of one card. The two are not competitors so much as tools for different jobs.

Installation

Axolotl depends on a CUDA-enabled PyTorch build, and the most common source of trouble is a mismatch between PyTorch, the CUDA toolkit, and flash-attn. Install PyTorch first, matched to your driver, then install Axolotl.

# 1. Create an isolated environment
python -m venv .venv
source .venv/bin/activate

# 2. Install a CUDA-matched PyTorch (example: CUDA 12.1)
pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121

# 3. Install Axolotl with the flash-attention and deepspeed extras
pip install "axolotl[flash-attn,deepspeed]"

If your environment is fragile or you simply want something that works on the first try, the official Docker image bundles a known-good combination of CUDA, PyTorch, and the optimized kernels:

docker run --gpus all --rm -it \
  -v "$(pwd)":/workspace \
  axolotlai/axolotl:main-latest \
  bash

After installation, verify the CLI is on your path:

axolotl --help
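Because most installation problems trace back to a PyTorch/CUDA mismatch, it can also help to confirm that the installed PyTorch build actually sees your GPU. The following is a small sketch, not part of Axolotl itself: the function name check_torch_cuda is hypothetical, and it relies only on the standard torch.__version__, torch.version.cuda, and torch.cuda.is_available() attributes.

```shell
# Use whichever Python is on PATH (inside an activated venv, that is the venv's).
PYTHON_BIN="$(command -v python3 || command -v python || true)"

# Hypothetical helper: report the PyTorch build and whether CUDA is usable.
check_torch_cuda() {
  if [ -z "$PYTHON_BIN" ]; then
    echo "no python interpreter found"
    return 0
  fi
  "$PYTHON_BIN" - <<'PY'
try:
    import torch
except Exception as exc:
    # Covers a missing install as well as a broken binary build.
    print("torch could not be imported:", exc)
else:
    # torch.version.cuda is the CUDA version the wheel was built against
    # (None for CPU-only builds).
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())
PY
}

check_torch_cuda
```

If the last line reports that CUDA is not available on a machine with a GPU, the usual suspects are a CPU-only PyTorch wheel or a driver older than the CUDA version the wheel targets.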

The Role of Accelerate

Axolotl uses Hugging Face Accelerate to abstract device placement and distributed launch. For a single GPU you usually do not need to touch it, but for multi-GPU runs Accelerate decides how processes are spawned. You can generate a default config once by running the accelerate config command, but in practice most Axolotl users let the DeepSpeed or FSDP settings in the YAML drive the behavior and start the run through the accelerate launch command. We return to this in the training section.
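As a rough sketch of what that launch looks like, the helper below only builds and prints the command, so it is safe to run on a machine without GPUs. It assumes the module-style entry point axolotl.cli.train and Accelerate's --num_processes flag; the function name build_launch_cmd and the file name config.yml are hypothetical.

```shell
# Defaults can be overridden from the environment, e.g. NUM_GPUS=2.
CONFIG="${CONFIG:-config.yml}"
NUM_GPUS="${NUM_GPUS:-1}"

# Hypothetical helper: print the launch command for a config and GPU count.
build_launch_cmd() {
  local config="$1"
  local gpus="$2"
  if [ "$gpus" -gt 1 ]; then
    # One training process per GPU.
    echo "accelerate launch --num_processes $gpus -m axolotl.cli.train $config"
  else
    echo "accelerate launch -m axolotl.cli.train $config"
  fi
}

# Dry run: show the command instead of executing it.
build_launch_cmd "$CONFIG" "$NUM_GPUS"
```

Going from one GPU to two changes only the launcher arguments. The YAML stays the same apart from any DeepSpeed or FSDP settings you choose to enable.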
