Complete Tutorial: Fine-tuning LLMs with LoRA and PEFT

Fine-tuning Large Language Models (LLMs) traditionally requires massive GPU memory and long training times. LoRA (Low-Rank Adaptation) and PEFT (Parameter-Efficient Fine-Tuning) enable you to fine-tune with significantly fewer resources while maintaining competitive performance.

What are LoRA and PEFT?

LoRA (Low-Rank Adaptation)

LoRA adds trainable low-rank matrices to frozen model layers. Instead of updating all model parameters, you:

Freeze all original model weights
Inject trainable rank-decomposition matrices (A and B) into selected layers
Train only these new matrices (typically < 1% of total parameters)
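
The three steps above can be sketched in plain Python. This is a toy illustration rather than how PEFT implements it: the class name `LoRALinear`, the helper `matvec`, and the 6 x 6 identity weight are all made up for the example. It assumes the usual LoRA initialization (A small and random, B zero) and the alpha / r scaling described later in this tutorial.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer y = W x + (alpha / r) * B (A x) with W frozen."""

    def __init__(self, weight, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # A (r x d_in) starts as small random values, B (d_out x r) starts at zero,
        # so the adapter contributes nothing until training changes B.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.lora_b = [[0.0] * r for _ in range(d_out)]
        self.scaling = alpha / r

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scaling * u for b, u in zip(base, update)]

    def trainable_params(self):
        return sum(len(row) for row in self.lora_a) + sum(len(row) for row in self.lora_b)


# Hypothetical 6 x 6 frozen weight (identity) adapted with rank 2.
frozen = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
layer = LoRALinear(frozen, r=2, alpha=4)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

print(layer.forward(x) == matvec(frozen, x))  # True: B is zero, so the output is unchanged
print(layer.trainable_params(), "trainable vs", 6 * 6, "frozen")  # 24 trainable vs 36 frozen
```

At this toy size the adapter is not small relative to the layer. The savings come from scale: for a 4096 x 4096 weight with r=8, the adapter has 65,536 parameters against 16,777,216 frozen ones, about 0.39%.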

Benefits:

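Memory efficient: gradients and optimizer state are kept only for the adapter matrices, so training can need roughly 10x less memory than full fine-tuning, depending on the model and settings
Faster training: the frozen weights need no gradient storage or optimizer updates, so each training step does less work
No added inference latency: after training, the adapter can be merged into the base weights, so the model runs at its original speed
Task switching: Keep one base model and swap small LoRA adapters for different tasks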
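
Merging and task switching can be illustrated with a small sketch. It assumes the merged weight is W + (alpha / r) * B A, and the helper names `matmul` and `merge_lora`, the adapter names, and all matrix values are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return the merged weight W + (alpha / r) * (B @ A) as a new matrix."""
    scaling = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Hypothetical 2 x 2 base weight shared by two rank-1 adapters.
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "task_a": {"A": [[1.0, 2.0]], "B": [[0.5], [1.0]]},
    "task_b": {"A": [[0.0, 1.0]], "B": [[1.0], [0.0]]},
}

# One base model, one merged weight per task; the base itself is never modified.
merged = {
    name: merge_lora(base, ad["A"], ad["B"], alpha=2, r=1)
    for name, ad in adapters.items()
}
print(merged["task_a"])  # [[2.0, 2.0], [2.0, 5.0]]
print(merged["task_b"])  # [[1.0, 2.0], [0.0, 1.0]]
```

Because a merged weight has the same shape as the original, inference on it costs the same as on the base model. Keeping the adapters separate instead lets several tasks share one copy of the base weights.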

PEFT (Parameter-Efficient Fine-Tuning)

PEFT is a library from Hugging Face that provides various efficient fine-tuning methods:

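LoRA: Low-rank adaptation
QLoRA: LoRA applied on top of a quantized (4-bit) base model
Prefix Tuning: Prepend trainable prefix vectors to the attention keys and values
Prompt Tuning: Learn soft prompt embeddings prepended to the input
IA3: Infused Adapter by Inhibiting and Amplifying Inner Activations (learned vectors that rescale activations)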

Installation

# Install packages
pip install transformers datasets accelerate peft bitsandbytes
pip install trl  # For training utilities
pip install wandb  # Optional: experiment tracking

# For QLoRA (4-bit quantization); quote the specifier so the shell does not treat ">" as a redirect
pip install "bitsandbytes>=0.41.0"

# Verify GPU
python -c "import torch; print(torch.cuda.is_available())"
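
Before moving on, it can help to confirm that the packages installed above are importable. The following is a minimal sketch using only the standard library. The helper name `check_packages` is made up for this example, and `torch` is included on the assumption that it was pulled in as a dependency of the other packages.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names of the packages installed above, plus torch (assumed dependency).
required = ["torch", "transformers", "datasets", "accelerate", "peft", "bitsandbytes", "trl"]

for name, found in check_packages(required).items():
    print(f"{name:<14} {'ok' if found else 'MISSING'}")
```

The check only locates each package without importing it, so it is quick and does not touch the GPU.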

Quick Start: Fine-tune with LoRA

1. Basic LoRA Fine-tuning

from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset
from trl import SFTTrainer

# Load base model and tokenizer
model_name = "meta-llama/Llama-2-7b-hf"  # Or another model

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Reuse EOS as the padding token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# LoRA configuration
lora_config = LoraConfig(
    r=8,  # Rank of the update matrices
    lora_alpha=32,  # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Modules to apply LoRA
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06%

# Load dataset
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora-output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    warmup_ratio=0.03,
)

# Trainer
# Note: tokenizer, dataset_text_field and max_seq_length are accepted here by
# older TRL releases; newer releases may expect them via SFTConfig instead.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,
)

# Train
trainer.train()

# Save LoRA weights (adapter only, not the base model)
model.save_pretrained("./lora-adapter")
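
The numbers in the script above can be checked by hand. This sketch reproduces the trainable parameter count and the effective batch size. It assumes Llama-2-7B has 32 decoder layers whose query and value projections are each 4096 x 4096, and that training runs on a single GPU. The function name `lora_trainable_params` is made up for the example.

```python
def lora_trainable_params(num_layers, r, module_shapes):
    """Each adapted (d_out x d_in) matrix adds r * (d_in + d_out) LoRA parameters."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes)
    return num_layers * per_layer


# Assumed Llama-2-7B shapes: 32 layers, query and value projections both 4096 x 4096.
trainable = lora_trainable_params(
    num_layers=32,
    r=8,
    module_shapes=[(4096, 4096), (4096, 4096)],
)
total = 6_742_609_920  # "all params" value from the printed output above

print(f"{trainable:,}")  # 4,194,304
print(f"{100 * trainable / total:.2f}%")  # 0.06%

# Samples per optimizer step = per-device batch size * gradient accumulation steps
# (single GPU assumed; multiply by the number of GPUs otherwise).
effective_batch_size = 4 * 4
print(effective_batch_size)  # 16
```

Doubling r doubles the trainable parameter count, and adding more target modules adds one r * (d_in + d_out) term per module per layer.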

2. LoRA Config Parameters

from peft import LoraConfig

config = LoraConfig(
    # Core parameters
    r=8,  # Rank: dimension of low-rank matrices
          # Higher = more expressive but more params
          # Typical: 4, 8, 16, 32, 64

    lora_alpha=32,  # Scaling factor
                    # Common heuristic: alpha = 2 * r (this example uses 4 * r)
                    # Scaling = alpha / r

    # Target modules (varies by model architecture)
    target_modules=[
        "q_proj",  # Query projection