Complete Tutorial: Fine-tuning LLMs with LoRA and PEFT

By Ruby Abdullah · tutorial
LLM · LoRA · PEFT · Fine-tuning · Hugging Face · Deep Learning

Fine-tuning Large Language Models (LLMs) traditionally requires massive GPU memory and long training times. LoRA (Low-Rank Adaptation) and PEFT (Parameter-Efficient Fine-Tuning) enable you to fine-tune with significantly fewer resources while maintaining competitive performance.

What are LoRA and PEFT?

LoRA (Low-Rank Adaptation)

LoRA is a technique that adds small trainable low-rank matrices alongside frozen model layers. Instead of updating all model parameters, it works as follows:

  • Freeze all original model weights
  • Inject trainable rank decomposition matrices (A and B)
  • Only train these new matrices (< 1% of total parameters)
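To make the "less than 1%" figure concrete, here is a rough back-of-the-envelope count. It assumes a Llama-2-7B-style shape (32 decoder layers, 4096 x 4096 query and value projections, about 6.74B base parameters) and rank 8. The helper name `lora_params` is illustrative and not part of any library.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * d_in + d_out * r


# Assumed shapes for a Llama-2-7B-style model
num_layers = 32
hidden = 4096
base_params = 6_738_415_616
adapted_per_layer = 2  # query and value projections only

trainable = num_layers * adapted_per_layer * lora_params(hidden, hidden, r=8)
total = base_params + trainable
fraction = trainable / total

print(f"trainable: {trainable:,}")   # 4,194,304
print(f"total:     {total:,}")       # 6,742,609,920
print(f"fraction:  {fraction:.4%}")
```

Under these assumptions the trainable share comes out well below 1%, and the two counts agree with the figures the quick start reports for the same model and rank.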

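Benefits:
  • Memory efficient: on the order of 10x less training memory than full fine-tuning, since gradients and optimizer states are kept only for the adapter matrices (exact savings depend on model and setup)
  • Faster training: only the small adapter matrices receive gradients and optimizer updates
  • No added inference latency: trained adapters can be merged into the base weights
  • Task switching: keep one base model and swap LoRA adapters for different tasks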
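The merging and task-switching claims can be checked on a toy example. The sketch below uses plain Python lists instead of a tensor library, with tiny made-up dimensions. `matmul` and `add_scaled` are hypothetical helpers written for this illustration. B is filled with random values to mimic an already-trained adapter, whereas LoRA initializes B to zero so that training starts from the base model's behavior.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


random.seed(0)
d, r, alpha = 4, 2, 4          # toy sizes, chosen only for illustration
scale = alpha / r

W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen base weight, d x d
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # adapter matrix, r x d
B = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d)]  # adapter matrix, d x r
x = [[random.gauss(0, 1)] for _ in range(d)]                    # input column vector

# Unmerged: base path plus the scaled low-rank path
unmerged = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Merged: fold the adapter into one weight matrix, then do a single matmul
W_merged = add_scaled(W, matmul(B, A), scale)
merged = matmul(W_merged, x)

max_diff = max(abs(u[0] - m[0]) for u, m in zip(unmerged, merged))
print(f"max difference between merged and unmerged outputs: {max_diff:.2e}")

# Task switching: subtracting the same update recovers the base weights
W_restored = add_scaled(W_merged, matmul(B, A), -scale)
```

Because the merged matrix has the same shape as the original weight, inference after merging costs the same as running the base model.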

PEFT (Parameter-Efficient Fine-Tuning)

PEFT is a library from Hugging Face that provides various efficient fine-tuning methods:

  • LoRA: Low-rank adaptation
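  • QLoRA: LoRA applied on top of a quantized (typically 4-bit) base model
  • Prefix Tuning: Prepend trainable prefix vectors to each layer's attention
  • Prompt Tuning: Learn soft prompt embeddings that are prepended to the input
  • IA3: Infused Adapter by Inhibiting and Amplifying Inner Activations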
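As a rough sketch of how these methods look in code, each one has its own config class in the peft package. The function name `build_peft_configs` and all hyperparameter values below are illustrative assumptions, not recommendations from this tutorial.

```python
def build_peft_configs():
    """Return one example config per PEFT method; values are illustrative only."""
    # Imported lazily so this snippet can be loaded before peft is installed
    from peft import (
        IA3Config,
        LoraConfig,
        PrefixTuningConfig,
        PromptTuningConfig,
        TaskType,
    )

    task = TaskType.CAUSAL_LM
    return {
        "lora": LoraConfig(task_type=task, r=8, lora_alpha=16),
        "prefix_tuning": PrefixTuningConfig(task_type=task, num_virtual_tokens=20),
        "prompt_tuning": PromptTuningConfig(task_type=task, num_virtual_tokens=20),
        "ia3": IA3Config(task_type=task),
    }
```

Any of the returned objects can be passed to `get_peft_model(model, config)`. QLoRA has no config class of its own: it pairs a `LoraConfig` with a base model that was loaded in 4-bit precision.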

Installation

# Install packages
pip install transformers datasets accelerate peft bitsandbytes
pip install trl    # For training utilities
pip install wandb  # Optional: experiment tracking

# For QLoRA (4-bit quantization); the quotes stop the shell from treating ">" as a redirect
pip install "bitsandbytes>=0.41.0"

# Verify GPU
python -c "import torch; print(torch.cuda.is_available())"
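Since memory is the main constraint for fine-tuning, it can help to check how much the GPU has, not only whether one is visible. A minimal sketch follows, with `describe_gpu` as a hypothetical helper name. It reports only the first device.

```python
def describe_gpu() -> str:
    """Summarize the first CUDA device, or explain why none is usable."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    if not torch.cuda.is_available():
        return "No CUDA device visible to PyTorch"

    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    return f"{props.name}: {total_gib:.1f} GiB"


print(describe_gpu())
```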

Quick Start: Fine-tune with LoRA

1. Basic LoRA Fine-tuning

from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset
from trl import SFTTrainer

# Load base model and tokenizer
model_name = "meta-llama/Llama-2-7b-hf"  # Or another model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Reuse EOS as the padding token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# LoRA configuration
lora_config = LoraConfig(
    r=8,                                  # Rank of the update matrices
    lora_alpha=32,                        # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Modules to apply LoRA
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06%

# Load dataset
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora-output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # Effective batch size: 4 * 4 = 16 per device
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    warmup_ratio=0.03,
)

# Trainer
# Note: these keyword arguments follow older trl releases. Newer releases
# move dataset_text_field and the sequence-length limit into SFTConfig and
# rename tokenizer to processing_class, so check the installed trl version.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,
)

# Train
trainer.train()

# Save LoRA weights (adapter only, not the full base model)
model.save_pretrained("./lora-adapter")
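The saved directory holds only the adapter, so using it later means reloading the base model and attaching the adapter to it. The following is a minimal sketch, assuming the same base model name and adapter path as the script above. `load_finetuned` is a hypothetical helper, not part of peft or transformers.

```python
def load_finetuned(base_name: str, adapter_dir: str, merge: bool = False):
    """Reload a base model and attach a saved LoRA adapter to it."""
    # Imported lazily because these are heavy dependencies
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_name)
    base = AutoModelForCausalLM.from_pretrained(
        base_name, torch_dtype="auto", device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    if merge:
        # Fold the adapter into the base weights and drop the PEFT wrapper
        model = model.merge_and_unload()
    return model, tokenizer


# Example call (downloads the base model, so it is left commented out):
# model, tokenizer = load_finetuned("meta-llama/Llama-2-7b-hf", "./lora-adapter")
```

Leaving `merge=False` keeps the adapter separate, which is convenient when several adapters share one base model. Setting `merge=True` trades that flexibility for a plain model with no adapter overhead.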

2. LoRA Config Parameters

from peft import LoraConfig

config = LoraConfig(
    # Core parameters
    r=8,            # Rank: dimension of the low-rank matrices
                    # Higher = more expressive but more parameters
                    # Typical: 4, 8, 16, 32, 64
    lora_alpha=32,  # Scaling factor; the update is scaled by alpha / r
                    # Common starting point: alpha = 2 * r (this example uses 4 * r)

    # Target modules (varies by model architecture)
    target_modules=[
        "q_proj",  # Query projection
    ],
)
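The interaction between rank and alpha is easier to see with a few numbers. The sketch below assumes the standard LoRA scaling of alpha / r described in the config comments. `lora_scale` is an illustrative helper, not a peft function.

```python
def lora_scale(lora_alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A under standard LoRA scaling."""
    return lora_alpha / r


for r in (4, 8, 16, 32, 64):
    fixed = lora_scale(32, r)     # alpha held at 32, as in the config above
    tied = lora_scale(2 * r, r)   # alpha tied to the rank (alpha = 2 * r)
    print(f"r={r:>2}  alpha=32 -> scale {fixed:<5}  alpha=2r -> scale {tied}")
```

With alpha held fixed, raising the rank shrinks the scale, so the two settings are usually adjusted together. Tying alpha to the rank keeps the scale constant as the rank changes.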
