Complete Terraform for ML Tutorial: Infrastructure as Code for Machine Learning
Terraform is an Infrastructure as Code (IaC) tool that enables you to provision and manage cloud infrastructure declaratively. For ML teams, Terraform helps automate the deployment of training clusters, inference endpoints, and data pipelines.
Why Terraform for ML?
Terraform Advantages:
- Reproducibility: Same infrastructure every time
- Version control: Track infrastructure changes
- Multi-cloud: AWS, GCP, Azure support
- Modularity: Reusable infrastructure components
- State management: Track resource state
ML Use Cases:
- ML training infrastructure
- Inference endpoint deployment
- Data pipeline provisioning
- Development environments
- Multi-region deployments
Installation
# macOS
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
# Linux (Debian/Ubuntu)
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform
# Verify installation
terraform --version
Quick Start
1. Project Structure
ml-infrastructure/
├── main.tf            # Main configuration
├── variables.tf       # Input variables
├── outputs.tf         # Output values
├── providers.tf       # Provider configuration
├── terraform.tfvars   # Variable values
└── modules/
    ├── training/
    ├── inference/
    └── storage/
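The layout above can be wired together from main.tf. The following is a minimal sketch, assuming the storage module declares project_name and environment input variables (the module's own variables file is not shown in this tutorial, so those input names are an assumption):

```hcl
# main.tf (illustrative sketch, not the tutorial's actual file)
module "storage" {
  # Local path to the module directory shown in the project layout
  source = "./modules/storage"

  # Assumed module inputs; adjust to match modules/storage/variables.tf
  project_name = var.project_name
  environment  = var.environment
}
```

Child modules do not inherit root-level variables, so each input is passed explicitly in the module block.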
2. Basic Configuration
# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  required_version = ">= 1.0"
}

provider "aws" {
  region = var.aws_region
}

# variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-west-2"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "dev"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "ml-platform"
}
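The defaults above can be overridden per environment without editing variables.tf. Below is a minimal sketch of the terraform.tfvars file listed in the project structure. The values are illustrative assumptions (the staging environment and us-east-1 region do not come from this tutorial), and the variable names assume the underscore spellings aws_region, environment, and project_name:

```hcl
# terraform.tfvars (illustrative values only)
aws_region   = "us-east-1"
environment  = "staging"
project_name = "ml-platform"
```

Terraform loads terraform.tfvars from the working directory automatically. Other variable files can be supplied with the -var-file option on plan and apply.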
3. Basic Commands
# Initialize Terraform
terraform init
# Format configuration
terraform fmt
# Validate configuration
terraform validate
# Plan changes
terraform plan
# Apply changes
terraform apply
# Destroy infrastructure
terraform destroy
S3 for ML Data
1. Data Lake Setup
# modules/storage/main.tf
resource "aws_s3_bucket" "ml_data" {
  bucket = "${var.project_name}-${var.environment}-ml-data"

  tags = {
    Environment = var.environment
    Project     = var.project_name
    Purpose     = "ML Data Lake"
  }
}

resource "aws_s3_bucket_versioning" "ml_data" {
  bucket = aws_s3_bucket.ml_data.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "ml_data" {
  bucket = aws_s3_bucket.ml_data.id

  rule {
    id     = "archive-old-data"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    # Move objects to infrequent-access storage after 90 days
    transition {
      days          = 90
      storage_class = "STANDARD_IA"
    }

    # Archive objects to Glacier after 180 days
    transition {
      days          = 180
      storage_class = "GLACIER"
    }
  }
}
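
# Optional hardening sketch (an assumption, not part of the original setup):
# training data is rarely meant to be public, so a data lake bucket is often
# paired with a public access block and default encryption. This assumes the
# bucket is addressed as aws_s3_bucket.ml_data and that SSE-S3 (AES256) is
# acceptable; use "aws:kms" instead if a customer-managed key is required.
resource "aws_s3_bucket_public_access_block" "ml_data" {
  bucket = aws_s3_bucket.ml_data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "ml_data" {
  bucket = aws_s3_bucket.ml_data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
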
# Folder structure
resource "aws_s3_object" "raw_data" {
  bucket       = aws_s3_bucket.ml_data.id
  key          = "raw/"
  content_type = "application/x-directory"
}

resource "aws_s3_object" "processed_data" {
  bucket       = aws_s3_bucket.ml_data.id
  key          = "processed/"
  content_type = "application/x-directory"
}

resource "aws_s3_object" "models" {