Complete AWS SageMaker Tutorial: End-to-End ML Pipeline

Amazon SageMaker is a fully managed machine learning service that enables data scientists and developers to build, train, and deploy ML models at scale. This tutorial covers the complete ML lifecycle on AWS.

Why AWS SageMaker?

SageMaker Advantages:

Fully managed: No infrastructure to manage
End-to-end: Complete ML lifecycle support
Scalable: Train at any scale on managed infrastructure
Integrated: Native AWS service integration
Cost-effective: Pay only for what you use

Key Components:

SageMaker Studio (IDE)
SageMaker Training
SageMaker Inference
SageMaker Pipelines
SageMaker Feature Store
SageMaker Model Monitor

Prerequisites

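# Install the AWS SDK and ML libraries
# (the aws command used below comes from the AWS CLI, which is installed separately)
pip install boto3 sagemaker pandas scikit-learn

# Configure AWS credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region (e.g., us-east-1)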
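Before moving on, it can help to confirm that the installed packages are importable from the Python environment you plan to use. The sketch below relies only on the standard library; the helper name `check_packages` is illustrative and not part of any AWS tooling.

```python
from importlib.util import find_spec

# Import names of the packages installed above (scikit-learn imports as "sklearn").
REQUIRED = ["boto3", "sagemaker", "pandas", "sklearn"]


def check_packages(names=REQUIRED):
    """Return a dict mapping each import name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only checks that the packages can be found; it does not verify AWS credentials or permissions.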

Quick Start

1. Set Up SageMaker Session

import boto3
import sagemaker
from sagemaker import get_execution_role

# Create session
session = sagemaker.Session()
bucket = session.default_bucket()
role = get_execution_role()  # Or specify IAM role ARN

print(f"Bucket: {bucket}")
print(f"Role: {role}")
print(f"Region: {session.boto_region_name}")
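When the code runs outside SageMaker (for example on a laptop with IAM user credentials), the execution role usually cannot be resolved automatically, and passing an ARN string is the common fallback. A minimal sketch of a format check for such a string follows; the helper name `is_role_arn` and the account ID in the usage lines are placeholders, and the check only looks at the shape of the string, not whether the role exists.

```python
import re

# Shape of an IAM role ARN:
#   arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
_ROLE_ARN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+$",
    re.ASCII,
)


def is_role_arn(value):
    """Return True if value looks like an IAM role ARN."""
    return bool(_ROLE_ARN.match(value))


# Placeholder values for illustration only.
print(is_role_arn("arn:aws:iam::123456789012:role/SageMakerExecutionRole"))
print(is_role_arn("SageMakerExecutionRole"))
```

A check like this catches the frequent mistake of passing a bare role name where the full ARN is required.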

2. Prepare Training Data

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load sample data
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Put the label in the first column, where the built-in XGBoost algorithm expects it
df.insert(0, 'target', iris.target)

# Split data
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Save to S3 (pandas needs the s3fs package to write to s3:// paths)
train_path = f"s3://{bucket}/iris/train/train.csv"
test_path = f"s3://{bucket}/iris/test/test.csv"

# No header row: the built-in XGBoost algorithm reads headerless CSV
train_df.to_csv(train_path, index=False, header=False)
test_df.to_csv(test_path, index=False, header=False)

print(f"Training data: {train_path}")
print(f"Test data: {test_path}")
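With only 150 rows, a purely random 80/20 split can leave the three classes unevenly represented in the two parts. The sketch below shows one way to compare class shares; `class_shares` is an illustrative helper, and the two label lists are made-up stand-ins for the target columns of the training and test frames.

```python
from collections import Counter


def class_shares(labels):
    """Return {label: fraction of rows} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Hypothetical label columns, for illustration only.
train_labels = [0] * 42 + [1] * 39 + [2] * 39
test_labels = [0] * 8 + [1] * 11 + [2] * 11

print("train:", class_shares(train_labels))
print("test: ", class_shares(test_labels))
```

If the shares differ more than you would like, `train_test_split` accepts a `stratify` argument (for example the target column) that keeps class proportions the same in both parts.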

Built-in Algorithms

1. XGBoost Training

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Get XGBoost container
container = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=session.boto_region_name,
    version="1.5-1"
)

# Create estimator
xgb_estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/iris/output",
    sagemaker_session=session,
    hyperparameters={
        "objective": "multi:softmax",
        "num_class": 3,
        "num_round": 100,
        "max_depth": 5,
        "eta": 0.2
    }
)

# Define training input
train_input = TrainingInput(
    s3_data=train_path,
    content_type="csv"
)

# Train model
xgb_estimator.fit({"train": train_input})
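The built-in XGBoost algorithm expects CSV training data with the label in the first column and no header row, and for `multi:softmax` the labels should be integers from 0 to `num_class - 1`. Because a layout problem only surfaces after the training job has started, a local check before uploading can save time. The following is a minimal sketch using the standard library; `check_xgboost_csv` is a hypothetical helper, not part of the SageMaker SDK.

```python
import csv
import io


def check_xgboost_csv(text, num_class):
    """Return the row count, or raise ValueError if the layout looks wrong."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("file is empty")
    width = len(rows[0])
    for line_no, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"line {line_no}: expected {width} columns, got {len(row)}")
        try:
            values = [float(cell) for cell in row]
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric value (header row?)") from None
        label = values[0]
        # The first column must hold an integer class index in [0, num_class).
        if not label.is_integer() or not 0 <= label < num_class:
            raise ValueError(f"line {line_no}: label {row[0]!r} is not in 0..{num_class - 1}")
    return len(rows)


sample = "0,5.1,3.5,1.4,0.2\n2,6.3,3.3,6.0,2.5\n"
print(check_xgboost_csv(sample, num_class=3), "rows look fine")
```

The same function can be run on the first few lines of a local copy of the training file before it is written to S3.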

2. Linear Learner

from sagemaker import LinearLearner

# Create Linear Learner estimator
linear = LinearLearner(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    predictor_type="multiclass_classifier",
    num_classes=3,
    output_path=f"s3://{bucket}/linear/output"
)

# Prepare data in RecordIO format
train_records = linear.record_set(
    train_df.drop('target', axis=1).values.astype('float32'),
    train_df['target'].values.astype('float32'),
    channel='train'
)

# Train
linear.fit(train_records)
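The RecordIO step takes a 2-D array of features and a 1-D array of labels with one label per row, both cast to float32 as in the call above. A small sketch of that conversion with shape checks is shown here; `to_float32_arrays` is an illustrative helper, and the sample rows are made up.

```python
import numpy as np


def to_float32_arrays(features, labels):
    """Convert row-wise features and labels to float32 arrays with matching lengths."""
    x = np.asarray(features, dtype="float32")
    y = np.asarray(labels, dtype="float32")
    if x.ndim != 2:
        raise ValueError("features must be a 2-D array (rows x columns)")
    if y.ndim != 1 or y.shape[0] != x.shape[0]:
        raise ValueError("labels must be 1-D with one entry per feature row")
    return x, y


# Made-up rows for illustration.
x, y = to_float32_arrays([[5.1, 3.5, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]], [0, 2])
print(x.dtype, x.shape, y.dtype, y.shape)
```

Checking shapes locally surfaces mismatched arrays before any data is uploaded.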

Custom Training Scripts

1. Scikit-learn Training

# train_sklearn.py
import argparse
import joblib
import os
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
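A custom training script usually begins by reading its inputs: hyperparameters arrive as command-line flags, and data and model locations are exposed through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` inside the training container. The sketch below shows only that argument-parsing step and is not the original script; the two hyperparameter flags and the local fallback paths are assumptions chosen for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths for a training script."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameters; the flag names must match the keys
    # passed in the estimator's hyperparameters.
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=5)
    # Paths come from the container environment; the fallbacks are
    # assumed local directories for running the script outside SageMaker.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data/train"))
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Reading training data from {args.train}")
    print(f"Saving model to {args.model_dir}")
```

Falling back to local directories makes it possible to run and debug the script on a workstation before submitting a training job.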
