Complete AWS Step Functions for ML Tutorial: Orchestrating ML Workflows
AWS Step Functions provides serverless workflow orchestration for machine learning pipelines. It lets you coordinate multiple AWS services, handle errors with built-in retries and fallbacks, and monitor complex ML workflows visually.
Why Step Functions for ML?
Key Benefits:
- Visual workflows: See pipeline execution in real-time
- Error handling: Built-in retry and error recovery
- Service integration: Native AWS service connectors
- Serverless: No infrastructure to manage
- State management: Track workflow state automatically
Use Cases:
- ML training pipelines
- Data preprocessing workflows
- Model deployment automation
- Batch inference orchestration
- MLOps automation
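The built-in retry and error recovery listed under Key Benefits is configured per state through `Retry` and `Catch` fields. The following is a minimal sketch, not a definitive setup: the `NotifyFailure` state name and the specific interval, attempt, and backoff values are assumptions chosen for illustration, and `retry_delays` is a hypothetical helper that only shows how the wait grows between attempts.

```python
import json


def retry_delays(interval_seconds, backoff_rate, max_attempts):
    """Seconds waited before each retry: the interval is multiplied by the backoff rate each time."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


# Illustrative Task state with a retry policy and a catch-all fallback.
# "NotifyFailure" is a hypothetical state that would have to exist in the same workflow.
preprocess_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789:function:preprocess",
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 10,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }
    ],
    "Catch": [
        {
            "ErrorEquals": ["States.ALL"],
            "ResultPath": "$.error",  # keep the original input and attach the error details
            "Next": "NotifyFailure",
        }
    ],
    "Next": "TrainModel",
}

policy = preprocess_state["Retry"][0]
delays = retry_delays(policy["IntervalSeconds"], policy["BackoffRate"], policy["MaxAttempts"])

print(json.dumps(preprocess_state, indent=2))
print("Waits before each retry (seconds):", delays)
```

Retries are attempted first; the `Catch` block only takes over once the retry policy is exhausted or the error does not match any retrier.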
Prerequisites
pip install boto3 sagemaker
# Configure the AWS CLI with your credentials and default region
aws configure
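Before building workflows, it can help to confirm that boto3 picks up the credentials set by `aws configure`. This is a small sketch under that assumption; `caller_account` is a hypothetical helper name, and it takes the STS client as a parameter so it can be exercised without network access.

```python
def caller_account(sts_client):
    """Return the AWS account ID that the configured credentials resolve to."""
    identity = sts_client.get_caller_identity()
    return identity["Account"]
```

With credentials in place, `caller_account(boto3.client("sts"))` should return your 12-digit account ID; a credentials error here usually means the CLI profile is missing or expired.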
Quick Start
1. Basic ML Workflow
{
"Comment": "Simple ML Training Pipeline",
"StartAt": "PreprocessData",
"States": {
"PreprocessData": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:preprocess",
"Next": "TrainModel"
},
"TrainModel": {
"Type": "Task",
"Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
"Parameters": {
"TrainingJobName.$": "States.Format('training-{}', $$.Execution.Name)",
"AlgorithmSpecification": {
"TrainingImage": "123456789.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",
"TrainingInputMode": "File"
},
"RoleArn": "arn:aws:iam::123456789:role/SageMakerRole",
"InputDataConfig": [
{
"ChannelName": "train",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri.$": "$.traindatauri"
}
}
}
],
"OutputDataConfig": {
"S3OutputPath": "s3://bucket/output"
},
"ResourceConfig": {
"InstanceCount": 1,
"InstanceType": "ml.m5.xlarge",
"VolumeSizeInGB": 30
},
"StoppingCondition": {
"MaxRuntimeInSeconds": 3600
}
},
"Next": "EvaluateModel"
},
"EvaluateModel": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:evaluate",
"End": true
}
}
}
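A definition like the one above can be registered and started from Python with boto3. The sketch below is one possible approach, not the only one: `check_transitions` and `deploy_and_run` are hypothetical helpers, the state machine name is illustrative, and it assumes the preprocess Lambda passes `traindatauri` through in its output, since `TrainModel` reads that field from its own input. The `skeleton` dictionary is a trimmed stand-in for the full definition.

```python
import json


def check_transitions(definition):
    """Return state names referenced by StartAt or Next that are not defined in States."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [state["Next"] for state in states.values() if "Next" in state]
    return sorted(set(targets) - set(states))


def deploy_and_run(sfn_client, definition, role_arn, run_name, train_data_uri):
    """Create the state machine, start one execution, and return the execution ARN."""
    machine = sfn_client.create_state_machine(
        name="ml-training-pipeline",  # illustrative name
        definition=json.dumps(definition),
        roleArn=role_arn,
    )
    execution = sfn_client.start_execution(
        stateMachineArn=machine["stateMachineArn"],
        # The execution name feeds the training job name, so keep it short:
        # SageMaker training job names are limited to 63 characters.
        name=run_name,
        input=json.dumps({"traindatauri": train_data_uri}),
    )
    return execution["executionArn"]


# Trimmed stand-in for the workflow above; only the transitions matter for the check.
skeleton = {
    "StartAt": "PreprocessData",
    "States": {
        "PreprocessData": {"Type": "Task", "Next": "TrainModel"},
        "TrainModel": {"Type": "Task", "Next": "EvaluateModel"},
        "EvaluateModel": {"Type": "Task", "End": True},
    },
}

missing = check_transitions(skeleton)
print("Undefined transition targets:", missing)
```

A call such as `deploy_and_run(boto3.client("stepfunctions"), definition, role_arn, "run-001", "s3://bucket/train")` would then create and start the pipeline, where `role_arn` is an execution role that Step Functions can assume.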
2. Deploy with CloudFormation
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  # StepFunctionsRole and PreprocessFunction must be defined elsewhere in this template.
  MLPipelineStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: ml-training-pipeline
      RoleArn: !GetAtt StepFunctionsRole.Arn
      # TrainModel parameters are abbreviated; a real job also needs the
      # remaining training job settings shown in the Basic ML Workflow above.
      DefinitionString: !Sub |
        {
          "StartAt": "PreprocessData",
          "States": {
            "PreprocessData": {
              "Type": "Task",
              "Resource": "${PreprocessFunction.Arn}",
              "Next": "TrainModel"
            },
            "TrainModel": {
              "Type": "Task",
              "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
              "Parameters": {
                "TrainingJobName.$": "States.Format('job-{}', $$.Execution.Name)"
              },
              "End": true
            }
          }
        }
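A template like this can be deployed from Python as well as from the CLI. The following sketch assumes the complete template (including the role and Lambda function it references) is available as a string; `deploy_stack` is a hypothetical helper, and `CAPABILITY_IAM` is included on the assumption that the template creates an IAM role.

```python
def deploy_stack(cfn_client, stack_name, template_body):
    """Create the stack, block until creation finishes, and return the stack ID."""
    response = cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        # Needed only when the template creates IAM resources such as the execution role.
        Capabilities=["CAPABILITY_IAM"],
    )
    # The waiter polls CloudFormation and raises if the stack fails to create.
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

For example, `deploy_stack(boto3.client("cloudformation"), "ml-pipeline", open("template.yaml").read())` would create the stack, where both the stack name and the file name are placeholders.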
SageMaker Integration
1. Training Job
{
"TrainModel": {
"Type": "Task",
"Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
"Parameters": {
"TrainingJobName.$": "States.Format('training-{}', $$.Execution.Name)",
"AlgorithmSpecification": {
"TrainingImage": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.5-1",
"TrainingInputMode": "File"
},
"RoleArn": "arn:aws:iam::123456789:role/SageMakerRole",
"InputDataConfig": [
{
"ChannelName": "train",