Complete ONNX Runtime Tutorial: High-Performance ML Inference
ONNX Runtime is a high-performance inference engine for machine learning models in ONNX format. It provides cross-platform acceleration for models trained in PyTorch, TensorFlow, and other frameworks, making it ideal for production ML deployments.
Why ONNX Runtime?
ONNX Runtime Advantages:
- Cross-platform: Windows, Linux, macOS, mobile, and edge devices
- Hardware acceleration: CPU, GPU, and NPU support through pluggable execution providers
- Framework agnostic: Runs ONNX models regardless of the framework they were trained in
- High performance: Optimized execution providers for each hardware target
- Production ready: Stable API and enterprise support

Typical use cases:
- Model inference optimization
- Cross-framework deployment
- Edge and mobile inference
- Cloud model serving
- Real-time predictions
Installation
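```bash
# CPU version
pip install onnxruntime

# GPU version (CUDA); install this instead of onnxruntime, not alongside it
pip install onnxruntime-gpu

# For model conversion
pip install onnx torch transformers
# Used by the TensorFlow and Hugging Face conversion examples
pip install tf2onnx "optimum[onnxruntime]"

# Verify installation
python -c "import onnxruntime as ort; print(ort.__version__)"
```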
Quick Start
1. Basic Inference
```python
import onnxruntime as ort
import numpy as np

# Load model (an explicit providers list is required on GPU-enabled builds, ONNX Runtime 1.9+)
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Get input/output info
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
print(f"Input: {input_name}")
print(f"Output: {output_name}")

# Prepare input (assumes the model expects a 1x3x224x224 float32 image tensor)
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference
result = session.run([output_name], {input_name: input_data})
print(f"Output shape: {result[0].shape}")
```
2. Multiple Inputs/Outputs
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("multi_io_model.onnx", providers=["CPUExecutionProvider"])

# Get all inputs
inputs = {inp.name: inp for inp in session.get_inputs()}
for name, inp in inputs.items():
    print(f"Input: {name}, Shape: {inp.shape}, Type: {inp.type}")

# Get all outputs
outputs = [out.name for out in session.get_outputs()]

# Prepare inputs (keys must match the model's declared input names)
input_feed = {
    "input_ids": np.array([[1, 2, 3, 4, 5]], dtype=np.int64),
    "attention_mask": np.array([[1, 1, 1, 1, 1]], dtype=np.int64),
}

# Run inference
results = session.run(outputs, input_feed)
for name, result in zip(outputs, results):
    print(f"{name}: {result.shape}")
```
Model Conversion
1. PyTorch to ONNX
```python
import torch
import torch.nn as nn

# Define model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleModel()
model.eval()

# Create dummy input (traced once to record the graph)
dummy_input = torch.randn(1, 10)

# Export to ONNX; dynamic_axes marks dimension 0 as a variable batch size
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={
        "input": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
    opset_version=14,
)
```
2. TensorFlow to ONNX
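```python
import tensorflow as tf
import tf2onnx

# Load TensorFlow model
model = tf.keras.models.load_model("tf_model")

# Convert to ONNX (the signature is channels-last: batch, height, width, channels)
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
# from_keras returns (model_proto, external_tensor_storage)
model_proto, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=spec,
    output_path="model.onnx",
    opset=14,
)
```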
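The TensorFlow signature above is channels-last (NHWC, `None x 224 x 224 x 3`), whereas the earlier inference example fed a channels-first (NCHW) `1x3x224x224` tensor. If your preprocessing produces NCHW batches, they need a transpose before being fed to a model converted this way. The helper names below are hypothetical, and the sketch uses only NumPy.

```python
import numpy as np


def nchw_to_nhwc(batch: np.ndarray) -> np.ndarray:
    """(N, C, H, W) -> (N, H, W, C); returns a contiguous copy."""
    return np.ascontiguousarray(np.transpose(batch, (0, 2, 3, 1)))


def nhwc_to_nchw(batch: np.ndarray) -> np.ndarray:
    """(N, H, W, C) -> (N, C, H, W); returns a contiguous copy."""
    return np.ascontiguousarray(np.transpose(batch, (0, 3, 1, 2)))


x_nchw = np.random.randn(1, 3, 224, 224).astype(np.float32)
x_nhwc = nchw_to_nhwc(x_nchw)
print(x_nhwc.shape)  # (1, 224, 224, 3)
# Element [n, c, h, w] in NCHW lands at [n, h, w, c] in NHWC, and the round trip is lossless.
assert np.array_equal(nhwc_to_nchw(x_nhwc), x_nchw)
```

With such a model, the feed would look like `session.run(None, {input_name: x_nhwc})`. Read `input_name` from `session.get_inputs()[0].name` rather than assuming the converter kept the `TensorSpec` name verbatim.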
3. Hugging Face Transformers
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load and export
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
```