Complete ONNX Runtime Tutorial: High-Performance ML Inference
ONNX Runtime is a high-performance inference engine for machine learning models in the ONNX format. The library provides cross-platform acceleration for models trained in PyTorch, TensorFlow, and other frameworks, which makes it well suited to production ML deployment.
Why ONNX Runtime?
Advantages of ONNX Runtime:
- Cross-platform: Windows, Linux, macOS, mobile, and edge devices
- Hardware acceleration: optimized for CPU, GPU, and NPU
- Framework-agnostic: works with any ONNX model
- High performance: optimized execution providers
- Production-ready: stable API and enterprise support
Use cases:
- Model inference optimization
- Cross-framework deployment
- Edge and mobile inference
- Cloud model serving
- Real-time predictions
Installation
# CPU version
pip install onnxruntime
# GPU version (CUDA); install this instead of the CPU package, not alongside it
pip install onnxruntime-gpu
# For model conversion
pip install onnx torch transformers
# Verify the installation
python -c "import onnxruntime as ort; print(ort.__version__)"
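Which package is installed determines which execution providers are available at runtime. As a minimal sketch, the helper below picks providers in order of preference and always keeps the CPU provider as a fallback. `pick_providers` and the `PREFERRED` ordering are hypothetical names chosen for illustration; in a real script the `available` list would come from `ort.get_available_providers()`.

```python
from typing import Iterable, List

# Preference order is an assumption for illustration; adjust it to your hardware.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Iterable[str],
                   preferred: Iterable[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are actually available, in order.

    CPUExecutionProvider is appended as a fallback if it was not already chosen.
    """
    available_set = set(available)
    chosen = [p for p in preferred if p in available_set]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# A CPU-only install typically reports just the CPU provider.
print(pick_providers(["CPUExecutionProvider"]))
# A GPU install may report CUDA as well.
print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

The returned list can be passed to `ort.InferenceSession("model.onnx", providers=...)`, so the same script runs on machines with and without a GPU.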
Quick Start
1. Basic Inference
import onnxruntime as ort
import numpy as np
# Load the model
session = ort.InferenceSession("model.onnx")
# Get input/output info
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
print(f"Input: {input_name}")
print(f"Output: {output_name}")
# Prepare the input (random data in NCHW layout: batch, channels, height, width)
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
# Run inference; the result is a list with one array per requested output
result = session.run([output_name], {input_name: input_data})
print(f"Output shape: {result[0].shape}")
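`session.run` returns a list of NumPy arrays, one per requested output. For a classifier the first array usually holds raw logits; that is an assumption here, since the tutorial does not say what `model.onnx` computes. A minimal sketch of turning such an array into top-k class probabilities follows (`top_k_probs` is a hypothetical helper, and the logits are made up):

```python
import numpy as np


def top_k_probs(logits: np.ndarray, k: int = 5):
    """Softmax over the last axis, then the k most likely classes per row.

    Expects a 2-D array of shape (batch, classes). Returns (indices,
    probabilities), each of shape (batch, k), most likely class first.
    """
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)

    # argsort is ascending, so reverse each row and keep the first k columns.
    idx = np.argsort(probs, axis=-1)[:, ::-1][:, :k]
    return idx, np.take_along_axis(probs, idx, axis=-1)


# Stand-in for result[0]: one row of made-up logits for four classes.
fake_logits = np.array([[2.0, 0.5, -1.0, 3.0]], dtype=np.float32)
indices, probabilities = top_k_probs(fake_logits, k=2)
print(indices)        # class ids, most likely first
print(probabilities)  # matching probabilities
```

With a real session, `fake_logits` would be replaced by `result[0]`.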
2. Multiple Inputs/Outputs
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("multi_io_model.onnx")
# Get all inputs
inputs = {inp.name: inp for inp in session.get_inputs()}
for name, inp in inputs.items():
    print(f"Input: {name}, Shape: {inp.shape}, Type: {inp.type}")
# Get all outputs
outputs = [out.name for out in session.get_outputs()]
# Prepare the inputs; keys must match the model's input names
input_feed = {
    "input_ids": np.array([[1, 2, 3, 4, 5]]).astype(np.int64),
    "attention_mask": np.array([[1, 1, 1, 1, 1]]).astype(np.int64)
}
# Run inference
results = session.run(outputs, input_feed)
for name, result in zip(outputs, results):
    print(f"{name}: {result.shape}")
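The metadata printed above (name, shape, type) is enough to build a placeholder feed for a quick smoke test. The sketch below is one way to do that, under a few assumptions: it covers only a handful of ONNX Runtime type strings, fills every tensor with zeros, and replaces symbolic or unknown dimensions with a fixed size. `make_dummy_feed` is a hypothetical helper, and the `SimpleNamespace` objects stand in for what `session.get_inputs()` returns so the snippet runs without a model file.

```python
from types import SimpleNamespace

import numpy as np

# Subset of ONNX Runtime type strings mapped to NumPy dtypes.
ORT_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(int32)": np.int32,
    "tensor(bool)": np.bool_,
}


def make_dummy_feed(input_metas, dynamic_dim: int = 1):
    """Build a zero-filled input feed from objects with .name, .shape, .type.

    Dimensions that are not plain integers (symbolic names or None)
    are replaced by `dynamic_dim`.
    """
    feed = {}
    for meta in input_metas:
        shape = [d if isinstance(d, int) else dynamic_dim for d in meta.shape]
        feed[meta.name] = np.zeros(shape, dtype=ORT_TO_NUMPY[meta.type])
    return feed


# Stand-ins for session.get_inputs(); the dimension names are made up.
metas = [
    SimpleNamespace(name="input_ids", shape=["batch", "sequence"], type="tensor(int64)"),
    SimpleNamespace(name="attention_mask", shape=["batch", "sequence"], type="tensor(int64)"),
]
feed = make_dummy_feed(metas, dynamic_dim=4)
for name, array in feed.items():
    print(name, array.shape, array.dtype)
```

A zero-filled feed only checks that shapes and dtypes are accepted; it says nothing about whether the outputs are meaningful.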
Model Conversion
1. PyTorch to ONNX
import torch
import torch.nn as nn
# Define the model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 2)
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)
model = SimpleModel()
model.eval()  # switch to inference mode before exporting
# Create a dummy input with the shape the model expects
dummy_input = torch.randn(1, 10)
# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    # Mark dimension 0 as dynamic so the batch size can vary at inference time
    dynamic_axes={
        "input": {0: "batch_size"},
        "output": {0: "batch_size"}
    },
    opset_version=14
)
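The `dynamic_axes` argument marks dimension 0 as variable, so the exported model should accept any batch size. One way to sanity-check that is to run several batch sizes through a session and confirm that the output batch dimension follows the input. The sketch below is written against any object exposing `get_inputs()` and `run()`, and uses a small stand-in class so it runs without the exported file; `check_dynamic_batch` and `FakeSession` are hypothetical names, not part of ONNX Runtime.

```python
from types import SimpleNamespace

import numpy as np


def check_dynamic_batch(session, feature_dim: int = 10, batch_sizes=(1, 4, 16)):
    """Run each batch size and return the observed output batch dimensions."""
    input_name = session.get_inputs()[0].name
    observed = []
    for n in batch_sizes:
        x = np.random.randn(n, feature_dim).astype(np.float32)
        # Passing None as the output list asks for all outputs.
        outputs = session.run(None, {input_name: x})
        observed.append(outputs[0].shape[0])
    return observed


class FakeSession:
    """Stand-in exposing the two methods used above (illustration only)."""

    def __init__(self):
        rng = np.random.default_rng(0)
        self.weight = rng.standard_normal((10, 2)).astype(np.float32)

    def get_inputs(self):
        return [SimpleNamespace(name="input")]

    def run(self, output_names, feed):
        return [feed["input"] @ self.weight]


sizes = check_dynamic_batch(FakeSession())
print(sizes)
assert sizes == [1, 4, 16], "output batch dimension should follow the input"
```

To check the real export, pass `ort.InferenceSession("model.onnx")` instead of `FakeSession()`. A model exported without `dynamic_axes` would be expected to reject batch sizes other than the one used for the dummy input.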
2. TensorFlow to ONNX
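import tensorflow as tf
import tf2onnx
# Load the TensorFlow model
model = tf.keras.models.load_model("tf_model")
# Convert to ONNX; the input signature is NHWC with a dynamic batch dimension
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
# from_keras returns the model proto plus external tensor storage, which is unused here
model_proto, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=spec,
    output_path="model.onnx",
    opset=14
)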
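The input signature here is channels-last, `(N, 224, 224, 3)`, while the Quick Start example fed a channels-first `(1, 3, 224, 224)` array. Assuming the converted model keeps the Keras channels-last layout, data prepared in NCHW order needs a transpose before it is passed to the session. A minimal sketch, with `nchw_to_nhwc` as a hypothetical helper name:

```python
import numpy as np


def nchw_to_nhwc(batch: np.ndarray) -> np.ndarray:
    """Reorder (N, C, H, W) to (N, H, W, C) and return a contiguous copy."""
    if batch.ndim != 4:
        raise ValueError(f"expected a 4-D array, got shape {batch.shape}")
    # transpose only changes strides; ascontiguousarray materializes the new layout.
    return np.ascontiguousarray(batch.transpose(0, 2, 3, 1))


nchw = np.random.randn(1, 3, 224, 224).astype(np.float32)
nhwc = nchw_to_nhwc(nchw)
print(nchw.shape, "->", nhwc.shape)
```

Checking `session.get_inputs()[0].shape` on the converted model is a quick way to confirm which layout it actually expects.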
3. Hugging Face Transformers
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification
# Load and export
model_id = "distilbert-base-uncased-finetuned-sst-2-english"