Complete ONNX Runtime Tutorial: High-Performance ML Inference

By Ruby Abdullah · tutorial
Tags: ONNX Runtime, Model Inference, Optimization, MLOps, Python, Production ML

ONNX Runtime is a high-performance inference engine for machine learning models in ONNX format. It provides cross-platform acceleration for models trained in PyTorch, TensorFlow, and other frameworks, making it ideal for production ML deployments.

Why ONNX Runtime?

ONNX Runtime Advantages:
  • Cross-platform: Windows, Linux, macOS, mobile, edge
  • Hardware acceleration: CPU, GPU, NPU optimization
  • Framework agnostic: Works with any ONNX model
  • High performance: Optimized execution providers
  • Production ready: Stable API and enterprise support

Use Cases:
  • Model inference optimization
  • Cross-framework deployment
  • Edge and mobile inference
  • Cloud model serving
  • Real-time predictions

Installation

# CPU version
pip install onnxruntime

# GPU version (CUDA)
pip install onnxruntime-gpu

# For model conversion
pip install onnx torch transformers

# Verify installation
python -c "import onnxruntime as ort; print(ort.__version__)"
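The CPU and GPU packages expose different execution providers, so once the install is verified it can help to choose providers explicitly. The sketch below uses a hypothetical helper, `pick_providers` (not part of ONNX Runtime), that filters a preference list against the providers a build reports. The provider names are the standard ONNX Runtime identifiers, and the hard-coded lists stand in for the result of `ort.get_available_providers()`.

```python
def pick_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in priority order."""
    chosen = [name for name in preferred if name in available]
    # Fall back to the CPU provider if nothing in the preference list is available.
    return chosen or ["CPUExecutionProvider"]


# Hard-coded lists standing in for ort.get_available_providers()
gpu_build = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
cpu_build = ["CPUExecutionProvider"]

print(pick_providers(gpu_build))  # ['CUDAExecutionProvider', 'CPUExecutionProvider']
print(pick_providers(cpu_build))  # ['CPUExecutionProvider']
```

In a real script the result would be passed as `ort.InferenceSession("model.onnx", providers=pick_providers(ort.get_available_providers()))`.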

Quick Start

1. Basic Inference

import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("model.onnx")

# Get input/output info
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
print(f"Input: {input_name}")
print(f"Output: {output_name}")

# Prepare input (assumes an image model that takes a 1x3x224x224 float32 tensor)
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference
result = session.run([output_name], {input_name: input_data})
print(f"Output shape: {result[0].shape}")
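The hard-coded input shape in the example above only works if it matches the model. ONNX Runtime reports each input's declared shape through `session.get_inputs()[0].shape`, where dynamic dimensions appear as strings or `None` rather than integers. A minimal sketch, assuming a hypothetical helper `concrete_shape` (not an ONNX Runtime function), of turning such a declared shape into one usable for a dummy input:

```python
def concrete_shape(declared, overrides=None, default=1):
    """Replace symbolic (str) or unknown (None) dimensions with concrete integers."""
    overrides = overrides or {}
    shape = []
    for dim in declared:
        if isinstance(dim, int):
            shape.append(dim)  # fixed dimension: keep as-is
        else:
            shape.append(overrides.get(dim, default))  # dynamic: override or default
    return tuple(shape)


print(concrete_shape(["batch_size", 3, 224, 224]))                     # (1, 3, 224, 224)
print(concrete_shape(["batch_size", 3, 224, 224], {"batch_size": 8}))  # (8, 3, 224, 224)
print(concrete_shape([None, 10]))                                      # (1, 10)
```

The resulting tuple can be unpacked into `np.random.randn(*shape).astype(np.float32)` to build a test input for a float32 model.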

2. Multiple Inputs/Outputs

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("multi_io_model.onnx")

# Get all inputs
inputs = {inp.name: inp for inp in session.get_inputs()}
for name, inp in inputs.items():
    print(f"Input: {name}, Shape: {inp.shape}, Type: {inp.type}")

# Get all outputs
outputs = [out.name for out in session.get_outputs()]

# Prepare inputs (keys must match the model's input names)
input_feed = {
    "input_ids": np.array([[1, 2, 3, 4, 5]]).astype(np.int64),
    "attention_mask": np.array([[1, 1, 1, 1, 1]]).astype(np.int64),
}

# Run inference
results = session.run(outputs, input_feed)
for name, result in zip(outputs, results):
    print(f"{name}: {result.shape}")
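The type strings reported for each input look like `tensor(int64)` or `tensor(float)`, and a feed with a mismatched dtype or a missing input name fails when `session.run` is called. The sketch below checks a feed before running. `ONNX_TO_NUMPY` and `check_feed` are hypothetical names, the mapping covers only a few common types, and dtypes are passed as plain strings (for a NumPy array, `str(arr.dtype)`) so the example runs without a model file.

```python
# Partial mapping from ONNX Runtime type strings to NumPy dtype names.
ONNX_TO_NUMPY = {
    "tensor(float)": "float32",
    "tensor(double)": "float64",
    "tensor(float16)": "float16",
    "tensor(int32)": "int32",
    "tensor(int64)": "int64",
    "tensor(bool)": "bool",
}


def check_feed(declared, feed_dtypes):
    """Return a list of problems found when comparing a feed to the declared inputs.

    declared:    {input name: ONNX type string}, e.g. built from session.get_inputs()
    feed_dtypes: {input name: NumPy dtype name}, e.g. str(array.dtype)
    """
    problems = []
    for name, onnx_type in declared.items():
        if name not in feed_dtypes:
            problems.append(f"missing input: {name}")
            continue
        expected = ONNX_TO_NUMPY.get(onnx_type)
        # Types outside the partial mapping are skipped rather than flagged.
        if expected is not None and feed_dtypes[name] != expected:
            problems.append(f"{name}: expected {expected}, got {feed_dtypes[name]}")
    for name in feed_dtypes:
        if name not in declared:
            problems.append(f"unexpected input: {name}")
    return problems


declared = {"input_ids": "tensor(int64)", "attention_mask": "tensor(int64)"}
print(check_feed(declared, {"input_ids": "int64", "attention_mask": "int64"}))  # []
print(check_feed(declared, {"input_ids": "int32"}))
```

With a live session, `declared` could be built as `{inp.name: inp.type for inp in session.get_inputs()}`.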

Model Conversion

1. PyTorch to ONNX

import torch
import torch.nn as nn

# Define model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleModel()
model.eval()  # switch to inference mode before exporting

# Create dummy input
dummy_input = torch.randn(1, 10)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    # Mark the first dimension as dynamic so any batch size is accepted
    dynamic_axes={
        "input": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
    opset_version=14,
)
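Because the export call marks dimension 0 of `input` and `output` as dynamic, the exported input is declared with a symbolic first dimension. ONNX Runtime then typically reports its shape as `["batch_size", 10]`: any batch size is accepted, but each row must still have 10 features. A small sketch of that rule, using a hypothetical helper `shape_is_compatible` (not an ONNX Runtime function):

```python
def shape_is_compatible(declared, actual):
    """Check an actual array shape against a declared shape with dynamic dimensions."""
    if len(declared) != len(actual):
        return False  # the number of dimensions must always match
    for want, got in zip(declared, actual):
        # Integer dimensions are fixed; strings or None are dynamic and match anything.
        if isinstance(want, int) and want != got:
            return False
    return True


declared = ["batch_size", 10]
print(shape_is_compatible(declared, (1, 10)))   # True
print(shape_is_compatible(declared, (32, 10)))  # True
print(shape_is_compatible(declared, (32, 12)))  # False
print(shape_is_compatible(declared, (10,)))     # False
```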

2. TensorFlow to ONNX

import tensorflow as tf
import tf2onnx

# Load TensorFlow model
model = tf.keras.models.load_model("tf_model")

# Convert to ONNX (None leaves the batch dimension dynamic)
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
model_proto, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=spec,
    output_path="model.onnx",
    opset=14,
)

3. Hugging Face Transformers

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load and export
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
