SQLModel: Modern Python ORM for Type-Safe AI Applications
In AI/ML application development, database management is a crucial component. From storing experiment results, managing model registries, to dataset versioning tracking, everything requires efficient and type-safe database interactions. SQLModel, created by Sebastian Ramirez (the creator of FastAPI), combines the power of SQLAlchemy and Pydantic to deliver the best ORM experience in Python.
In this tutorial, we will explore SQLModel in depth and build a production-ready ML experiment tracking and model registry system.
What Is SQLModel?
SQLModel is a Python library that combines SQLAlchemy (the most popular ORM in Python) with Pydantic (a data validation library). The result is an ORM that is:
- Type-safe: Full type hints and IDE autocompletion
- Auto-validated: Data is validated before entering the database
- FastAPI compatible: Seamless integration from the same creator
- Simple: One class for both database model AND API schema
- Powerful: Full access to SQLAlchemy features when needed
Installation and Setup
Installing SQLModel
pip install sqlmodel
For specific databases:
# PostgreSQL
pip install sqlmodel psycopg2-binary
MySQL
pip install sqlmodel pymysql
Async support
pip install sqlmodel aiosqlite asyncpg
Verify
python -c "import sqlmodel; print(sqlmodel.version)"
Database Engine Setup
from sqlmodel import createengine, SQLModel
SQLite (development)
sqlite
url = "sqlite:///./mltracking.db"
engine = create
engine(sqliteurl, echo=True)
PostgreSQL (production)
postgres
url = "postgresql://user:password@localhost:5432/mltracking"
engine = create
engine(postgresurl, echo=False, poolsize=20, maxoverflow=10)
Create all tables
SQLModel.metadata.create
all(engine)
Model Definition
Basic Models
from sqlmodel import Field, SQLModel, Session, select
from typing import Optional
from datetime import datetime
import uuid
class Experiment(SQLModel, table=True):
"""Model for ML Experiment"""
id: Optional[int] = Field(default=None, primarykey=True)
name: str = Field(index=True, minlength=1, maxlength=255)
description: Optional[str] = Field(default=None, maxlength=1000)
datasetname: str = Field(index=True)
algorithm: str
hyperparameters: Optional[str] = Field(default=None) # JSON string
accuracy: Optional[float] = Field(default=None, ge=0.0, le=1.0)
loss: Optional[float] = Field(default=None, ge=0.0)
status: str = Field(default="created", index=True)
createdat: datetime = Field(defaultfactory=datetime.utcnow)
updatedat: Optional[datetime] = Field(default=None)
createdby: str = Field(default="system")
class MLModel(SQLModel, table=True):
"""Model for Model Registry"""
tablename = "mlmodels"
id: Optional[int] = Field(default=None, primarykey=True)
name: str = Field(index=True, unique=True)
version: str = Field(default="1.0.0")
framework: str # pytorch, tensorflow, sklearn, etc.
modelpath: str
filesizemb: Optional[float] = Field(default=None)
inputschema: Optional[str] = Field(default=None)
outputschema: Optional[str] = Field(default=None)
isactive: bool = Field(default=True, index=True)
createdat: datetime = Field(defaultfactory=datetime.utcnow)
class Dataset(SQLModel, table=True):
"""Model for Dataset Registry"""
id: Optional[int] = Field(default=None, primarykey=True)
name: str = Field(index=True)
version: str = Field(default="1.0")
filepath: str
rowcount: Optional[int] = Field(default=None)
columncount: Optional[int] = Field(default=None)
fileformat: str = Field(default="csv")
description: Optional[str] = Field(default=None)
createdat: datetime = Field(defaultfactory=datetime.utcnow)