Tutorial 18: TensorFlow Lite - Deploy ML on Mobile
Introduction
Deploying machine learning models on mobile and edge devices opens up possibilities that cloud-only inference cannot match: offline capability, reduced latency, improved privacy, and lower operational costs. TensorFlow Lite (TFLite) is Google's framework for running ML models on mobile phones, embedded devices, and edge hardware.
This tutorial covers the complete workflow: converting trained models to the TFLite format, applying quantization to reduce model size and improve speed, deploying on Android and iOS, running on Edge TPUs, and benchmarking performance. Whether you are building a real-time image classifier, an on-device NLP model, or a sensor data pipeline, these techniques apply directly.
Prerequisites
- Python 3.9+ with TensorFlow 2.15+
- Android Studio (for Android deployment)
- Xcode 15+ (for iOS deployment)
- Basic understanding of neural network architectures
- A trained model (we will create one for demonstration)
# Install required packages
pip install tensorflow tflite-support onnx onnx-tf torch
import tensorflow as tf
import numpy as np
print(f"TensorFlow version: {tf.__version__}")
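To confirm that the installed version meets the 2.15+ prerequisite, a small check along these lines may help. This is a minimal sketch: the helper names `parse_major_minor` and `meets_minimum` are hypothetical, and the version string is assumed to start with `major.minor`. Comparing integer tuples rather than raw strings avoids ranking "2.9" above "2.15".

```python
import re

MIN_TF_VERSION = (2, 15)  # taken from the prerequisites list

def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) as integers from a string like '2.15.0' or '2.16.0-rc0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

def meets_minimum(version: str, minimum: tuple = MIN_TF_VERSION) -> bool:
    """Return True if the version is at least the required (major, minor)."""
    return parse_major_minor(version) >= minimum

# Typical use (requires TensorFlow to be installed):
#   import tensorflow as tf
#   assert meets_minimum(tf.__version__), "TensorFlow 2.15+ is required"
```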
Understanding TensorFlow Lite
TFLite uses a different model format (.tflite) from standard TensorFlow (.pb, SavedModel). The TFLite format is a FlatBuffer-based schema optimized for:
- Small binary size - FlatBuffers are compact and zero-copy
- Fast initialization - no parsing overhead, memory-mapped
- Reduced memory footprint - designed for constrained devices
- Hardware acceleration - supports GPU delegates, NNAPI, Edge TPU
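One way to see the FlatBuffer container in practice is to check the file identifier. A FlatBuffer file can carry a 4-byte identifier at byte offset 4, and the TFLite schema declares it as `TFL3`. The sketch below checks only that identifier. The helper name `looks_like_tflite` is hypothetical, and a match does not prove the model is valid.

```python
TFLITE_FILE_IDENTIFIER = b"TFL3"  # declared in the TFLite FlatBuffer schema

def looks_like_tflite(data: bytes) -> bool:
    """Check the FlatBuffer file identifier stored at bytes 4..8.

    Bytes 0..4 hold the offset to the root table, so the identifier
    follows immediately after. This is a cheap sanity check only.
    """
    return len(data) >= 8 and data[4:8] == TFLITE_FILE_IDENTIFIER

# Typical use: only the first 8 bytes of the file are needed.
#   with open("model.tflite", "rb") as f:
#       print(looks_like_tflite(f.read(8)))
```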
The conversion pipeline looks like this:
    TensorFlow Model / PyTorch Model
                  |
                  v
    TFLite Converter (with optional optimizations)
                  |
                  v
             .tflite file
                  |
                  v
    TFLite Interpreter (on device)
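The stages of this pipeline can be sketched end to end in a few lines of Python. The tiny model here is a hypothetical stand-in chosen so the example runs quickly; it is not the model used in this tutorial. On a device the interpreter would be the Android or iOS runtime rather than the Python `tf.lite.Interpreter` used here.

```python
import numpy as np
import tensorflow as tf

# Stage 1: a trained model (here an untrained, hypothetical 4-feature classifier)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Stage 2: the converter, with an optional optimization enabled
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Stage 3: the serialized .tflite FlatBuffer (kept in memory as bytes)
tflite_bytes = converter.convert()

# Stage 4: the interpreter loads the FlatBuffer and runs inference
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

sample = np.random.rand(1, 4).astype(np.float32)  # one random input row
interpreter.set_tensor(input_detail["index"], sample)
interpreter.invoke()
probs = interpreter.get_tensor(output_detail["index"])

print("Output shape:", probs.shape)
```

Passing `model_content` avoids writing a file; `model_path="model.tflite"` would load a saved model instead.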
Model Conversion from TensorFlow
Converting a Keras Model
import tensorflow as tf
from tensorflow import keras

# Create a sample image classification model (left untrained for this demonstration)
def build_model():
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), activation='relu',
                            input_shape=(224, 224, 3)),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation='relu'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(128, (3, 3), activation='relu'),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

model = build_model()

# Method 1: Convert from Keras model directly
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

# Report the size of the serialized FlatBuffer
print(f"Model size: {len(tflite_model) / 1024:.1f} KB")
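As a sanity check on the reported size, the parameter count of the model above can be worked out by hand. The sketch below assumes float32 weights (4 bytes each) and counts only weights and biases. The actual .tflite file also stores graph structure and metadata, so its size will differ somewhat from this estimate. The helper names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4

def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_features: int, units: int) -> int:
    """Weight matrix plus one bias per unit."""
    return in_features * units + units

# Pooling and dropout layers have no trainable parameters.
layer_params = {
    "conv2d_32": conv2d_params(3, 3, 3, 32),
    "conv2d_64": conv2d_params(3, 3, 32, 64),
    "conv2d_128": conv2d_params(3, 3, 64, 128),
    "dense_256": dense_params(128, 256),  # GlobalAveragePooling2D yields 128 features
    "dense_10": dense_params(256, 10),
}

total_params = sum(layer_params.values())
estimated_kb = total_params * BYTES_PER_FLOAT32 / 1024

for name, count in layer_params.items():
    print(f"{name:>10}: {count:>7,}")
print(f"Total parameters: {total_params:,}")
print(f"Estimated float32 weight size: {estimated_kb:.1f} KB")
```

This gives 128,842 parameters, or about 503 KB of float32 weights. That serves as a rough baseline for judging how much quantization shrinks the model.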