Complete Guide to Finetuning EasyOCR for Custom Datasets
EasyOCR is a powerful open-source OCR (Optical Character Recognition) library that supports over 80 languages. However, for specific use cases such as custom fonts, historical documents, handwriting, or unique document formats, finetuning the EasyOCR model can significantly improve accuracy.
This tutorial walks through finetuning EasyOCR end to end, from environment setup and dataset preparation to model evaluation.
Prerequisites
Before starting, ensure you have:
- Python 3.8+ (Python 3.8 or 3.9 recommended; the PyTorch 2.0.1 build installed below does not support Python 3.7)
- GPU with CUDA support (highly recommended, minimum 8GB VRAM)
- At least 16GB system RAM
- Image dataset with ground truth labels (minimum 1000 samples for good results)
- Minimum 10GB disk space for models and dataset
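Some of the prerequisites above can be checked from Python before installing anything. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` and its parameters are hypothetical, and it covers only the Python version and free disk space (RAM and VRAM are not checked here).

```python
import shutil
import sys


def check_prerequisites(path=".", min_python=(3, 8), min_free_gb=10):
    """Report whether the Python version and free disk space look sufficient.

    The defaults are assumptions taken from the list above:
    Python 3.8 (the recommended version) and 10 GB of free disk space.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "disk_ok": free_gb >= min_free_gb,
        "free_gb": round(free_gb, 1),
    }


if __name__ == "__main__":
    # Checks the drive that holds the current working directory.
    print(check_prerequisites())
```

Pass the path where the dataset and models will live as `path` if that is on a different drive than the working directory.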
Installation and Environment Setup
1. Clone EasyOCR Repository
# Clone repository
git clone https://github.com/JaidedAI/EasyOCR.git
cd EasyOCR
# Check out a tagged release (optional)
git checkout v1.7.0
2. Setup Virtual Environment
# Create virtual environment
python -m venv easyocrenv
# Activate
# For Linux/Mac:
source easyocrenv/bin/activate
# For Windows:
easyocrenv\Scripts\activate
3. Install Dependencies
# Install PyTorch with CUDA support
# Adjust CUDA version to match your system
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
# Install EasyOCR requirements
pip install -r requirements.txt
# Install additional training dependencies
pip install tensorboard
pip install lmdb
pip install pillow
pip install opencv-python
pip install albumentations
pip install python-Levenshtein
4. Verify Installation
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
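The PyTorch check does not cover the additional training dependencies installed in step 3. As a rough sketch, the snippet below reports which of those packages cannot be found, without importing them. The helper name `find_missing_modules` is hypothetical; the mapping pairs each pip package name with the module name it is normally imported under.

```python
import importlib.util

# pip package name -> module name used in "import ..."
TRAINING_MODULES = {
    "tensorboard": "tensorboard",
    "lmdb": "lmdb",
    "pillow": "PIL",
    "opencv-python": "cv2",
    "albumentations": "albumentations",
    "python-Levenshtein": "Levenshtein",
}


def find_missing_modules(module_names):
    """Return the module names that cannot be located in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = find_missing_modules(TRAINING_MODULES.values())
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All training dependencies found.")
```

Run it inside the activated virtual environment; any module it lists can be reinstalled with the matching `pip install` command from step 3.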
Dataset Preparation
The dataset is the most important component in finetuning. Dataset quality and quantity significantly affect the final results.
1. Dataset Folder Structure
Create a folder structure like this:
dataset/
├── raw/
│   ├── train/
│   │   ├── img001.jpg
│   │   ├── img002.jpg
│   │   └── ...
│   ├── validation/
│   │   ├── img001.jpg
│   │   └── ...
│   └── test/
│       ├── img001.jpg
│       └── ...
├── labels/
│   ├── trainlabels.txt
│   ├── vallabels.txt
│   └── testlabels.txt
└── lmdb/
    ├── train/
    └── validation/
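The layout above can be created by hand or with a few lines of Python. This is a minimal sketch using `pathlib`; the function name `create_dataset_dirs` is hypothetical, and it creates only the empty folders, not the images or label files.

```python
import tempfile
from pathlib import Path

# Sub-folders from the layout above, relative to the dataset root.
SUBDIRS = [
    "raw/train",
    "raw/validation",
    "raw/test",
    "labels",
    "lmdb/train",
    "lmdb/validation",
]


def create_dataset_dirs(root="dataset"):
    """Create the dataset folder layout under `root` and return the paths."""
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        directory = root / sub
        # exist_ok=True makes the call safe to repeat on an existing tree.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; pass "dataset" to create the real tree.
    with tempfile.TemporaryDirectory() as tmp:
        for directory in create_dataset_dirs(Path(tmp) / "dataset"):
            print(directory.relative_to(tmp))
```

Calling `create_dataset_dirs("dataset")` from the project root produces the real tree; existing folders are left untouched.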
2. Label File Format
Label files use TSV (Tab-Separated Values) format:
trainlabels.txt:
img001.jpg	Hello World
img002.jpg	Invoice #12345
img003.jpg	Total: $1,250.00
img004.jpg	PT Rubythalib Data Konsulta
Each line contains:
- Image filename
- Tab character (\t)
- Ground truth text
For good results, the dataset should also follow these guidelines:
- Images must be clear and readable
- Minimum resolution 64x256 pixels
- Varied fonts, sizes, and styles
- Various lighting conditions
- Include realistic noise and distortion
- Balanced character distribution
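Malformed label lines and a skewed character distribution are easier to catch before training than after. The sketch below parses lines in the filename, tab, text format described above, collects the line numbers that do not fit it, and counts how often each character appears in the labels. Both function names are hypothetical, and the sample lines in the demo are illustrative only.

```python
from collections import Counter


def parse_label_lines(lines):
    """Split 'filename<TAB>text' lines into (filename, text) pairs.

    Returns (pairs, errors), where errors holds the 1-based numbers of
    lines that lack a tab, a filename, or a label. Blank lines are skipped.
    """
    pairs, errors = [], []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        # Split on the first tab only, so the label itself may contain tabs.
        filename, sep, text = line.partition("\t")
        if not sep or not filename.strip() or not text.strip():
            errors.append(number)
            continue
        pairs.append((filename.strip(), text))
    return pairs, errors


def character_counts(pairs):
    """Count how often each character occurs across all label texts."""
    counts = Counter()
    for _, text in pairs:
        counts.update(text)
    return counts


if __name__ == "__main__":
    sample = [
        "img001.jpg\tHello World\n",
        "img002.jpg\tInvoice #12345\n",
        "broken line without a tab\n",
    ]
    pairs, errors = parse_label_lines(sample)
    print("valid pairs:", pairs)
    print("bad line numbers:", errors)
    print("most common characters:", character_counts(pairs).most_common(5))
```

To check a real label file, open it with `encoding="utf-8"` and pass the file object to `parse_label_lines`. Characters with very low counts are candidates for collecting more samples.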
3. Dataset Preprocessing Script
Create a file named preparedataset.py:
import os
import cv2
import numpy as np
from PIL import Image
from pathlib import Path
def preprocessimage(imagepath, outputpath):
    """
    Image preprocessing for OCR training
    """
    # Read image (cv2.imread returns None if the file cannot be read)
    img = cv2.imread(str(imagepath))
    if img is None:
        print(f"Error reading {imagepath}")
        return False
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Noise reduction (h=10, template window 7, search window 21)
    denoised = cv2.fastNlMeansDenoising(gray, None, 10, 7, 21)