Complete Ollama Tutorial: Deploy LLMs Locally
Ollama is an open-source tool that makes it easy to run Large Language Models (LLMs) locally on your computer. With Ollama, you can use models like Llama 3, Mistral, Gemma, and many more without paid APIs, and without an internet connection once the models are downloaded.
Why Ollama?
Benefits of using Ollama:
- Privacy: Data never leaves your computer
- No API costs: Free after downloading models
- Offline capable: Works without internet once a model has been downloaded
- Easy setup: One command to run models
- OpenAI-compatible API: Works with many existing OpenAI clients by pointing them at the local server
Use cases:
- Developing and testing AI applications
- Private/sensitive data processing
- Offline AI applications
- Learning and experimenting with LLMs
- Cost-effective inference
Installation
1. Install on Linux
# Install with script
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
2. Install on macOS
# Download from website or use Homebrew
brew install ollama
# Or download the .dmg from https://ollama.com/download
3. Install on Windows
Download the installer from https://ollama.com/download and run it.
4. Install via Docker
# Pull image
docker pull ollama/ollama
# Run container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# With GPU (NVIDIA; requires the NVIDIA Container Toolkit on the host)
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
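Whichever install route you take, it helps to confirm that the server is answering on port 11434 before going further. The snippet below is a minimal sketch that queries the `/api/version` endpoint using only the standard library; the function name `ollama_version` and the default timeout are illustrative choices, not part of Ollama.

```python
import json
import urllib.error
import urllib.request


def ollama_version(base_url="http://localhost:11434", timeout=2.0):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        return None


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama {version} is running" if version else "Ollama is not reachable")
```

If this reports that the server is not reachable while using Docker, check that the container is up and that the `-p 11434:11434` port mapping was included.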
Quick Start
1. Pull and Run Model
# Download and run Llama 3
ollama run llama3
# Or other models
ollama run mistral
ollama run gemma:7b
ollama run phi3
ollama run codellama
# Chat directly in terminal
>>> Hello, who are you?
I am an AI assistant...
>>> /bye # Exit chat
2. Available Models
| Model | Size | Use Case |
|-------|------|----------|
| llama3:8b | 4.7GB | General purpose, balanced |
| llama3:70b | 40GB | High quality responses |
| mistral | 4.1GB | Fast, efficient |
| gemma:7b | 5GB | Google's open model |
| phi3 | 2.2GB | Small, efficient |
| codellama | 3.8GB | Code generation |
| llava | 4.5GB | Vision + Language |
| mixtral | 26GB | Mixture of experts |
# List models already downloaded to this machine
ollama list
# Pull a specific version
ollama pull llama3:8b
ollama pull llama3:70b
# Remove a model
ollama rm llama3:8b
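The same listing is available over HTTP from the `/api/tags` endpoint, which is handy when a script needs to check whether a model is already present. The following is a rough sketch, assuming the response carries a `models` list whose entries have `name` and `size` (bytes) fields; the helper names `format_models` and `list_local_models` are made up for this example.

```python
import json
import urllib.request


def format_models(payload):
    """Turn an /api/tags response into 'name  size' lines, largest model first."""
    models = sorted(payload.get("models", []), key=lambda m: m.get("size", 0), reverse=True)
    return [f"{m['name']}  {m.get('size', 0) / 1e9:.1f} GB" for m in models]


def list_local_models(base_url="http://localhost:11434"):
    """Fetch the locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return format_models(json.load(resp))


if __name__ == "__main__":
    for line in list_local_models():
        print(line)
```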
3. Model Commands
# Show model info
ollama show llama3
# Copy a model under a new name (as a base for a custom model)
ollama cp llama3 my-llama3
# Push to registry (if you have an account)
ollama push username/my-model
REST API
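Ollama serves a REST API on port 11434. The examples below use its native endpoints under /api; OpenAI-compatible endpoints are also available under /v1.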
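For the OpenAI-compatible side, requests go to the `/v1` routes and follow the OpenAI chat-completions request and response shape. Here is a minimal sketch using only the standard library, assuming a server on localhost and a pulled `llama3` model; the helper names are hypothetical.

```python
import json
import urllib.request


def build_openai_chat_request(prompt, model="llama3", base_url="http://localhost:11434"):
    """Build a POST request for Ollama's OpenAI-style chat completions route."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def openai_style_chat(prompt, **kwargs):
    """Send the request and return the assistant text from the first choice."""
    with urllib.request.urlopen(build_openai_chat_request(prompt, **kwargs), timeout=120) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(openai_style_chat("Explain machine learning in 2 sentences"))
```

Existing OpenAI client libraries can usually be pointed at the same `/v1` base URL instead of hand-building requests like this.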
1. Generate Completion
# Simple generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain machine learning in 2 sentences"
}'
# With streaming disabled
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello",
  "stream": false
}'
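When streaming is left on (the default), the reply arrives as one JSON object per line, each carrying a fragment of text in `response`, with the last object marked `"done": true`. The sketch below shows one way to stitch those fragments together; `collect_stream` and `stream_generate` are illustrative names, and the code assumes a local server with `llama3` pulled.

```python
import json
import urllib.request


def collect_stream(lines):
    """Join the 'response' fragments from a streamed /api/generate reply."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # final object: stop reading
    return "".join(parts)


def stream_generate(prompt, model="llama3", base_url="http://localhost:11434"):
    """POST a prompt and read the newline-delimited JSON reply line by line."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return collect_stream(resp)  # the response object yields one line at a time


if __name__ == "__main__":
    print(stream_generate("Hello"))
```

In an interactive program you would typically print each fragment as it arrives rather than collecting them first.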
2. Chat API
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is Python?"}
  ],
  "stream": false
}'
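The chat endpoint keeps no memory between requests, so a multi-turn conversation works by resending the full `messages` list each time and appending the assistant's reply to it. The helpers below are a small sketch of that bookkeeping; `add_turn` and `record_reply` are hypothetical names, and the response shape assumed is the non-streaming one, where the reply sits under a `message` key.

```python
def add_turn(history, user_text, model="llama3"):
    """Append a user message and return the /api/chat request body."""
    history.append({"role": "user", "content": user_text})
    return {"model": model, "messages": list(history), "stream": False}


def record_reply(history, response_json):
    """Store the assistant's message from an /api/chat response; return its text."""
    message = response_json["message"]
    history.append({"role": message["role"], "content": message["content"]})
    return message["content"]


if __name__ == "__main__":
    history = [{"role": "system", "content": "You are a helpful assistant"}]
    body = add_turn(history, "What is Python?")
    # Send `body` as JSON to http://localhost:11434/api/chat, then pass the
    # decoded response to record_reply(history, response) before the next turn.
    print(body)
```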
3. Embeddings
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3",
  "prompt": "Text to embed"
}'
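The response holds the vector as a list of floats under the `embedding` key. A common next step is comparing two such vectors with cosine similarity, for example to rank documents against a query. This is a plain-Python sketch of that comparison; the function name is an illustrative choice, and the zero-vector fallback of 0.0 is a convention picked here for convenience.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as "no similarity"


if __name__ == "__main__":
    # In practice these would be the "embedding" lists from two API responses.
    print(cosine_similarity([0.1, 0.3, 0.5], [0.2, 0.1, 0.4]))
```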
Python Integration
1. Using requests
import requests
import json
def generate(prompt, model="llama3"):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        }