Complete Ollama Tutorial: Deploy LLMs Locally

Ollama is an open-source tool that makes it easy to run large language models (LLMs) locally on your computer. With Ollama, you can use models like Llama 3, Mistral, Gemma, and many more without paid APIs and, once a model has been downloaded, without an internet connection.

Why Ollama?

Benefits of using Ollama:

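Privacy: prompts and data never leave your computer
No API costs: free to use once models are downloaded
Offline capable: works without internet after the initial model download
Easy setup: a single command downloads and runs a model
OpenAI-compatible API: OpenAI-style endpoints under /v1 let many existing OpenAI clients work by changing only the base URL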

Use Cases:

Development and testing AI applications
Private/sensitive data processing
Offline AI applications
Learning and experimenting with LLMs
Cost-effective inference

Installation

1. Install on Linux

# Install with script
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

2. Install on macOS

# Download from the website or use Homebrew
brew install ollama

# Or download the .dmg from https://ollama.com/download

3. Install on Windows

Download the installer from https://ollama.com/download and run it.

4. Install via Docker

# Pull image
docker pull ollama/ollama

# Run container (CPU only)
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Or, with an NVIDIA GPU (requires the NVIDIA Container Toolkit), run this instead
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
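Whichever install route you used, the Ollama server listens on port 11434 by default, which is the port the Docker commands above publish. The sketch below checks from Python that the server is reachable. It uses only the standard library and assumes the `GET /api/version` endpoint. `server_version` is a hypothetical helper name.

```python
import json
import urllib.request


def server_version(base_url: str = "http://localhost:11434", timeout: float = 3.0):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply: treat as unavailable
        return None
```

For example, `print(server_version() or "Ollama server not reachable")` prints the version when the server is up.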

Quick Start

1. Pull and Run Model

# Download and run Llama 3
ollama run llama3

# Or other models
ollama run mistral
ollama run gemma:7b
ollama run phi3
ollama run codellama

# Chat directly in the terminal (type /bye to exit)
>>> Hello, who are you?
I am an AI assistant...
>>> /bye
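If you pass the prompt as an argument, `ollama run` answers once and exits instead of opening the interactive chat. That makes it handy in scripts. The sketch below wraps this with Python's `subprocess`. `build_run_command` and `run_once` are hypothetical helper names, and the five-minute timeout is an arbitrary assumption.

```python
import shutil
import subprocess


def build_run_command(model: str, prompt: str) -> list:
    # A prompt argument makes `ollama run` answer once instead of starting a chat
    return ["ollama", "run", model, prompt]


def run_once(model: str, prompt: str, timeout: float = 300.0) -> str:
    if shutil.which("ollama") is None:
        raise FileNotFoundError("the ollama CLI is not on PATH")
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout.strip()
```

For example: `print(run_once("llama3", "Explain recursion in one sentence"))`.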

2. Available Models

| Model | Download size | Use case |
|-------|---------------|----------|
| llama3:8b | 4.7GB | General purpose, balanced |
| llama3:70b | 40GB | High quality responses |
| mistral | 4.1GB | Fast, efficient |
| gemma:7b | 5GB | Google's open model |
| phi3 | 2.2GB | Small, efficient |
| codellama | 3.8GB | Code generation |
| llava | 4.5GB | Vision + language |
| mixtral | 26GB | Mixture of experts |

# List models downloaded locally
ollama list

# Pull a specific tag (size variant)
ollama pull llama3:8b
ollama pull llama3:70b

# Remove a model
ollama rm llama3:8b
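`ollama list` has a REST counterpart, `GET /api/tags`, which returns the locally downloaded models as JSON. Each entry includes a `name` and a `size` in bytes. The stdlib-only sketch below fetches and summarizes that list. `fetch_tags` and `summarize_models` are hypothetical helper names, and sizes are shown in decimal gigabytes.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Return the parsed JSON from GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


def summarize_models(tags: dict) -> list:
    """Return (name, size in GB) pairs, largest model first."""
    rows = [(m["name"], m["size"] / 1e9) for m in tags.get("models", [])]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

For example, `for name, gb in summarize_models(fetch_tags()): print(f"{name:<30} {gb:5.1f} GB")` prints one line per local model.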

3. Model Commands

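# Show model info
ollama show llama3

# Copy a model under a new name (e.g. as a starting point for customization)
ollama cp llama3 my-llama3

# Push to the ollama.com registry (requires an account; the model name must start with your username)
ollama push username/my-model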
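`ollama cp` only duplicates a model under a new name. To actually customize one, Ollama reads a Modelfile and builds it with `ollama create NAME -f Modelfile`. A Modelfile is a `FROM` base model plus optional lines such as `PARAMETER` and `SYSTEM`. The sketch below generates such a file from Python. The helper name, the default temperature, and the example system prompt are illustrative assumptions.

```python
def build_modelfile(base: str, system_prompt: str, temperature: float = 0.7) -> str:
    """Return Modelfile text that sets a sampling temperature and a system prompt."""
    lines = [
        f"FROM {base}",
        f"PARAMETER temperature {temperature}",
        # Triple quotes allow the system prompt to span several lines
        f'SYSTEM """{system_prompt}"""',
    ]
    return "\n".join(lines) + "\n"
```

For example, write `build_modelfile("llama3", "You are a concise assistant.")` to a file named `Modelfile`, then run `ollama create my-llama3 -f Modelfile` and `ollama run my-llama3`.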

REST API

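Ollama runs a REST API on port 11434, and the native endpoints used below live under /api/. It also exposes OpenAI-compatible endpoints under /v1/ (for example /v1/chat/completions) for use with existing OpenAI clients.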
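Ollama's OpenAI-compatible endpoints live under `/v1/`, for example `/v1/chat/completions`. They return OpenAI-style JSON with the reply under `choices[0].message.content`. The sketch below uses only the standard library, and the helper names are hypothetical. With the official `openai` Python package, you would instead set `base_url` to `http://localhost:11434/v1` and pass any placeholder API key: the client requires one, but Ollama does not check it.

```python
import json
import urllib.request


def openai_chat_request(prompt: str, model: str = "llama3",
                        base_url: str = "http://localhost:11434/v1") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    # OpenAI-style responses put the generated text in the first choice
    return response["choices"][0]["message"]["content"]


def openai_chat(prompt: str, model: str = "llama3") -> str:
    with urllib.request.urlopen(openai_chat_request(prompt, model), timeout=300) as resp:
        return extract_reply(json.load(resp))
```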

1. Generate Completion

# Simple generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain machine learning in 2 sentences"
}'

# With streaming disabled
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello",
  "stream": false
}'
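When a generation finishes, the final JSON object also carries timing fields (with `"stream": false`, that is the only object returned). These include `eval_count`, the number of tokens generated, and `eval_duration`, the time spent generating in nanoseconds. The small sketch below turns them into a tokens-per-second figure. `tokens_per_second` is a hypothetical helper name.

```python
def tokens_per_second(result: dict) -> float:
    """Generation speed from the final /api/generate (or /api/chat) response."""
    count = result.get("eval_count", 0)
    duration_ns = result.get("eval_duration", 0)  # nanoseconds
    if duration_ns <= 0:
        return 0.0  # fields missing, e.g. an intermediate streamed chunk
    return count * 1e9 / duration_ns
```

Pass it the parsed JSON of a non-streaming call, or the last line of a streamed one.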

2. Chat API

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is Python?"}
  ],
  "stream": false
}'
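The chat endpoint does not remember earlier turns. The client resends the whole `messages` list each time and appends the assistant's reply, which the non-streaming response returns under `message`. The sketch below handles that bookkeeping. `ChatSession` and `post_chat` are hypothetical names. The `send` parameter is injectable, so you can try the history logic without a running server.

```python
import json
import urllib.request
from typing import Callable, Optional


def post_chat(payload: dict, base_url: str = "http://localhost:11434") -> dict:
    """POST a payload to /api/chat and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


class ChatSession:
    """Keeps the running message list that /api/chat expects on every call."""

    def __init__(self, model: str, system: Optional[str] = None,
                 send: Callable[[dict], dict] = post_chat):
        self.model = model
        self.send = send
        self.messages: list = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        reply = self.send({
            "model": self.model,
            "messages": list(self.messages),  # copy of the full history so far
            "stream": False,
        })
        content = reply["message"]["content"]
        # Store the assistant turn so the next request includes it
        self.messages.append({"role": "assistant", "content": content})
        return content
```

For example, create `chat = ChatSession("llama3", system="You are a helpful assistant")`. Then call `chat.ask("What is Python?")` followed by `chat.ask("Show me a one-line example")`. The second request carries all four earlier messages.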

3. Embeddings

# Note: newer Ollama releases also provide /api/embed, which takes "input" instead of "prompt"
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3",
  "prompt": "Text to embed"
}'
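The `/api/embeddings` endpoint returns the vector under `embedding`. A common next step is to compare two texts by the cosine similarity of their vectors. The sketch below is dependency-free. `embed` and `cosine_similarity` are hypothetical helper names, and `embed` assumes the request shape shown in the curl example.

```python
import json
import math
import urllib.request


def embed(text: str, model: str = "llama3",
          base_url: str = "http://localhost:11434") -> list:
    """Request an embedding vector for one piece of text."""
    req = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["embedding"]


def cosine_similarity(a: list, b: list) -> float:
    """Dot product divided by the product of the vector lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For example, `cosine_similarity(embed("cat"), embed("kitten"))` should score higher than a pair of unrelated texts.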

Python Integration

1. Using requests

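import requests

def generate(prompt, model="llama3"):
    # Non-streaming request: the server replies with a single JSON object
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        },
        timeout=300,
    )
    response.raise_for_status()
    # The generated text is in the "response" field
    return response.json()["response"]

print(generate("What is Ollama?"))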
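The `generate` function waits for the whole answer. With streaming left on (the default), `/api/generate` sends one JSON object per line, each carrying a text fragment in `response`, until an object with `"done": true` arrives. The sketch below yields the fragments as they arrive and uses only the standard library. `iter_fragments` and `stream_generate` are hypothetical names. `iter_fragments` accepts any iterable of byte lines, so you can exercise it without a server.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_fragments(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text fragments from newline-delimited JSON chunks."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        chunk = json.loads(line)
        if "error" in chunk:
            raise RuntimeError(chunk["error"])  # defensive: surface server errors
        yield chunk.get("response", "")
        if chunk.get("done"):
            break


def stream_generate(prompt: str, model: str = "llama3",
                    base_url: str = "http://localhost:11434") -> Iterator[str]:
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        # HTTP responses iterate line by line, matching the one-object-per-line format
        yield from iter_fragments(resp)
```

For example, `for piece in stream_generate("Write a haiku about the sea"): print(piece, end="", flush=True)` prints the answer as it is generated.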


Related Articles

DSPy: A Framework for Programmatic LLM Optimization

Complete LlamaIndex Tutorial: Building RAG Applications with LLMs

Complete vLLM Tutorial: High-Performance LLM Serving

TRL Tutorial: LLM Post-Training with SFT, DPO, and Reward Modeling
