Complete Ollama Tutorial: Deploy LLMs Locally

By Ruby Abdullah · tutorial
Tags: Ollama, LLM, AI, Local AI, Python, Machine Learning

Ollama is an open-source tool that makes it easy to run Large Language Models (LLMs) locally on your computer. With Ollama, you can use models like Llama 3, Mistral, Gemma, and many more. Once a model has been downloaded, it runs without an internet connection or paid APIs.

Why Ollama?

Benefits of using Ollama:
  • Privacy: Data never leaves your computer
  • No API costs: Free after downloading models
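  • Offline capable: Works without internet once the models are downloaded
  • Easy setup: One command to run models
  • OpenAI-compatible API: Exposes OpenAI-style endpoints under /v1, so many OpenAI clients work after changing only the base URL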

Use Cases:
  • Developing and testing AI applications
  • Private/sensitive data processing
  • Offline AI applications
  • Learning and experimenting with LLMs
  • Cost-effective inference

Installation

1. Install on Linux

# Install with the official script
curl -fsSL https://ollama.com/install.sh | sh

# Verify the installation
ollama --version

2. Install on macOS

# Install with Homebrew
brew install ollama

# Or download the .dmg from https://ollama.com/download

3. Install on Windows

Download the installer from https://ollama.com/download and run it.

4. Install via Docker

# Pull the image
docker pull ollama/ollama

# Run the container (models persist in the "ollama" volume)
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With GPU (NVIDIA)
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
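Whichever install route is used, a quick way to confirm that the server is up is to request its root URL. The following is a minimal sketch, assuming the default address http://localhost:11434 used in the Docker commands above. The helper name `is_ollama_running` is hypothetical, and only the Python standard library is used.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status:
        # all are treated as "not running" in this sketch.
        return False
```

Calling `is_ollama_running()` before sending prompts lets a script fail early with a clear message instead of a connection traceback.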

Quick Start

1. Pull and Run Model

# Download (on first use) and run Llama 3
ollama run llama3

# Or other models
ollama run mistral
ollama run gemma:7b
ollama run phi3
ollama run codellama

# Chat directly in the terminal
>>> Hello, who are you?
I am an AI assistant...
>>> /bye    # exit the chat

2. Available Models

| Model | Download size (approx.) | Use case |
|-------|-------------------------|----------|
| llama3:8b | 4.7 GB | General purpose, balanced |
| llama3:70b | 40 GB | High quality responses |
| mistral | 4.1 GB | Fast, efficient |
| gemma:7b | 5 GB | Google's open model |
| phi3 | 2.2 GB | Small, efficient |
| codellama | 3.8 GB | Code generation |
| llava | 4.5 GB | Vision + Language |
| mixtral | 26 GB | Mixture of experts |

# List the models already downloaded to this machine
ollama list

# Pull a specific version
ollama pull llama3:8b
ollama pull llama3:70b

# Remove a model
ollama rm llama3:8b
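The same listing is available over HTTP, which is handy when a script needs to check whether a model is present before using it. The sketch below assumes a server at the default address and relies on the `/api/tags` endpoint, whose response holds a `models` array with `name` and `size` (in bytes) per entry. The function names and the sample payload are hypothetical.

```python
import json
import urllib.request


def summarize_models(tags_payload):
    """Turn an /api/tags response into (name, size_in_GB) pairs, largest first."""
    models = tags_payload.get("models", [])
    rows = [(m["name"], round(m["size"] / 1e9, 1)) for m in models]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fetch_local_models(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models are stored locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return summarize_models(json.load(resp))


# Sample payload for illustration only; the sizes are made up.
sample = {"models": [{"name": "phi3:latest", "size": 2_200_000_000},
                     {"name": "llama3:8b", "size": 4_700_000_000}]}
print(summarize_models(sample))
```

Keeping the parsing in `summarize_models` separate from the network call makes it possible to test the logic without a running server.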

3. Model Commands

# Show model info
ollama show llama3

# Copy a model (as a base for customization)
ollama cp llama3 my-llama3

# Push to the registry (requires an account)
ollama push username/my-model

REST API

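Ollama serves a REST API on http://localhost:11434 by default. The endpoints shown below (/api/generate, /api/chat, /api/embeddings) are Ollama's native API; OpenAI-compatible endpoints are exposed separately under /v1.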
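For the OpenAI-compatible side, requests follow the OpenAI chat completions shape and go to the `/v1` path. The following is a rough sketch using only the standard library, assuming the default local address. The function names are hypothetical, and the bearer token is assumed to be a placeholder that a local server does not validate.

```python
import json
import urllib.request


def build_openai_chat_request(prompt, model="llama3",
                              base_url="http://localhost:11434/v1"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key (assumption): OpenAI clients expect one to be set.
            "Authorization": "Bearer ollama",
        },
        method="POST",
    )


def extract_reply(completion):
    """Pull the assistant text out of an OpenAI-style completion object."""
    return completion["choices"][0]["message"]["content"]


def chat_via_openai_api(prompt, model="llama3"):
    """Send the request to a running server and return the reply text."""
    req = build_openai_chat_request(prompt, model)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_reply(json.load(resp))
```

Existing code written against an OpenAI client library can usually be pointed at the same base URL instead of hand-building requests like this.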

1. Generate Completion

# Simple generation (the reply is streamed as one JSON object per line by default)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain machine learning in 2 sentences"
}'

# With streaming disabled (a single JSON object is returned)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello",
  "stream": false
}'
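When streaming is left on, the reply arrives as newline-delimited JSON: each line carries a text fragment in `response`, and the last line has `done` set to true. A minimal sketch of reassembling such a stream follows; the helper name `join_stream` and the sample chunks are hypothetical.

```python
import json


def join_stream(lines):
    """Concatenate the "response" fragments from a streamed /api/generate reply."""
    parts = []
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the stream
    return "".join(parts)


# Hand-written sample chunks, shaped like the streamed output.
sample = [
    b'{"model": "llama3", "response": "Hel", "done": false}\n',
    b'{"model": "llama3", "response": "lo!", "done": false}\n',
    b'{"model": "llama3", "response": "", "done": true}\n',
]
print(join_stream(sample))  # Hello!
```

With the `requests` library, the same function can consume `response.iter_lines()` from a call made with `stream=True`.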

2. Chat API

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is Python?"}
  ],
  "stream": false
}'
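The chat endpoint is stateless, so a multi-turn conversation works by resending the full message list each time, including earlier assistant replies. The sketch below illustrates that pattern under the assumption that the non-streaming response carries the reply in `message.content`. The `ChatSession` class and `post_json` helper are hypothetical names, not part of Ollama.

```python
import json
import urllib.request


def post_json(url, payload, timeout=120):
    """POST a JSON body and decode the JSON reply (standard library only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


class ChatSession:
    """Keeps the running message list so each request includes prior turns."""

    def __init__(self, model="llama3", system=None,
                 base_url="http://localhost:11434", transport=post_json):
        self.model = model
        self.url = f"{base_url}/api/chat"
        self.transport = transport  # injectable so the class can be tested offline
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text):
        """Send one user turn and record the assistant's answer in the history."""
        self.messages.append({"role": "user", "content": text})
        reply = self.transport(
            self.url,
            {"model": self.model, "messages": self.messages, "stream": False},
        )
        answer = reply["message"]["content"]
        self.messages.append({"role": "assistant", "content": answer})
        return answer
```

A typical use would be `session = ChatSession(system="You are a helpful assistant")` followed by repeated `session.ask(...)` calls. Since the history grows with every turn, long conversations may eventually need trimming to fit the model's context window.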

3. Embeddings

# Returns a JSON object with an "embedding" array of floats
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3",
  "prompt": "Text to embed"
}'
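Embeddings become useful once they are compared with each other, for example to find the stored text closest to a query. The following is a small sketch of cosine similarity over plain Python lists, such as the `embedding` arrays returned by the endpoint above. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query_vec, candidates):
    """candidates: dict of label -> vector. Returns the best-matching label."""
    return max(candidates, key=lambda k: cosine_similarity(query_vec, candidates[k]))


# Toy 3-dimensional vectors for illustration; real embeddings have
# hundreds or thousands of dimensions.
docs = {"cats": [1.0, 0.1, 0.0], "cars": [0.0, 0.2, 1.0]}
print(most_similar([0.9, 0.0, 0.1], docs))  # cats
```

Vectors are only comparable when they come from the same model, so the query and the stored texts should be embedded with the same `model` value.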

Python Integration

1. Using requests

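import requests

def generate(prompt, model="llama3"):
    # Send a non-streaming request to the local Ollama server
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        }
    )
    response.raise_for_status()  # surface HTTP errors early
    # The generated text is in the "response" field
    return response.json()["response"]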
