LiteLLM: Universal API Gateway for 100+ LLM Models
In the rapidly evolving AI landscape, we face dozens of LLM (Large Language Model) providers such as OpenAI, Anthropic, Google Gemini, Cohere, Ollama, and many more. Each provider has different API formats, authentication methods, and parameters. Imagine if you could call all these models with a single, unified interface. That is exactly what LiteLLM offers.
LiteLLM is an open-source Python library that provides a unified interface for calling 100+ LLMs using the OpenAI API format. With LiteLLM, you write your code once and switch between providers by changing only the model name.
Why LiteLLM?
Here are several reasons why LiteLLM has become a popular choice:
- Unified API: One calling format for all providers
- 100+ Models: Supports OpenAI, Anthropic, Google, Cohere, Ollama, HuggingFace, and more
- OpenAI-Compatible Proxy: Run a proxy server compatible with the OpenAI API
- Fallback & Retry: Automatically switch to another model if one fails
- Cost Tracking: Track usage costs for each model
- Load Balancing: Distribute requests across multiple models/providers
- Streaming: Built-in streaming response support
- Production Ready: Router for large-scale production deployments
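The fallback idea from the list above can be sketched in a few lines of plain Python. This is a hand-rolled illustration of the concept, not LiteLLM's built-in fallback or Router logic; `call_with_fallback` and its `call` parameter are hypothetical names introduced here. In real use, `call` would presumably wrap `litellm.completion` for the given model.

```python
from typing import Callable, Sequence


def call_with_fallback(models: Sequence[str], call: Callable[[str], str]) -> str:
    """Try each model in order and return the first successful result.

    `call` is a hypothetical callable that takes a model name and returns
    text; in practice it would wrap litellm.completion for that model.
    """
    errors = []
    for model in models:
        try:
            return call(model)
        except Exception as exc:  # broad on purpose: any provider error triggers fallback
            errors.append(f"{model}: {exc}")
    # Every model failed, so surface all collected errors at once.
    raise RuntimeError("All models failed: " + "; ".join(errors))
```

Keeping the provider call behind a plain callable makes the ordering logic easy to test without network access or API keys.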
Installation and Setup
Installing LiteLLM
Install LiteLLM using pip:
pip install litellm
For the proxy server feature, install with additional dependencies:
pip install 'litellm[proxy]'
Configuring API Keys
Before getting started, prepare your API keys from the providers you want to use. Store them as environment variables:
# OpenAI
export OPENAI_API_KEY="sk-your-openai-key"
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
# Google Gemini
export GEMINI_API_KEY="your-gemini-key"
Or use a .env file
You can also use a .env file and load it with python-dotenv:
from dotenv import load_dotenv

# Read key=value pairs from .env into the process environment
load_dotenv()
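Before making any calls, it can help to confirm that the keys were actually loaded. The following is a minimal sketch using only the standard library; `missing_keys` is a hypothetical helper, not part of LiteLLM or python-dotenv, and the list of required names is whatever your own setup needs.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_keys(required: Iterable[str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

For example, `missing_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])` returns the subset that still needs to be set, so a script can fail early with a clear message instead of a provider authentication error.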
Basic Completion Calls
Calling OpenAI GPT
import litellm

# Call OpenAI GPT-4o
response = litellm.completion(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what machine learning is in 2 sentences."}
    ]
)
print(response.choices[0].message.content)
Calling Anthropic Claude
# Call Anthropic Claude - just change the model name!
response = litellm.completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Explain what a neural network is in 2 sentences."}
    ]
)
print(response.choices[0].message.content)
Calling Google Gemini
# Call Google Gemini
response = litellm.completion(
    model="gemini/gemini-2.0-flash",
    messages=[
        {"role": "user", "content": "What is the difference between supervised and unsupervised learning?"}
    ]
)
print(response.choices[0].message.content)
Calling Ollama (Local Models)
# Call a local model via Ollama
response = litellm.completion(
    model="ollama/llama3",
    messages=[
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}
    ],
    api_base="http://localhost:11434"  # address of the local Ollama server
)
print(response.choices[0].message.content)
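The calls above differ mainly in the model string: three of them carry a provider prefix before the slash (`anthropic/`, `gemini/`, `ollama/`), while `gpt-4o` carries none. The helper below is a small illustration of that naming convention only. `split_model` is a hypothetical function written for this article; LiteLLM resolves providers internally and this sketch does not reproduce its logic.

```python
from typing import Optional, Tuple


def split_model(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider/name' into (provider, name).

    Returns (None, model) when the string has no provider prefix.
    Only the first slash is treated as the separator.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name
```

A helper like this can be handy for logging or for grouping usage by provider in your own code.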
Unified Interface - Same Code, Different Providers
The key advantage of LiteLLM is that you can create a generic function that works with any provider:
import litellm

def askai(question: str, model: str = "gpt-4o") -> str:
    """Universal function to query any AI model."""
    response = litellm.completion(
        model=model,
        messages=[
            {"role": "user", "content": question}
        ],
        temperature=0.7,
        max_tokens=1000
    )