Most teams building with LLMs start with one provider. Then they add a second for cost reasons. A third for a specific capability. Before long, they are maintaining three different API clients, three different error-handling patterns, and three different billing dashboards, all doing the same job.
LiteLLM was built to eliminate that problem. It is an open-source Python SDK and self-hosted proxy that gives you a single, OpenAI-compatible interface for 100+ LLM providers. One API call format. One place to track costs. One layer to set budgets, guardrails, and fallback logic.
This guide covers what LiteLLM is, how the proxy works, how to set it up, and the real-world use cases where it saves teams the most time and money in 2026.
LiteLLM is an open-source library and AI gateway that lets you call 100+ large language model (LLM) APIs through a single, unified interface. Instead of writing separate integration code for OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Cohere, Mistral, and others, you write one standard call and LiteLLM routes it to whichever provider you specify.
It comes in two forms:
- Python SDK: call litellm.completion() the same way you would use openai.chat.completions.create(), but with any provider.
- Proxy server (AI gateway): a self-hosted, OpenAI-compatible server that sits in front of your applications and centralises routing, cost tracking, and budgets.
As of mid-2026, LiteLLM supports over 100 providers including OpenAI (GPT-5.5, GPT-5.4), Anthropic (Claude Opus 4.7, Claude Opus 4.8), AWS Bedrock, Google Vertex AI, Gemini, Mistral, Groq, Together AI, Fireworks, NVIDIA NIM, Replicate, Ollama, Hugging Face, Cohere, Azure OpenAI, and more.
A single completion() call works across every supported provider. Switching from GPT-5.4 to Claude Opus 4.7 to Mistral Large is a one-line change, with no refactoring of API clients, authentication logic, or response parsers.
The proxy is the production-grade component of LiteLLM. Run it as a Docker container in your own infrastructure and it acts as an OpenAI-compatible gateway for your entire organisation. Features include virtual key management, per-team budget controls, and an admin dashboard at /ui showing per-team and per-model spend.
LiteLLM distributes requests across multiple providers or model deployments. If one provider returns a rate limit error or goes down, LiteLLM automatically retries on a configured fallback provider. This removes a major source of production outages in LLM-dependent applications.
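The fallback behaviour can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not LiteLLM's implementation: `RateLimited`, `call_with_fallback`, and the two stub providers are hypothetical names invented for the example.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error a provider stub raises when it is throttled."""


def call_with_fallback(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            # Record the failure and move on to the next configured provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    # Stub that always behaves like a throttled provider.
    raise RateLimited("429 too many requests")


def secondary(prompt: str) -> str:
    # Stub that always succeeds.
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    [("primary", primary), ("secondary", secondary)], "hello"
)
print(used, reply)
```

The caller only sees which provider answered. The ordering of the list is the fallback policy, which is why it makes sense to keep it in gateway configuration rather than in application code.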
Every request through the proxy is logged with token count, model, provider, and cost. Teams can set provider-level budgets, and the proxy returns an error before a budget is exceeded rather than after. For teams spending more than $100/month on LLM APIs, cost tracking alone justifies running the proxy.
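To make "error before a budget is exceeded" concrete, here is a small sketch of a pre-request budget check. It assumes the gateway can estimate a request's cost up front; `Budget`, `BudgetExceeded`, and `reserve` are hypothetical names, and the real proxy's accounting may work differently.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised before the provider call when a request would break the cap."""


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0

    def reserve(self, estimated_cost_usd: float) -> None:
        """Reject the request up front if it would push spend past the limit."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"request of ${estimated_cost_usd:.2f} would exceed "
                f"the ${self.limit_usd:.2f} limit"
            )
        self.spent_usd += estimated_cost_usd


budget = Budget(limit_usd=1.00)
budget.reserve(0.60)  # accepted: spend is now 0.60

try:
    budget.reserve(0.50)  # rejected: 0.60 + 0.50 is over the limit
    blocked = False
except BudgetExceeded:
    blocked = True

print(blocked, budget.spent_usd)
```

The rejected request never reaches a provider, so the recorded spend stays at the last accepted value.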
LiteLLM integrates with guardrail providers to enforce content filters, PII redaction, and prompt injection detection at the gateway level. This means guardrails apply to every model and every team without changes to application code. In 2026, LiteLLM added OpenTelemetry span emission on guardrail violations for full observability.
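As an illustration of what a gateway-level guardrail does, the sketch below masks email addresses in OpenAI-style messages before they would be forwarded. It is a deliberately simple stand-in: the regex, the `redact_emails` helper, and the `[REDACTED_EMAIL]` marker are assumptions for the example, and real guardrail providers cover far more than emails.

```python
import re

# Simplified email pattern; good enough for a demo, not for production PII detection.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(messages: list[dict]) -> list[dict]:
    """Return a copy of the messages with email addresses masked."""
    return [
        {**message, "content": EMAIL.sub("[REDACTED_EMAIL]", message["content"])}
        for message in messages
    ]


messages = [
    {"role": "user", "content": "Contact jane.doe@example.com about the invoice."}
]
clean = redact_emails(messages)
print(clean[0]["content"])
```

Because the function returns new message dictionaries, the application's original payload is untouched; only the copy sent upstream is redacted.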
LiteLLM normalises streaming responses across all providers into a consistent format. Whether the underlying provider uses server-sent events, chunked transfer encoding, or a custom streaming protocol, your application receives a uniform stream.
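The normalisation step can be pictured as a generator that maps provider-specific events onto OpenAI-style chunks. The input events below are a simplified, Anthropic-like shape chosen for the example, and `normalise_stream` is a hypothetical helper, not a LiteLLM function.

```python
from typing import Iterable, Iterator


def normalise_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Convert simplified provider events into OpenAI-style streaming chunks."""
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # skip start/stop markers and other non-text events
        text = event["delta"].get("text", "")
        yield {"choices": [{"index": 0, "delta": {"content": text}}]}


raw_events = [
    {"type": "message_start"},
    {"type": "content_block_delta", "delta": {"text": "Hel"}},
    {"type": "content_block_delta", "delta": {"text": "lo"}},
    {"type": "message_stop"},
]

chunks = list(normalise_stream(raw_events))
text = "".join(chunk["choices"][0]["delta"]["content"] for chunk in chunks)
print(text)
```

Application code then reads `choices[0].delta.content` the same way regardless of which provider produced the stream.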
LiteLLM passes through prompt caching instructions to providers that support it (Anthropic Claude, AWS Bedrock) and tracks cached token costs separately in the dashboard. For applications with long, repeated system prompts, this can reduce token spend by 60-90% on cached portions.
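A quick worked calculation shows how the saving on cached portions translates into overall input cost. The numbers are assumptions for illustration only: a price of $3.00 per million input tokens, cached reads billed at a 90% discount, and no cache-write surcharge. Actual pricing varies by provider and model.

```python
def input_cost_usd(
    total_tokens: int,
    cached_tokens: int,
    price_per_mtok: float,
    cached_discount: float = 0.90,  # assumed discount on cached reads
) -> float:
    """Input cost when some of the prompt tokens are served from cache."""
    uncached_tokens = total_tokens - cached_tokens
    cached_price = price_per_mtok * (1 - cached_discount)
    return (
        uncached_tokens * price_per_mtok + cached_tokens * cached_price
    ) / 1_000_000


# A 10,000-token prompt where 8,000 tokens are a repeated system prompt.
without_cache = input_cost_usd(10_000, 0, 3.00)
with_cache = input_cost_usd(10_000, 8_000, 3.00)
saving = 1 - with_cache / without_cache
print(f"{saving:.0%}")
```

Under these assumptions the overall input cost drops by roughly 72%: the 90% discount applies only to the 8,000 cached tokens, while the remaining 2,000 tokens are billed at the full rate.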
LiteLLM maintains a live model_prices_and_context_window.json that can be auto-synced without a proxy restart. When OpenAI released GPT-5.5 in April 2026, LiteLLM had day-zero support. The same applied to Claude Opus 4.8 in May 2026. Set model_cost_map_sync: true in your config and the proxy always knows current pricing and context window sizes.
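To show how a price map turns token counts into a cost figure, here is a sketch using a tiny in-memory map. The entry name and prices are hypothetical, and the field names (`input_cost_per_token`, `output_cost_per_token`, `max_input_tokens`) are written as assumptions about the file's shape, so check them against the actual JSON.

```python
# Hypothetical entry; real entries and prices come from the synced JSON file.
PRICE_MAP = {
    "example-model": {
        "input_cost_per_token": 2e-06,
        "output_cost_per_token": 8e-06,
        "max_input_tokens": 128_000,
    }
}


def request_cost_usd(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    price_map: dict = PRICE_MAP,
) -> float:
    """Cost of one request from its token counts and the model's price entry."""
    entry = price_map[model]
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )


cost = request_cost_usd("example-model", prompt_tokens=1_500, completion_tokens=500)
print(f"${cost:.4f}")
```

Keeping prices in data rather than code is what makes a sync without restart possible: replacing the map changes every subsequent cost calculation.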
Any tool built for the OpenAI API works with the LiteLLM proxy once its base URL and API key point at the proxy. This includes AI coding tools (Claude Code, Cursor, Copilot), agent SDKs (LangChain, LlamaIndex, CrewAI), and observability platforms (Langfuse, Helicone).
The enterprise edition adds SAML/SSO, audit logs, RBAC, per-project budget isolation, max request/response size limits, and team-managed model keys. It is available via Docker with a licence key and can be procured through AWS and Azure Marketplace.
At its core, LiteLLM translates your request into the correct format for the target provider, sends it, receives the response, and translates it back into a standard OpenAI-compatible format before returning it to your application.
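The translate-send-translate-back cycle can be sketched for one provider pair. The example converts an OpenAI-style chat request into an Anthropic-style Messages request (system prompt as a top-level field, `max_tokens` always set) and maps a response back. Both helper names and the default of 1024 tokens are assumptions for illustration; LiteLLM's own translation covers many more fields.

```python
def to_anthropic_style(openai_request: dict, default_max_tokens: int = 1024) -> dict:
    """Translate an OpenAI-style chat request into an Anthropic-style request."""
    messages = openai_request["messages"]
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    translated = {
        "model": openai_request["model"],
        "max_tokens": openai_request.get("max_tokens", default_max_tokens),
        "messages": turns,
    }
    if system_parts:
        # The system prompt moves out of the message list into its own field.
        translated["system"] = "\n".join(system_parts)
    return translated


def to_openai_style(anthropic_response: dict) -> dict:
    """Translate an Anthropic-style response back into OpenAI-style choices."""
    text = "".join(
        block["text"]
        for block in anthropic_response["content"]
        if block["type"] == "text"
    )
    return {"choices": [{"index": 0, "message": {"role": "assistant", "content": text}}]}


request = {
    "model": "claude-opus",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
}
translated = to_anthropic_style(request)
normalised = to_openai_style({"content": [{"type": "text", "text": "Hello"}]})
print(translated["system"], normalised["choices"][0]["message"]["content"])
```

The application only ever builds `request` and reads `normalised`; everything provider-specific stays inside the two translation functions.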
Request flow through LiteLLM:
1. Your app sends a request in the standard OpenAI format.
2. The LiteLLM translation layer handles it in three steps: routing (selects the target endpoint across OpenAI, Anthropic, Bedrock, Vertex AI, or Mistral), guardrails and balancing (budget checks, rule validation, load balancing, and failover), and logging (token counts, cost, latency, and team ID).
3. The provider's response is normalised back into the OpenAI format.
4. Your app receives the standardised response.
When you use the proxy, your application never changes its API calls regardless of which provider is serving the request underneath. The proxy handles authentication, routing, retries, and cost logging invisibly.
Install the package:
pip install litellm
Make your first unified call:
from litellm import completion
import os
# Set your provider API keys
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret"
os.environ["AWS_REGION_NAME"] = "us-east-1"
messages = [{"role": "user", "content": "Summarise this contract in 3 bullet points."}]
# Call OpenAI
response = completion(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
# Switch to Claude with one line change
response = completion(model="claude-opus-4-7-20251101", messages=messages)
print(response.choices[0].message.content)
# Switch to AWS Bedrock
response = completion(model="bedrock/anthropic.claude-opus-4-7", messages=messages)
print(response.choices[0].message.content)
Create a config.yaml:
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-opus
    litellm_params:
      model: anthropic/claude-opus-4-7-20251101
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: bedrock-nova-pro
    litellm_params:
      model: bedrock/amazon.nova-pro-v1:0
general_settings:
  master_key: sk-your-master-key
  model_cost_map_sync: true
Run with Docker:
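docker run -d \
  -p 4000:4000 \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  -e AWS_REGION_NAME="$AWS_REGION_NAME" \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  ghcr.io/berriai/litellm:main-stable \
  --config /app/config.yaml
The AWS variables are needed because the config above includes a Bedrock model. Your proxy is now running at http://localhost:4000. Point any OpenAI-compatible client at this URL instead of api.openai.com and authenticate with the master key (or a virtual key issued by the proxy); no other changes are needed.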
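To see what "pointing a client at the proxy" amounts to on the wire, the sketch below builds a chat completion request with only the standard library. It assumes the proxy from the setup above is listening on localhost:4000, that `sk-your-master-key` is the placeholder key from the config, and that `claude-opus` is the alias defined there; `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, messages: list[dict]
) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the proxy."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    base_url="http://localhost:4000",
    api_key="sk-your-master-key",  # placeholder from the example config
    model="claude-opus",           # alias from model_list, not a provider model ID
    messages=[{"role": "user", "content": "Hello"}],
)
print(request.full_url)

# With the proxy running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

The request names the alias from `model_list`, so which provider actually serves it is decided entirely by the proxy configuration.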
Security note: Avoid LiteLLM versions 1.82.7 and 1.82.8, which were affected by a supply chain incident in March 2026. Pin to the latest stable release (1.84.0+).

| Factor | LiteLLM Proxy | Direct Provider APIs |
|---|---|---|
| Provider switching | One-line config change | Rewrite API client per provider |
| Cost visibility | Unified dashboard per team/model | Separate billing per provider |
| Budget enforcement | Built-in, blocks before overspend | Manual, discovered after the fact |
| Fallback on errors | Automatic | Custom retry logic per provider |
| Guardrails | Applied at gateway, all providers | Must implement per integration |
| Prompt caching | Tracked and surfaced in dashboard | Provider-specific implementation |
| OpenAI tool compatibility | OpenAI-compatible tools work with any provider | Tools work only with the provider they target |
LiteLLM makes the most sense when your team uses two or more providers, when you need per-team cost visibility, or when you want guardrails applied consistently without touching application code.
LiteLLM has first-class support for AWS Bedrock, which makes it a natural fit for enterprises already running workloads on AWS. You can route requests to any Bedrock model using standard IAM credentials, with no separate API key management.
from litellm import completion
# Amazon Nova Pro via Bedrock
response = completion(
    model="bedrock/amazon.nova-pro-v1:0",
    messages=[{"role": "user", "content": "Analyse this freight invoice."}],
)
# Claude Opus 4.7 via Bedrock
response = completion(
    model="bedrock/anthropic.claude-opus-4-7-20251101",
    messages=[{"role": "user", "content": "Summarise this patient record."}],
)
Using LiteLLM with Bedrock lets you A/B test Nova Pro against Claude or GPT-5.4 with no code changes, track costs across all three in one dashboard, and fall back automatically if one model hits a rate limit.
Seaflux has delivered 30+ production AI systems using LiteLLM, AWS Bedrock, and custom agent frameworks. Get a free 30-minute architecture call and walk away with a clear routing and cost strategy.

| Provider | Notable Models |
|---|---|
| OpenAI | GPT-5.5, GPT-5.4, GPT-4o, o3 |
| Anthropic | Claude Opus 4.8, Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5 |
| AWS Bedrock | Nova 2, Nova Pro, Llama 4, DeepSeek-R1, all Claude models |
| Google Vertex AI | Gemini 3 Pro, Gemini 3 Flash, Gemma |
| Mistral | Mistral Large 3, Devstral 2, Magistral Small |
| Groq | Llama 4, Mixtral (ultra-low latency) |
| Together AI | Llama 4, Qwen3, DeepSeek |
| Cohere | Command R, Command R+ |
| Hugging Face | Open-weight models via Inference Endpoints |
| Ollama | Any locally hosted model |
| Azure OpenAI | All Azure-deployed OpenAI models |
For the full list, see the LiteLLM official documentation.
At Seaflux, LiteLLM is a standard part of our AI infrastructure stack for production systems.
Whether you are evaluating LiteLLM for the first time or looking to move an existing multi-provider setup to a managed proxy architecture, we can help you scope the right approach for your stack.
Let us help you get it right the first time.
LiteLLM is an open-source Python SDK and self-hosted proxy (AI gateway) that provides a single, OpenAI-compatible interface for 100+ large language model providers including OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Mistral, Groq, and many more. It handles routing, cost tracking, load balancing, fallbacks, and guardrails in one layer.
The LiteLLM proxy is a self-hosted FastAPI server that acts as an AI gateway for your organisation. You run it in your infrastructure via Docker, and all LLM traffic from your applications routes through it. It provides a built-in admin dashboard, virtual key management, per-team budget controls, cost tracking via PostgreSQL, and an OpenAI-compatible API that works with any existing client.
The core LiteLLM SDK and proxy are free and open-source under the MIT licence. The enterprise edition adds SSO, RBAC, audit logs, and per-project budget isolation, with pricing based on deployment size. The open-source version is sufficient for most teams.
LiteLLM saves money in three main ways: it makes it easy to route simpler tasks to cheaper models (e.g. Nova Micro instead of GPT-4o for classification), it surfaces per-team and per-model spend so you can see where budget is going, and it enforces provider budgets so you never exceed a spend cap without knowing.
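Task-based routing can be as simple as a lookup table in front of the call. The sketch below is one way to express it; the task labels and model aliases are hypothetical and would need to match aliases defined in your own proxy config.

```python
# Hypothetical mapping from task type to a model alias configured on the proxy.
ROUTES = {
    "classification": "bedrock-nova-micro",  # cheap model for simple labelling
    "summarisation": "claude-opus",          # stronger model for long documents
}
DEFAULT_MODEL = "gpt-4o"


def pick_model(task_type: str) -> str:
    """Return the model alias for a task, falling back to the default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


print(pick_model("classification"), pick_model("translation"))
```

Since every alias goes through the same interface, moving a task to a cheaper model is a one-entry change in the table.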
LiteLLM has first-class AWS Bedrock support. You authenticate with standard AWS IAM credentials and call any Bedrock model using the bedrock/ prefix in the model string. This includes all Amazon Nova models, Claude on Bedrock, Llama 4, DeepSeek-R1, and others.
The SDK is a Python library you import directly into your application code. It is the right choice for individual developers or simple single-service setups. The proxy is a self-hosted server that sits in front of your applications. It is the right choice for platform teams managing LLM access across multiple services, teams, or customers, especially when you need centralised cost tracking, budget controls, and guardrails.
LiteLLM is suitable for production use, with some care around versioning. LiteLLM releases stable builds weekly. Avoid versions 1.82.7 and 1.82.8, which were affected by a supply chain incident in March 2026. Pin to the latest stable release (1.84.0+) and scan dependencies as part of your CI pipeline.
LiteLLM works with LangChain, LlamaIndex, CrewAI, and other agentic AI frameworks because it exposes an OpenAI-compatible interface. Agent frameworks that are built to call the OpenAI API route through LiteLLM with no modification, gaining cost tracking and fallback logic automatically.
To get started, install with pip install litellm, set your provider API keys as environment variables, and replace your existing openai.chat.completions.create() calls with litellm.completion(). For the proxy, use the Docker setup described in the setup section above. The official docs at docs.litellm.ai cover every provider and feature in detail.
