LiteLLM is the multi-vendor LLM proxy and SDK. Apache 2.0, Python and TypeScript, built by BerriAI. It maps every major LLM provider's API — OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Vertex AI, Ollama, Together, Groq, Cohere, Mistral, Replicate, and 100+ more — to a single unified OpenAI-compatible interface. Used under the hood by Google ADK, CrewAI, and many other frameworks for cross-vendor model support. Available as a Python library, a TypeScript library, and a hosted proxy server with cost tracking, virtual keys, and rate limiting. The seam where you swap a Claude call for a local Ollama call by changing one config line.
LiteLLM is a Python (and TypeScript) library and proxy server that abstracts the differences between LLM provider APIs. Originated by BerriAI in 2023 and now one of the most-depended-on libraries in the LLM ecosystem — over 13 million downloads per month at the time of writing — LiteLLM solves a structural problem most teams encounter: every LLM provider has a slightly different API shape, and supporting multiple providers means writing per-provider integrations.
The thesis: OpenAI's API shape has effectively become the LLM API standard. Anthropic, Google, AWS Bedrock, Azure OpenAI, every hosted-inference provider, and many open-weights serving stacks (Ollama, vLLM) either natively support it or can be wrapped to look like it. LiteLLM operationalises this: every model call goes through one OpenAI-shaped function, and the library handles the per-provider translation under the hood.
The library ships two surfaces. The Python / TS SDK embeds directly in your application code and is the lightweight integration most agent frameworks use. The proxy server is for production deployments where you want centralised cost tracking, virtual API keys, rate limiting, and observability. Many teams start with the SDK and graduate to the proxy as their LLM usage grows.
You're probably using LiteLLM whether you know it or not. Google ADK's LiteLlm model class wraps it for non-Gemini models. CrewAI uses it for multi-vendor model support. OpenAI Agents SDK can use it via custom model routing. LangChain's litellm integration plugs every LiteLLM-supported provider into LangGraph. Many companies' internal AI platforms run the LiteLLM proxy as the central gateway for all LLM traffic. The library has become the de facto vendor-abstraction layer for the agent ecosystem.
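As an illustration of that framework-level usage, here is a minimal sketch of ADK's LiteLlm wrapper. The agent name, instruction, and model string are illustrative, and the model names follow this document's earlier examples rather than any particular vendor catalogue:

```python
# pip install google-adk litellm
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# The string passed to LiteLlm is a LiteLLM model string, so any
# LiteLLM-supported provider can sit behind an ADK agent.
agent = Agent(
    name="support_agent",
    model=LiteLlm(model="anthropic/claude-sonnet-4-6"),
    instruction="Answer customer questions concisely.",
)
```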
SDK mode — embed LiteLLM directly. The completion() function takes an OpenAI-shaped request, inspects the model string, dispatches to the right provider, returns an OpenAI-shaped response:
```python
# pip install litellm
from litellm import completion

msgs = [{"role": "user", "content": "Hello"}]

# Same call, five different providers — only the model string changes
resp = completion(model="gpt-5", messages=msgs)
resp = completion(model="claude-sonnet-4-6", messages=msgs)
resp = completion(model="gemini/gemini-2.5-flash", messages=msgs)
resp = completion(model="ollama/llama3.3", messages=msgs)
resp = completion(model="bedrock/anthropic.claude-sonnet-4-6", messages=msgs)

# Response shape is OpenAI's for all of them
print(resp.choices[0].message.content)
```
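The same unified surface covers streaming and async calls. A minimal sketch, using LiteLLM's stream flag and acompletion coroutine; the prompt and model strings are illustrative:

```python
import asyncio
from litellm import completion, acompletion

msgs = [{"role": "user", "content": "Summarise LiteLLM in one sentence."}]

# Streaming: chunks arrive in OpenAI's delta format regardless of provider
for chunk in completion(model="claude-sonnet-4-6", messages=msgs, stream=True):
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")

# Async: acompletion mirrors completion for use inside async agent loops
async def main() -> None:
    resp = await acompletion(model="gemini/gemini-2.5-flash", messages=msgs)
    print(resp.choices[0].message.content)

asyncio.run(main())
```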
Proxy mode — run LiteLLM as a hosted gateway in front of all your LLM traffic. Useful when you want centralised cost tracking, rotating virtual keys, rate limiting per team, fallback chains, and a single audit log:
```yaml
# Start proxy server:
#   litellm --config config.yaml

# config.yaml routes "production-llm" to whatever you choose,
# with fallback if primary fails:
model_list:
  - model_name: production-llm
    litellm_params:
      model: claude-opus-4-7
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: production-llm     # fallback
    litellm_params:
      model: gpt-5
      api_key: os.environ/OPENAI_API_KEY

# Apps point at the proxy as if it's OpenAI:
#   OPENAI_API_KEY=sk-litellm-team-1 OPENAI_BASE_URL=http://proxy:4000
```
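Because the proxy speaks the OpenAI API, applications point the standard OpenAI client at it. A sketch using the placeholder URL and virtual key from the config above; nothing here is a real credential:

```python
from openai import OpenAI

# base_url and api_key are the placeholder values from the config sketch above
client = OpenAI(
    api_key="sk-litellm-team-1",
    base_url="http://proxy:4000",
)

resp = client.chat.completions.create(
    model="production-llm",   # resolved by the proxy's model_list
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```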
LiteLLM's proxy supports fallback chains: if the primary provider returns an error, retry on a secondary, then a tertiary. This solves a real production problem: one hour of OpenAI degradation should not take down your customer-facing agent. Configure Claude Sonnet as primary, GPT-4o as first fallback, and a local Llama served via Ollama as the final fallback, and your agent stays up through any single-provider outage. For SA studios building production agents on USD-billed APIs, the fallback pattern is essential resilience infrastructure.
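Fallbacks are also available in the SDK via litellm's Router, without running the proxy. A minimal sketch of that three-tier chain; the deployment names are arbitrary and the model strings follow this document's examples:

```python
from litellm import Router

# Each model_list entry is a deployment; fallbacks map a model_name to the
# model_names to try, in order, when it errors out.
router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "claude-sonnet-4-6"}},
        {"model_name": "secondary", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "local", "litellm_params": {"model": "ollama/llama3.3"}},
    ],
    fallbacks=[{"primary": ["secondary", "local"]}],
    num_retries=2,
)

msgs = [{"role": "user", "content": "Hello"}]
resp = router.completion(model="primary", messages=msgs)
print(resp.choices[0].message.content)
```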
Three options compete for the "multi-provider abstraction" job: LiteLLM (open-source library + self-host proxy), OpenRouter (hosted multiplexer), and rolling your own per-provider clients. That said, LiteLLM and OpenRouter sit at different levels of the stack and are more complementary than directly competitive.
| Solution | What it is | Best for |
|---|---|---|
| LiteLLM SDK | Library that translates between provider APIs in your code | Agent frameworks needing multi-vendor support; lightweight integration |
| LiteLLM Proxy | Self-hosted gateway with cost tracking + virtual keys + fallbacks | Internal AI platforms; centralising team-wide LLM traffic; resilience |
| OpenRouter | Hosted SaaS with one API key for hundreds of providers | Don't want to self-host; one billing relationship; pay per token |
| Vendor SDKs direct | OpenAI / Anthropic / Google SDKs called directly | Single-vendor stacks; first-party integration features; lowest abstraction overhead |
LiteLLM is an open-source library you self-host or embed; OpenRouter is a SaaS aggregator. Many production setups use both: LiteLLM proxy for internal-team gateway and routing logic, OpenRouter as one of the upstream providers behind it. LiteLLM can route to OpenRouter for "any model not directly supported" while routing high-volume traffic directly to vendor APIs for lower per-token cost.
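In LiteLLM that split is just a choice of model prefix. A sketch, assuming OPENROUTER_API_KEY is set in the environment; the long-tail model slug is illustrative:

```python
from litellm import completion

msgs = [{"role": "user", "content": "Hello"}]

# High-volume traffic straight to the vendor API for lower per-token cost
resp = completion(model="claude-sonnet-4-6", messages=msgs)

# Long-tail models via OpenRouter as an upstream aggregator
# (assumes OPENROUTER_API_KEY is set; the model slug is illustrative)
resp = completion(model="openrouter/qwen/qwen-2.5-72b-instruct", messages=msgs)
```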
For SA studios watching FX exposure on USD-billed LLM APIs, LiteLLM proxy is genuinely useful. Centralise all LLM traffic through one self-hosted proxy; track costs in ZAR-equivalent in real time; configure fallback to local Ollama when USD-billed providers exceed budget thresholds. The proxy lets you implement spending limits, alerts, and per-team quotas without touching application code.
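One way those limits are set is through the proxy's key-management API. A hedged sketch, assuming a proxy at the placeholder URL with an admin master key; the field names follow LiteLLM's /key/generate endpoint, and the budget values are illustrative:

```python
import requests

# Issue a virtual key for one team with a capped, monthly-resetting budget.
# URL and master key are placeholders, not real credentials.
resp = requests.post(
    "http://proxy:4000/key/generate",
    headers={"Authorization": "Bearer sk-litellm-master-key"},
    json={
        "team_id": "client-a",
        "max_budget": 200.0,        # USD cap for this key
        "budget_duration": "30d",   # budget resets every 30 days
        "models": ["production-llm"],
    },
)
print(resp.json()["key"])
```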
For SA studios using multiple LLM vendors across client projects, LiteLLM SDK in agent code is the simplest way to keep client-specific model choices centralised. Client A's project uses Claude; client B's uses GPT; client C's uses Gemini. One LiteLLM-aware codebase handles all three by config alone. Fallback chains add resilience for client-facing systems where outages are visible.
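What "by config alone" looks like in practice: a small sketch with a per-client model table. Client names and model strings are illustrative, and in a real codebase the table would live in configuration rather than source:

```python
from litellm import completion

# Illustrative per-client model table; only the model string differs per client
CLIENT_MODELS = {
    "client_a": "claude-sonnet-4-6",
    "client_b": "gpt-5",
    "client_c": "gemini/gemini-2.5-flash",
}

def agent_turn(client_id: str, messages: list) -> str:
    resp = completion(model=CLIENT_MODELS[client_id], messages=messages)
    return resp.choices[0].message.content
```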
For SA enterprise standing up internal AI platforms, LiteLLM proxy is one of the strongest open-source options for the gateway layer. Self-hosted in africa-south1 or af-south-1, it provides team-level virtual keys, audit logging, and multi-vendor routing without vendor lock-in. Pair with LangSmith or self-hosted Phoenix for observability above the gateway layer.
In Google ADK, LiteLlm("ollama/gemma3:4b") is the canonical pattern for pointing an agent at a local Ollama-served model through LiteLLM.