know.2nth.ai Agents LiteLLM

One API. A hundred providers.

LiteLLM is the multi-vendor LLM proxy and SDK. Apache 2.0, Python and TypeScript, built by BerriAI. It maps every major LLM provider's API — OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Vertex AI, Ollama, Together, Groq, Cohere, Mistral, Replicate, and 100+ more — to a single unified OpenAI-compatible interface. Used under the hood by Google ADK, CrewAI, and many other frameworks for cross-vendor model support. Available as a Python library, a TypeScript library, and a hosted proxy server with cost tracking, virtual keys, and rate limiting. The seam where you swap a Claude call for a local Ollama call by changing one config line.

Live · v1.0+ · Apache 2.0 · 100+ providers · Library + proxy server

A vendor-neutral abstraction over every LLM provider.

LiteLLM is a Python (and TypeScript) library and proxy server that abstracts the differences between LLM provider APIs. Originated by BerriAI in 2023 and now one of the most-depended-on libraries in the LLM ecosystem — over 13 million downloads per month at the time of writing — LiteLLM solves a structural problem most teams encounter: every LLM provider has a slightly different API shape, and supporting multiple providers means writing per-provider integrations.

The thesis: OpenAI's API shape has effectively become the LLM API standard. Anthropic, Google, AWS Bedrock, Azure OpenAI, every hosted-inference provider, and many open-weights serving stacks (Ollama, vLLM) either natively support it or can be wrapped to look like it. LiteLLM operationalises this: every model call goes through one OpenAI-shaped function, and the library handles the per-provider translation under the hood.

The library ships in two surfaces:

  • Python / TS SDK: for direct embedding in your application code; the lightweight integration most agent frameworks use.
  • Proxy server: for production deployments where you want centralised cost tracking, virtual API keys, rate limiting, and observability.

Many teams start with the SDK and graduate to the proxy as their LLM usage grows.

Where LiteLLM is hidden under the surface

You're probably using LiteLLM whether you know it or not. Google ADK's LiteLlm model class wraps it for non-Gemini models. CrewAI uses it for multi-vendor model support. OpenAI Agents SDK can use it via custom model routing. LangChain's litellm integration plugs every LiteLLM-supported provider into LangGraph. Many internal AI platforms at companies use the LiteLLM proxy as the central gateway for all LLM traffic. The library has become the de facto vendor-abstraction layer for the agent ecosystem.

SDK call or proxy gateway. Same API.

SDK mode — embed LiteLLM directly. The completion() function takes an OpenAI-shaped request, inspects the model string, dispatches to the right provider, returns an OpenAI-shaped response:

# pip install litellm
from litellm import completion

msgs = [{"role": "user", "content": "Hello"}]

# Same call, five different providers — only the model string changes
resp = completion(model="gpt-5", messages=msgs)
resp = completion(model="claude-sonnet-4-6", messages=msgs)
resp = completion(model="gemini/gemini-2.5-flash", messages=msgs)
resp = completion(model="ollama/llama3.3", messages=msgs)
resp = completion(model="bedrock/anthropic.claude-sonnet-4-6", messages=msgs)

# Response shape is OpenAI's for all of them
print(resp.choices[0].message.content)

Proxy mode — run LiteLLM as a hosted gateway in front of all your LLM traffic. Useful when you want centralised cost tracking, rotating virtual keys, rate limiting per team, fallback chains, and a single audit log:

# Start the proxy server
litellm --config config.yaml

config.yaml routes "production-llm" to whatever you choose; a second
deployment under the same model_name acts as the fallback if the
primary fails:

model_list:
  - model_name: production-llm
    litellm_params:
      model: claude-opus-4-7
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: production-llm   # fallback deployment
    litellm_params:
      model: gpt-5
      api_key: os.environ/OPENAI_API_KEY

# Apps point at the proxy as if it were OpenAI:
# OPENAI_API_KEY=sk-litellm-team-1 OPENAI_BASE_URL=http://proxy:4000

The fallback pattern is the killer feature

LiteLLM's proxy supports fallback chains: if the primary provider returns an error, retry on a secondary, then a tertiary. This solves a real production problem: one hour of OpenAI degradation should not take down your customer-facing agent. Configure Claude Sonnet as primary, GPT-5 as fallback, and a local Llama served by Ollama as final fallback, and your agent stays up through any single-provider outage. For SA studios building production agents on USD-billed APIs, fallback chains are essential resilience infrastructure.
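The shape of that chain can be sketched in SDK code. This is a minimal illustration, not LiteLLM's own implementation: the completion function is injected (in practice you would pass litellm.completion), and the model names are the illustrative ones used throughout this page.

```python
# Fallback chain: try each model in order until one succeeds.
# Model strings below are illustrative, not pinned versions.
PRIMARY_CHAIN = [
    "claude-sonnet-4-6",   # primary: hosted, USD-billed
    "gpt-5",               # fallback: second vendor
    "ollama/llama3.3",     # last resort: local, no external dependency
]

def completion_with_fallback(messages, call, models=PRIMARY_CHAIN):
    """Call `call(model=..., messages=...)` for each model in order;
    return (model, response) from the first that succeeds."""
    last_error = None
    for model in models:
        try:
            return model, call(model=model, messages=messages)
        except Exception as err:  # outage, rate limit, auth failure...
            last_error = err
    raise RuntimeError("all providers in the chain failed") from last_error
```

LiteLLM's Router and proxy implement this pattern natively, with retries and cooldowns on failing deployments; the loop above is only the shape of the idea.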

Where LiteLLM fits in the abstraction layer.

Three approaches cover the "multi-provider abstraction" job: LiteLLM (open-source library + self-host proxy), OpenRouter (hosted multiplexer), and rolling your own per-provider clients. In practice LiteLLM and OpenRouter sit at different levels of the stack and are often complementary rather than direct competitors.

Solution           | What it is                                                      | Best for
LiteLLM SDK        | Library that translates between provider APIs in your code     | Agent frameworks needing multi-vendor support; lightweight integration
LiteLLM Proxy      | Self-hosted gateway with cost tracking, virtual keys, fallbacks | Internal AI platforms; centralising team-wide LLM traffic; resilience
OpenRouter         | Hosted SaaS with one API key for hundreds of providers         | No self-hosting; one billing relationship; pay per token
Vendor SDKs direct | OpenAI / Anthropic / Google SDKs called directly               | Single-vendor stacks; first-party integration features; lowest abstraction overhead

LiteLLM and OpenRouter aren't either-or

LiteLLM is an open-source library you self-host or embed; OpenRouter is a SaaS aggregator. Many production setups use both: LiteLLM proxy for internal-team gateway and routing logic, OpenRouter as one of the upstream providers behind it. LiteLLM can route to OpenRouter for "any model not directly supported" while routing high-volume traffic directly to vendor APIs for lower per-token cost.
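That split might look like this in proxy config: the openrouter/ model prefix routes through OpenRouter, while the other entry goes direct to the vendor. Model names and environment-variable keys here are illustrative.

```yaml
model_list:
  # High-volume traffic: direct to the vendor for lowest per-token cost
  - model_name: workhorse
    litellm_params:
      model: claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
  # Long tail: any model not directly integrated, via OpenRouter
  - model_name: long-tail
    litellm_params:
      model: openrouter/deepseek/deepseek-chat
      api_key: os.environ/OPENROUTER_API_KEY
```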

Pick LiteLLM when. Skip when.

Use LiteLLM when

  • You support multiple LLM vendors and want a unified API surface
  • You need fallback chains for production resilience
  • You want centralised cost tracking + virtual keys for team-wide LLM access
  • You're building an internal AI platform / gateway
  • You want to swap models in config rather than code
  • Your agent framework needs cross-vendor support (ADK and CrewAI both use LiteLLM)
  • You self-host vLLM or Ollama and want an OpenAI-compatible facade for it

Skip LiteLLM when

  • You're committed to a single vendor and want its first-party SDK features with the lowest abstraction overhead
  • You don't want to self-host anything — a hosted aggregator like OpenRouter gives you one key and one billing relationship

Where LiteLLM lands in SA delivery work.

The FX-resilient gateway pattern

For SA studios watching FX exposure on USD-billed LLM APIs, LiteLLM proxy is genuinely useful. Centralise all LLM traffic through one self-hosted proxy; track costs in ZAR-equivalent in real time; configure fallback to local Ollama when USD-billed providers exceed budget thresholds. The proxy lets you implement spending limits, alerts, and per-team quotas without touching application code.
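A hedged sketch of what that looks like in proxy config — the exact budget and fallback keys vary by LiteLLM version, and the model names and USD cap below are illustrative, so treat this as the shape of the idea rather than copy-paste config:

```yaml
model_list:
  - model_name: production-llm
    litellm_params:
      model: claude-sonnet-4-6        # USD-billed primary
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3.3          # local fallback, no FX exposure

litellm_settings:
  max_budget: 500                     # proxy-wide spend cap (USD)
  budget_duration: 30d

router_settings:
  fallbacks:
    - production-llm: [local-llama]
```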

Studio · the multi-vendor router

For SA studios using multiple LLM vendors across client projects, LiteLLM SDK in agent code is the simplest way to keep client-specific model choices centralised. Client A's project uses Claude; client B's uses GPT; client C's uses Gemini. One LiteLLM-aware codebase handles all three by config alone. Fallback chains add resilience for client-facing systems where outages are visible.
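The "one codebase, per-client models" idea reduces to a config lookup feeding litellm's model string. A minimal sketch, with hypothetical client IDs and the illustrative model names used on this page:

```python
# Per-client model choices live in config, not code.
# Client IDs and model strings are hypothetical.
CLIENT_MODELS = {
    "client-a": "claude-sonnet-4-6",
    "client-b": "gpt-5",
    "client-c": "gemini/gemini-2.5-flash",
}

DEFAULT_MODEL = "gpt-5"

def model_for(client_id: str) -> str:
    """Resolve which provider/model a client's agent should call."""
    return CLIENT_MODELS.get(client_id, DEFAULT_MODEL)

# In agent code the call site never names a vendor:
#   resp = completion(model=model_for(client_id), messages=msgs)
```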

Enterprise · the internal AI platform

For SA enterprises standing up internal AI platforms, LiteLLM proxy is one of the strongest open-source options for the gateway layer. Self-hosted in africa-south1 or af-south-1, it provides team-level virtual keys, audit logging, and multi-vendor routing without vendor lock-in. Pair with LangSmith or self-hosted Phoenix for observability above the gateway layer.

Where LiteLLM links in the tree.

Primary sources only.