agents · OpenRouter · Skill Leaf

One key. Every model worth using.

OpenRouter is the hosted multiplexer over hundreds of LLMs from dozens of providers. One API key, one billing relationship, every frontier model — Claude, GPT, Gemini, Llama, Mistral, Qwen, Gemma, DeepSeek, Cohere, Grok, and many more — behind a single OpenAI-compatible API surface. Competitive per-token pricing across providers, transparent failover routing around upstream outages, real-time price and throughput tracking. The pragmatic choice for teams that want frontier-model flexibility without managing N billing relationships, and the daily-driver tool for prototyping work where "try this prompt against five different models" is the workflow.

Live · production-ready · Hosted SaaS · 300+ models · 60+ providers · OpenAI-compatible API

A hosted aggregator that solves the multi-vendor problem.

OpenRouter is a hosted SaaS that aggregates LLM access across hundreds of models from dozens of upstream providers into a single API. Founded in 2023, OpenRouter has become one of the most-used aggregation platforms in the LLM ecosystem — particularly in dev / prototyping / hobbyist contexts where the cost of opening accounts at every individual provider would dominate the actual model spend.

The thesis: most teams want to compare or switch between models more often than they want to manage individual provider relationships. Opening accounts at OpenAI, Anthropic, Google, Cohere, Together AI, Groq, Fireworks, Perplexity, X.ai, DeepSeek, Mistral, and a dozen smaller providers is more friction than the work it enables. OpenRouter abstracts the friction: one signup, one payment method, one API key, one usage dashboard, and you're talking to all of them through a single OpenAI-shaped endpoint.

The economics work because OpenRouter takes a margin on the upstream cost. For most users this is invisible — OpenRouter's pricing is competitive with going direct, and for some models OpenRouter's volume aggregation actually unlocks lower prices than individual users would get direct. Free tier exists with rate-limited access to many models including frontier-tier (useful for prototyping). Paid tier is pay-per-token with no monthly minimum. BYOK (Bring Your Own Key) lets you use your own provider account for specific models while still routing through OpenRouter's API surface.

Where OpenRouter fits in the hosting stack

OpenRouter is one layer up from individual hosted providers. The stack: model weights live with model authors (Meta, Anthropic, Google, etc.); inference happens on hosted providers (OpenAI, Anthropic, Bedrock, Together AI, Groq, etc.); aggregators (OpenRouter) sit on top of providers; agent frameworks (LangGraph, ADK, CrewAI) sit on top of either aggregators or direct provider APIs. OpenRouter exists because the layer between "dozens of providers" and "your app" was structurally underserved.

OpenAI-shaped API. Model strings carry the routing.

Using OpenRouter from agent code is identical to using OpenAI — the only changes are the base URL and the model string. The model string format is provider/model-name, e.g. anthropic/claude-opus-4.7 or meta-llama/llama-3.3-70b-instruct:

# Use OpenRouter from any OpenAI-SDK-speaking code
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",
)

# Same code, different upstream — model string changes
resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hi"}],
)

# Or:
#   "openai/gpt-5"
#   "google/gemini-2.5-pro"
#   "meta-llama/llama-3.3-70b-instruct"
#   "qwen/qwen-2.5-72b-instruct"
#   "deepseek/deepseek-chat"

Provider routing for open-weights models. For a model that multiple upstream providers host (e.g. Llama 70B on Together / Fireworks / DeepInfra / Groq), OpenRouter automatically routes to the cheapest available upstream. You can override with explicit provider preference if you care about specific characteristics (Groq for low latency, Together for throughput):

# Force a specific upstream provider (reuses the client configured above)
resp = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hi"}],
    extra_body={
        "provider": {"order": ["Groq", "Together"]},
    },
)
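Beyond provider preference, OpenRouter's routing docs also describe request-level model fallbacks via a "models" list in the request body: if the first model is unavailable, the next in the list is tried. The exact parameter shape below is taken from those docs and should be verified before relying on it; a minimal payload builder, assuming that shape:

```python
# Sketch: ordered model fallbacks via OpenRouter's "models" parameter
# (parameter shape per OpenRouter's model-routing docs; verify before use).
def fallback_body(primary: str, fallbacks: list[str], messages: list[dict]) -> dict:
    """Build a chat-completions payload with an ordered fallback chain."""
    return {
        "model": primary,                 # preferred model
        "models": [primary, *fallbacks],  # tried in order on upstream failure
        "messages": messages,
    }

body = fallback_body(
    "anthropic/claude-sonnet-4.6",
    ["openai/gpt-5", "meta-llama/llama-3.3-70b-instruct"],
    [{"role": "user", "content": "Hi"}],
)
# Pass the extra fields via extra_body on the OpenAI SDK, as with
# provider routing above.
```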

The transparency dashboard

OpenRouter's site (openrouter.ai/models) shows real-time per-model leaderboards: price ranking, throughput ranking, latency ranking, and most-used-this-week. This is genuinely useful for "which model should I evaluate next" decisions, particularly for teams trying to understand what the broader ecosystem is shipping. The free tier and unified billing make it cheap to actually try the recommendations.
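The same data is available programmatically: OpenRouter exposes a public model list at GET https://openrouter.ai/api/v1/models. The response shape assumed below — {"data": [{"id": ..., "pricing": {"prompt": ...}}]} with per-token USD prices as strings — matches the API docs but should be checked against a live response; the sample prices here are invented for illustration.

```python
# Sketch: rank models cheapest-first by prompt price from OpenRouter's
# /api/v1/models payload. The payload shape is an assumption to verify.
def cheapest_first(payload: dict) -> list[tuple[str, float]]:
    """Return (model_id, usd_per_prompt_token) sorted ascending by price."""
    rows = [(m["id"], float(m["pricing"]["prompt"])) for m in payload["data"]]
    return sorted(rows, key=lambda r: r[1])

# Illustrative payload (invented prices, not live quotes):
sample = {"data": [
    {"id": "openai/gpt-5", "pricing": {"prompt": "0.00000125"}},
    {"id": "deepseek/deepseek-chat", "pricing": {"prompt": "0.00000027"}},
]}
ranked = cheapest_first(sample)
```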

Where OpenRouter fits and where LiteLLM fits.

OpenRouter and LiteLLM both abstract over multiple LLM providers, but they make different bets. OpenRouter is hosted SaaS — you sign up, pay per token, get one API. LiteLLM is open-source library / proxy — you embed it or self-host. They're often used together in the same stack.

Dimension         | OpenRouter                                | LiteLLM
Form factor       | Hosted SaaS                               | Open-source library or self-hosted proxy
Setup cost        | Sign up, paste API key                    | Install package or run proxy
Billing model     | One bill from OpenRouter (BYOK optional)  | You manage all upstream billing
Provider coverage | 300+ models, 60+ providers                | 100+ providers
Cost-routing      | Automatic across upstreams                | Configure manually with fallback chains
Self-hosting      | No                                        | Yes (proxy mode)
Data residency    | Routes through OpenRouter (US-hosted)     | Self-host wherever you want
Best for          | Prototyping, dev, single-bill simplicity, model exploration | Internal AI platforms, central gateway, residency-controlled production

The combine-them pattern

Many production setups use both. OpenRouter as one upstream behind a self-hosted LiteLLM proxy. The proxy gives you cost tracking, virtual keys, and residency control; OpenRouter gives you broad model coverage as a single upstream. Internal teams hit the proxy; the proxy routes high-volume traffic direct-to-vendor (cheapest per-token), routes long-tail / one-off requests through OpenRouter (lowest setup friction), routes residency-sensitive traffic to specific regional providers.
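The routing split above can be sketched as a LiteLLM proxy config with OpenRouter as one upstream. All model IDs and key names are illustrative, not recommendations; the "openrouter/" model prefix and the os.environ/ key syntax are from LiteLLM's provider docs.

```yaml
# Sketch: self-hosted LiteLLM proxy with OpenRouter as one upstream.
# Model IDs are illustrative; verify against current provider docs.
model_list:
  # High-volume production traffic: direct to the vendor
  - model_name: prod-claude
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
  # Long-tail / one-off models: through OpenRouter (one upstream, many models)
  - model_name: longtail-llama
    litellm_params:
      model: openrouter/meta-llama/llama-3.3-70b-instruct
      api_key: os.environ/OPENROUTER_API_KEY
  # Residency-sensitive traffic: regional provider
  - model_name: za-claude
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: af-south-1
```

Internal callers hit the proxy's OpenAI-compatible endpoint with the model_name alias; the proxy decides which upstream actually serves the request.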

Pick OpenRouter when. Skip when.

Use OpenRouter when

  • You want one bill across many providers
  • Prototyping and model exploration are your daily workflow
  • You don't want to manage individual provider account relationships
  • Free-tier access to frontier models accelerates evaluation
  • Your volume is small enough that hosted aggregator economics work
  • You want a simple way to A/B a prompt across models for benchmarking
  • You need access to long-tail models without negotiating individual provider deals

Skip OpenRouter when

  • Production traffic carries personal information with residency requirements (see the POPIA caution below)
  • Sustained high volume on a single model makes direct vendor pricing the better deal
  • You need a self-hosted gateway with full control over where requests flow

Where OpenRouter lands in SA delivery work.

Studio · the prototyping accelerator

For SA studios, OpenRouter is genuinely valuable for prototyping and client demos. One $20 USD top-up unlocks dozens of frontier models for evaluation work; the free tier covers low-volume hobbyist use entirely. When pitching agent projects to clients, being able to demo "your agent's answers on Claude vs GPT vs Gemini" in 30 seconds — without standing up three separate provider accounts — meaningfully shortens the sales cycle.
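That demo loop is easy to sketch as a small harness. The completion function is injected so the comparison logic itself runs without network access; the model IDs are illustrative, and the commented usage assumes the OpenAI SDK client configured for OpenRouter earlier on this page.

```python
# Sketch: A/B one prompt across several models through one API surface.
# Model IDs are illustrative; check openrouter.ai/models for current names.
MODELS = [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5",
    "google/gemini-2.5-pro",
]

def compare(prompt: str, models: list[str], complete) -> dict[str, str]:
    """Run one prompt against every model; complete(model, prompt) -> str."""
    return {model: complete(model, prompt) for model in models}

# With the OpenRouter-configured OpenAI SDK client from earlier:
#   answers = compare(
#       "Summarise this policy in one line", MODELS,
#       lambda m, p: client.chat.completions.create(
#           model=m, messages=[{"role": "user", "content": p}]
#       ).choices[0].message.content,
#   )
```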

POPIA caution

OpenRouter routes traffic through their US-hosted infrastructure. For POPIA-sensitive workloads carrying personal information, OpenRouter introduces a Section 72 cross-border transfer that wouldn't otherwise exist if you went direct to a regional provider. For non-PII workloads (research, prototyping, internal tooling without customer data), this is a non-issue. For customer-data-bearing production, prefer regional residency paths: AWS Bedrock af-south-1, Vertex AI africa-south1, Azure OpenAI South Africa North, or self-hosted vLLM.

FX exposure

OpenRouter is USD-billed like everything else. The aggregator margin is small enough that going through OpenRouter doesn't materially change FX cost vs going direct, and the operational simplicity of one bill vs N bills is genuinely worth the small markup for most studios. The pattern that wins for SA studios watching FX: OpenRouter for evaluation and low-volume work, direct vendor accounts for high-volume production traffic on specific models, regional cloud (Bedrock / Vertex) for residency-sensitive workloads.

Where OpenRouter links in the tree.

Primary sources only.