GPT is OpenAI's flagship model family. Three production lines in 2026: GPT-5 at the current frontier with the broadest capability, the GPT-4.x line (4.1, 4o), still production-current and economical, and the o-series reasoning models (o1, o3, o4) for the hardest reasoning tasks. 200k+ context across the lineup, native multimodal (vision + audio), leading function-calling reliability, hosted via the OpenAI Platform and Azure OpenAI Service. The default for production agents on OpenAI infrastructure, and by deployed count the most widely used closed frontier model family in the world.
GPT (Generative Pre-trained Transformer) is the model family from OpenAI, the AI lab founded in 2015 and best known through ChatGPT. The first GPT shipped in 2018; the family has been on a near-annual major release cadence since GPT-2 (2019), with GPT-3 (2020), GPT-3.5 (2022), GPT-4 (2023), GPT-4o (2024), and GPT-5 (2025) marking the major generations. The o-series reasoning line (o1, o3, o4) launched separately in 2024 as the "thinking" complement to the main GPT lineup.
Where Claude has historically led on instruction-following and Gemini on raw context length, GPT has historically led on three things: function-calling reliability (the most consistent structured tool dispatch in the field since 2023), multimodal reach (image and audio input across the lineup, plus DALL-E image generation), and API ecosystem depth (more SDKs, more tool integrations, more community examples than any competing family). Those three properties make it the default choice when ecosystem maturity is the deciding factor.
Distribution: GPT is available via the OpenAI Platform directly, via Azure OpenAI Service (including in South Africa North for SA-resident calls), via ChatGPT (the consumer product), and through every major aggregator (OpenRouter, LiteLLM, the LangChain ecosystem). Microsoft is OpenAI's principal cloud partner, and Azure OpenAI Service is the second-largest deployment surface after the OpenAI Platform itself.
OpenAI's naming has evolved over time. The current shape: GPT-5 is the flagship "everyday" line; GPT-4.x (4o, 4.1) is the still-supported previous generation, often cheaper and adequate for many production workloads; the o-series (o1, o3, o4) are the explicit reasoning models that "think before answering" with longer latency and higher quality on hard problems. Older chat-only models (GPT-3.5, GPT-4) are deprecated or near-deprecated — check the API model list before pinning anything legacy.
The OpenAI family is deliberately segmented. GPT-5 is the broadest-capability default; GPT-4.x sits below it on cost and capability but remains good for high-volume production work; the o-series sits above it on reasoning depth at the cost of latency and price. Most production stacks use two or three lines together via tier routing.
GPT-5 is the flagship. Best for: agentic workflows, code generation, multimodal tasks, long-context reasoning. Released in 2025; the current production default. Multiple variants (mini, nano) offer different price/quality points within the line.
GPT-4.x is the previous-generation line, still actively maintained. GPT-4o is the multimodal default with vision + audio; GPT-4.1 is the latest text-strong 4.x variant. Cheaper than GPT-5 and good enough for the majority of production traffic.
The o-series are explicit reasoning models. They spend more compute "thinking" before answering, giving a meaningful quality lift on hard problems (math, code, planning, multi-step proofs) at the cost of latency and price. Use them selectively, not by default.
Core API capabilities across the family: function calling, JSON mode for structured outputs, vision (image input), 200k+ context, streaming, prompt caching, and the Batch API. Realtime voice / audio is served by the separate Realtime API. DALL-E image generation is a separate model family.
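A minimal sketch of what function calling looks like from the application side, assuming a hypothetical `get_weather` tool: the schema follows OpenAI's function-calling format (name, description, JSON Schema parameters), and the dispatcher executes a tool call shaped like the API's `tool_calls` entries. The tool itself is a stub for illustration.

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub implementation for illustration.
    return {"city": city, "unit": unit, "temp": 21}

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> dict:
    """Execute one tool call (name + JSON-encoded arguments)."""
    fn = REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Simulated tool call, shaped like a response.tool_calls entry:
call = {"function": {"name": "get_weather", "arguments": '{"city": "Johannesburg"}'}}
print(dispatch(call))  # {'city': 'Johannesburg', 'unit': 'celsius', 'temp': 21}
```

In a real integration, `TOOLS` is passed as the `tools` parameter on the chat request, and `dispatch` runs over whatever tool calls the model returns before the results are fed back.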
OpenAI introduced the o-series (o1 in late 2024, o3 in early 2025, o4 in 2026) as explicit reasoning models that allocate compute to "thinking" before responding. Hard problems get genuine quality lifts: competitive math benchmarks, complex code synthesis, multi-step planning. The trade-offs are latency (10-60 s vs 1-3 s for chat) and per-token cost. The OpenAI Agents SDK exposes reasoning models via the same Agent API; you select the model per agent, routing easy work to GPT-5 and hard work to o3.
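The routing decision can be sketched as a plain function; a production router would usually be a small classifier model, but the shape is the same. Model names here are assumptions as of writing, so check the live model list before pinning.

```python
# Illustrative tier router: easy work goes to a cheap chat model,
# hard work to a reasoning model. Model names are assumptions.
CHAT_MODEL = "gpt-5-mini"
REASONING_MODEL = "o3"

# Crude keyword heuristic standing in for a classifier.
HARD_KEYWORDS = ("prove", "optimize", "refactor", "plan", "debug")

def pick_model(task: str) -> str:
    """Return the model tier a task should run on."""
    text = task.lower()
    if any(k in text for k in HARD_KEYWORDS):
        return REASONING_MODEL
    return CHAT_MODEL

print(pick_model("summarise this ticket"))       # gpt-5-mini
print(pick_model("prove this invariant holds"))  # o3
```

With the Agents SDK, the returned name would simply become the `model` argument on the agent handling that task.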
Honest cross-family positioning. GPT's strengths are ecosystem reach, function calling, and multimodal range; its weaknesses are the "confidently wrong" failure mode and (for some teams) the OpenAI brand-risk profile. Claude leads on instruction-following; Gemini on context length and search grounding; open-weights on cost and residency.
| Family | Strengths | Watch out for |
|---|---|---|
| GPT (OpenAI) | Largest API ecosystem, best function-calling, broadest multimodal (vision/audio/DALL-E), o-series reasoning | "Confidently wrong" failure mode more than Claude; brand-risk for some enterprise; Realtime / DALL-E / Whisper billed separately |
| Claude (Anthropic) | Best instruction-following, native extended thinking, MCP-native, Bedrock af-south-1 for SA residency | Closed; USD billing; tighter rate limits than OpenAI on free tier |
| Gemini (Google) | Largest context window (1M+ on Pro/Ultra), native search grounding, voice via Live API, JHB Vertex region | Less consistent on instruction-following; weaker community / ecosystem than OpenAI |
| Llama (Meta) | Open weights, runs locally, largest community fine-tune ecosystem | Frontier gap; instruction-following lags closed frontier; Meta licence has commercial restrictions over 700M MAU |
| Gemma (Google) | Open weights, multimodal, frontier-lab safety tuning, runs locally | Smaller fine-tune ecosystem than Llama; not as code-strong as Qwen |
Most production agent failures aren't reasoning failures — they're tool-use failures. The model picks the wrong function, calls it twice, returns malformed arguments, or skips a required tool. GPT has had the most reliable function-calling in the field since 2023. Claude has caught up on quality but still has more variance on edge cases; Gemini lags both. For agents where structured tool dispatch is the load-bearing capability, GPT's function-calling consistency is a real production advantage.
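Whatever the model family, malformed arguments are cheap to catch before dispatch. The following is a minimal hand-rolled sketch, not a full JSON Schema validator, guarding against the failure modes listed above: malformed JSON, missing required fields, and wrong types. The schema shape and field names are illustrative.

```python
import json

# Illustrative per-tool argument schema: required fields plus expected types.
SCHEMA = {
    "required": ["city"],
    "types": {"city": str, "unit": str},
}

def validate_args(raw: str, schema: dict):
    """Return (args, None) on success or (None, error_message) on failure."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None, "malformed JSON"
    for key in schema["required"]:
        if key not in args:
            return None, f"missing required field: {key}"
    for key, val in args.items():
        expected = schema["types"].get(key)
        if expected is not None and not isinstance(val, expected):
            return None, f"wrong type for {key}"
    return args, None

print(validate_args('{"city": "Cape Town"}', SCHEMA))  # ({'city': 'Cape Town'}, None)
print(validate_args('{"unit": "celsius"}', SCHEMA))    # (None, 'missing required field: city')
```

On a validation failure, the error message goes back to the model as the tool result so it can retry, rather than crashing the agent loop.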
OpenAI is USD-billed at frontier-tier rates for GPT-5 and the o-series. GPT-4.x is meaningfully cheaper. The three production cost levers everyone should wire in are tier routing (don't run everything on GPT-5), prompt caching (added late 2024, ~50-90% discount on repeated context), and the Batch API (~50% discount for non-urgent workloads). Always check openai.com/api/pricing for current numbers.
The shape of the pricing curve is tiered: flagship GPT-5 and the o-series at the top, GPT-4.x and the mini/nano variants well below them, with caching and batch discounts layered on top of any tier (illustrative ordering; check the official page for numbers).
For SA studios watching FX, all three levers matter. Build a GPT-4.x or GPT-5-mini-driven router as the first agent in the chain. Most of the work (60-80%) doesn't need GPT-5 flagship; route routine traffic to the cheaper tier. Use prompt caching for any RAG or long-system-prompt workload. Push nightly classification / extraction / enrichment to Batch API. This pattern can cut OpenAI costs by 60-80% versus running everything on GPT-5. The OpenAI Agents SDK's per-agent model selection makes this trivial — sub-agents on different tiers are a one-liner.
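The 60-80% figure falls out of simple arithmetic once the three levers are stacked. A back-of-envelope sketch, where all rates are placeholders rather than real OpenAI prices, and only the discount percentages come from the text above:

```python
# PLACEHOLDER rates — plug in numbers from the official pricing page.
FLAGSHIP_RATE = 10.0   # $ per 1M input tokens, flagship tier
CHEAP_RATE = 1.0       # $ per 1M input tokens, routed cheap tier
CACHE_DISCOUNT = 0.5   # prompt-caching discount (~50-90% per the text)
BATCH_DISCOUNT = 0.5   # Batch API discount (~50% per the text)

def monthly_cost(mtok: float, routed_cheap: float, cached: float, batched: float) -> float:
    """mtok: millions of input tokens; the rest are fractions in [0, 1]."""
    cheap = mtok * routed_cheap * CHEAP_RATE
    flagship = mtok * (1 - routed_cheap) * FLAGSHIP_RATE
    cost = cheap + flagship
    cost -= cost * cached * CACHE_DISCOUNT    # discount on the cached share
    cost -= cost * batched * BATCH_DISCOUNT   # discount on the batched share
    return cost

baseline = monthly_cost(100, 0.0, 0.0, 0.0)  # everything on flagship
levered = monthly_cost(100, 0.7, 0.5, 0.2)   # 70% routed, 50% cached, 20% batched
print(f"saving: {1 - levered / baseline:.0%}")  # saving: 75%
```

With 70% of traffic routed to the cheap tier, half the remaining spend cached, and a fifth batched, the stacked saving lands at roughly 75%, squarely inside the 60-80% range claimed above.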
Azure OpenAI Service in South Africa North (Johannesburg) hosts GPT models with SA data residency, Microsoft enterprise contracting, and POPIA-compliant data handling. For SA banks, insurers, and telcos with existing Microsoft / Azure relationships, this is the structurally clean answer for OpenAI workloads. Caveats: not every model lands in South Africa North at launch — newest GPT releases sometimes lag US East regions by weeks or months. Plan for either a North Europe / UK South fallback or accept the lag if residency is non-negotiable.
For studios without enterprise residency requirements, the OpenAI Platform is simpler and cheaper than Azure. Tracing, evals, and prompt management sit on the same Platform dashboard, and new models and features land on Platform first, Azure second. The pragmatic SA path: start on Platform for prototypes and pilots; move to Azure South Africa North if a client requires it.
OpenAI is USD-billed. Same playbook as the Claude leaf: tier routing (GPT-4.x or GPT-5-mini default, GPT-5 / o-series only for hard cases), prompt caching for RAG workloads, Batch API for nightly. For high-volume routing tasks, also consider hybrid local+cloud where Ollama-Gemma 3 handles the 60-80% of work that doesn't structurally need a frontier closed model.
LangGraph with langchain-openai is the right combination when you want explicit graph control on top of OpenAI. Azure OpenAI Service in South Africa North is the SA-residency-clean path for GPT workloads.