The most-used closed frontier model.

GPT is OpenAI's flagship model family. In mid-2026 the frontier is the GPT-5.5 flagship with reasoning built in: it decides when to "think" rather than making you pick a separate model. Below it, GPT-5.4 (plus mini and nano) covers cost-tiered production; o3 remains as a dedicated reasoning model; the older GPT-4.x line is legacy. The GPT-5.x flagship offers ~1M context, native multimodal input (vision + audio), and leading function-calling reliability, and is hosted via OpenAI Platform and Azure OpenAI Service. It is the most-used closed frontier model family in the world by deployed count.

GPT-5.5 · current frontier · Closed model · ~1M context · Built-in reasoning · Native multimodal

01 · What it is

OpenAI's flagship family.

GPT (Generative Pre-trained Transformer) is the model family from OpenAI, the AI lab founded in 2015 and primarily known to most people through ChatGPT. The first GPT shipped in 2018; the family has been on near-annual major release cadence since GPT-2 (2019), with GPT-3 (2020), GPT-3.5 (2022), GPT-4 (2023), GPT-4o (2024), and GPT-5 (2025) marking the major generations. The o-series reasoning line (o1 in late 2024, o3 in early 2025) launched separately as the "thinking" complement, before that capability was folded into the GPT-5.x flagship.

Where Claude has historically led on instruction-following and Gemini on raw context length, GPT has historically led on three things: function-calling reliability (the most consistent structured tool dispatch in the field since 2023), multimodal reach (image, audio, and video input across the lineup, plus image generation through DALL-E), and API ecosystem depth (more SDKs, more tool integrations, and more community examples than any competing family). Those three properties make it the default choice when ecosystem maturity is the deciding factor.

Distribution: GPT is available via the OpenAI Platform directly, via Azure OpenAI Service (including in South Africa North for SA-resident calls), via ChatGPT (the consumer product), and through every major aggregator (OpenRouter, LiteLLM, the LangChain ecosystem). Microsoft is OpenAI's long-standing cloud partner, and Azure is the second-largest deployment surface after OpenAI Platform itself.

Naming · how the family lines up

OpenAI's naming has evolved. The current shape: the GPT-5.x line is the flagship — gpt-5.5 (and gpt-5.5-pro for highest precision) at the top, gpt-5.4 with mini and nano variants for cheaper, higher-volume work. Reasoning is now built into the GPT-5.x line (the model thinks when the task needs it), which has largely absorbed the standalone o-series; o3 remains as a dedicated reasoning model. The older GPT-4.x (4o, 4.1) and chat-only GPT-3.5/GPT-4 are legacy or deprecated — check the API model list before pinning anything older.

02 · The 2026 lineup

One flagship line, cost-tiered.

The lineup is now segmented within the GPT-5.x family rather than across separate "chat" and "reasoning" lines. GPT-5.5 is the broadest-capability default with reasoning built in; GPT-5.4 and its mini/nano variants sit below it on cost; o3 remains for dedicated reasoning; GPT-4.x is the cheap legacy floor. Most production stacks route across these price points.

Line · current flagship

GPT-5.5 · 5.5 Pro

The frontier default, with reasoning built in — it allocates "thinking" when the task needs it instead of you switching models. Best for: agentic workflows, code generation, multimodal, long-context reasoning. gpt-5.5-pro is the highest-precision variant for expensive-but-important work. ~1M context.

Line · cost-tiered

GPT-5.4 · mini, nano

The volume tier. gpt-5.4 is a strong general-purpose frontier model at roughly half GPT-5.5's price; mini is the price/performance sweet spot for production apps; nano is the budget option for routing, extraction, and high-volume backend tasks.

Line · reasoning

o3

The dedicated reasoning model — the survivor of the o-series after reasoning was folded into GPT-5.x. It replaced the expensive o1 at a fraction of the cost. Reach for it when you want an explicit reasoning SKU; for most work, GPT-5.x's built-in reasoning is enough.

Capability · cross-line

All lines ship

Function calling, structured outputs (JSON-schema), vision (image input), ~1M context on the flagship, streaming, prompt caching, Batch API. Realtime voice / audio is via the separate Realtime API; image generation is a separate model family.

Reasoning, now built into the flagship

OpenAI's o-series (o1 in late 2024, o3 in early 2025) pioneered explicit "think-before-answering" reasoning as a separate model line. By mid-2026 that capability is built into the GPT-5.x flagship — the model decides when to spend compute on reasoning, the same convergence Anthropic made with adaptive thinking and Google with Gemini's thinking models. o3 survives as a dedicated reasoning SKU, but you no longer have to route hard work to a separate model by default. The OpenAI Agents SDK still lets you pin a specific model per agent when you want that control.

03 · vs Claude, Gemini, Llama, Gemma

Where GPT is the right pick — and where it isn't.

Honest cross-family positioning. GPT's strengths sit on ecosystem reach, function-calling, and multimodal range; its weaknesses are around the "confidently wrong" failure mode and (for some teams) the OpenAI brand-risk profile. Claude leads on instruction-following; Gemini on context length and search grounding; open-weights on cost and residency.

Family	Strengths	Watch out for
GPT (OpenAI)	Largest API ecosystem, best function-calling, broadest multimodal (vision/audio/image-gen), built-in reasoning on GPT-5.x	"Confidently wrong" failure mode more than Claude; brand-risk for some enterprise; Realtime / image-gen / transcription billed separately
Claude (Anthropic)	Best instruction-following, adaptive thinking, MCP-native, top tier (Fable 5) leads on long-horizon agentic work; Bedrock `af-south-1` endpoint with in-country logs (inference itself routes globally via cross-Region inference — not strict residency)	Closed; USD billing; tighter rate limits than OpenAI on free tier
Gemini (Google)	Largest context window (1M+, Gemini 3.1 Pro reportedly to ~2M), native search grounding, voice via Live API, JHB Vertex region	Less consistent on instruction-following; weaker community / ecosystem than OpenAI
Llama (Meta)	Open weights, runs locally, largest community fine-tune ecosystem	Frontier gap; instruction-following lags closed frontier; Meta licence has commercial restrictions over 700M MAU
Gemma (Google)	Open weights, multimodal, frontier-lab safety tuning, runs locally	Smaller fine-tune ecosystem than Llama; not as code-strong as Qwen

Why function-calling matters more than benchmark numbers

Most production agent failures aren't reasoning failures — they're tool-use failures. The model picks the wrong function, calls it twice, returns malformed arguments, or skips a required tool. GPT has had the most reliable function-calling in the field since 2023. Claude has caught up on quality but still has more variance on edge cases; Gemini lags both. For agents where structured tool dispatch is the load-bearing capability, GPT's function-calling consistency is a real production advantage.
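The failure modes listed above can be caught mechanically before any tool actually runs. The following is a minimal sketch in plain Python, not OpenAI's API: the `TOOLS` registry, the `ToolCall` class, and `check_tool_calls` are hypothetical names introduced for illustration, and it assumes tool calls have already been parsed into a name plus a dict of arguments.

```python
import json
from dataclasses import dataclass

# Hypothetical tool registry: tool name -> {argument name: expected Python type}.
TOOLS = {
    "get_invoice": {"invoice_id": str},
    "send_email": {"to": str, "body": str},
}


@dataclass
class ToolCall:
    name: str
    arguments: dict


def check_tool_calls(calls, required_tools=()):
    """Return a list of problems found in one turn's tool calls."""
    problems = []
    seen = set()
    for call in calls:
        spec = TOOLS.get(call.name)
        if spec is None:
            # Unregistered name: one detectable form of picking the wrong function.
            problems.append(f"unknown tool: {call.name}")
            continue
        # Called twice: same tool with identical arguments in one turn.
        key = (call.name, json.dumps(call.arguments, sort_keys=True))
        if key in seen:
            problems.append(f"duplicate call: {call.name}")
        seen.add(key)
        # Malformed arguments: missing, wrongly typed, or unexpected fields.
        for arg, expected in spec.items():
            if arg not in call.arguments:
                problems.append(f"{call.name}: missing argument {arg}")
            elif not isinstance(call.arguments[arg], expected):
                problems.append(f"{call.name}: {arg} should be {expected.__name__}")
        for arg in call.arguments:
            if arg not in spec:
                problems.append(f"{call.name}: unexpected argument {arg}")
    # Skipped tool: something the workflow requires was never called.
    called = {call.name for call in calls}
    for name in required_tools:
        if name not in called:
            problems.append(f"required tool not called: {name}")
    return problems


calls = [
    ToolCall("get_invoice", {"invoice_id": 42}),  # wrong type for invoice_id
    ToolCall("send_email", {"to": "ops@example.com", "body": "hi"}),
    ToolCall("send_email", {"to": "ops@example.com", "body": "hi"}),  # repeat
]
for problem in check_tool_calls(calls, required_tools=("get_invoice", "send_email")):
    print(problem)
```

Logging the output of a check like this per model is one way to compare function-calling consistency on your own workload rather than relying on benchmark numbers.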

04 · Pricing reality

Tiered, with three meaningful cost levers.

OpenAI is USD-billed at frontier-tier rates for the GPT-5.x flagship; the mini/nano variants are dramatically cheaper. The production cost levers to wire in: tier routing (don't run everything on GPT-5.5), prompt caching (cached input billed at ~10% of standard on the GPT-5.x family), and Batch API (50% off for non-urgent workloads). Always check openai.com/api/pricing for current numbers — rates below are mid-2026.

List rates, per million tokens (input / output):

GPT-5.5 — $5 / $30. The flagship with built-in reasoning. GPT-5.5 Pro is $30 / $180 for highest-precision, expensive-but-important work.
GPT-5.4 — $2.50 / $15. A strong general-purpose frontier model at roughly half GPT-5.5's rate.
GPT-5.4 mini — $0.75 / $4.50. The price/performance pick for most production apps.
GPT-5.4 nano — $0.20 / $1.25. Budget tier for routing, extraction, and high-volume backend work.
o3 — ~$2/M input; the dedicated reasoning model, an ~87% cut from the o1 it replaced.
Prompt caching — cached input billed at ~10% of standard (e.g. GPT-5.5 input $5 → $0.50 cached). Big saver for RAG / repeated context.
Batch API — 50% off input + output for non-urgent workloads.

Note: data-residency (regional-processing) endpoints carry a ~10% uplift for models released on or after 5 March 2026 — relevant if you pin a region for POPIA.
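To see how the list rates and discounts above combine, here is a rough cost estimator. It is a sketch under stated assumptions: the rates are copied from this section, the function name and parameters are made up for illustration, and the way the discounts stack (batch discount applied on top of cached input, residency uplift applied multiplicatively to the total) is an assumption to verify against the pricing page.

```python
# List rates from this section, USD per million tokens: (input, output).
RATES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}
CACHED_INPUT_FACTOR = 0.10  # cached input billed at ~10% of standard
BATCH_FACTOR = 0.50         # Batch API: 50% off input + output
RESIDENCY_UPLIFT = 1.10     # ~10% uplift on regional-processing endpoints


def request_cost(model, input_tokens, output_tokens,
                 cached_input_tokens=0, batch=False, regional=False):
    """Estimate the USD cost of one request (stacking order is an assumption)."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    in_rate, out_rate = RATES[model]
    fresh_tokens = input_tokens - cached_input_tokens
    cost = (
        fresh_tokens * in_rate
        + cached_input_tokens * in_rate * CACHED_INPUT_FACTOR
        + output_tokens * out_rate
    ) / 1_000_000
    if batch:
        cost *= BATCH_FACTOR
    if regional:
        cost *= RESIDENCY_UPLIFT
    return cost


# A RAG-style call: 20k input tokens (15k of them a cached prefix), 1k output.
uncached = request_cost("gpt-5.5", 20_000, 1_000)
cached = request_cost("gpt-5.5", 20_000, 1_000, cached_input_tokens=15_000)
print(f"uncached: ${uncached:.4f}  cached: ${cached:.4f}")
```

For that example the uncached call works out to about $0.13 and the cached one to about $0.0625, which is why caching matters most when a long shared prefix dominates the input.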

The three-lever routing pattern

For SA studios watching FX, all three levers matter. Build a router on GPT-5.4 mini or nano as the first agent in the chain: most traffic (60-80%) doesn't need the GPT-5.5 flagship, so routine requests go to the cheaper tier. Use prompt caching for any RAG or long-system-prompt workload. Push nightly classification, extraction, and enrichment to the Batch API. Combined, this pattern can cut OpenAI costs by 60-80% versus running everything on GPT-5.5. The OpenAI Agents SDK's per-agent model selection makes this straightforward, since each sub-agent can be assigned its own model tier.
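The tier-routing lever can be sanity-checked with a few lines of arithmetic. This is an illustrative sketch only: the task labels, `pick_model`, and `blended_saving` are hypothetical, and the calculation assumes routine and hard requests have the same token mix, which real traffic rarely does.

```python
ROUTINE_MODEL = "gpt-5.4-mini"
FLAGSHIP_MODEL = "gpt-5.5"

# Hypothetical labels a cheap classifier (the router agent) might emit.
HARD_TASKS = {"multi_step_agent", "code_generation", "long_context_analysis"}


def pick_model(task_type: str) -> str:
    """Send only tasks labelled as hard to the flagship tier."""
    return FLAGSHIP_MODEL if task_type in HARD_TASKS else ROUTINE_MODEL


def blended_saving(routine_share: float, routine_rate: float, flagship_rate: float) -> float:
    """Fraction saved versus sending all traffic to the flagship."""
    blended = routine_share * routine_rate + (1 - routine_share) * flagship_rate
    return 1 - blended / flagship_rate


# Input list rates from the pricing section: mini $0.75, flagship $5.00.
for share in (0.6, 0.7, 0.8):
    print(f"{share:.0%} routed to mini: {blended_saving(share, 0.75, 5.00):.1%} saved")
```

Because mini's list rate is 15% of GPT-5.5's on both input and output, routing 70-80% of traffic to it saves roughly 60-68% from routing alone; prompt caching and the Batch API account for the rest of the range quoted above.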

05 · Decision guide

When GPT is the right model.

Use GPT when

Function-calling reliability is load-bearing
You need broad multimodal range (vision + audio + image gen)
You're already on Azure or want Microsoft enterprise contracting
Reasoning workloads benefit from explicit thinking, whether GPT-5.x's built-in reasoning or the dedicated o3 model
Ecosystem maturity matters — you want the largest community + integrations
You need DALL-E image generation alongside text
You're standardised on OpenAI / Azure for enterprise compliance posture

Skip when

Instruction-following reliability matters more than tool dispatch — Claude often wins
You need the largest context windows — Gemini 3.x leads (1M+, reportedly to ~2M on the Pro tier)
POPIA / data-residency requires SA-resident inference and you can't use Azure SA North
You want fully open / self-hostable models — GPT is closed
Cost-sensitive at extreme scale — open-weights via Ollama is structurally cheaper
You distrust OpenAI as a brand or vendor — Claude or Gemini fit better politically

06 · South African context

Where GPT lands in SA delivery work.

Enterprise · Azure OpenAI in `South Africa North`

Azure OpenAI Service in South Africa North (Johannesburg) hosts GPT models with SA data residency, Microsoft enterprise contracting, and POPIA-compliant data handling. For SA banks, insurers, and telcos with existing Microsoft / Azure relationships, this is the structurally clean answer for OpenAI workloads. Caveat: not every model lands in South Africa North at launch, and the newest GPT releases sometimes lag US East regions by weeks or months. Either plan for a North Europe / UK South fallback, or accept the lag if residency is non-negotiable.
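The fallback decision described above can be made explicit in deployment config. The sketch below is hypothetical throughout: the `AVAILABILITY` snapshot is invented for illustration (real availability has to come from Azure's model availability documentation), and `choose_region` is not part of any Azure SDK.

```python
# Invented availability snapshot for illustration only; check Azure's docs.
AVAILABILITY = {
    "southafricanorth": {"gpt-5.4", "gpt-5.4-mini"},
    "uksouth": {"gpt-5.4", "gpt-5.4-mini", "gpt-5.5"},
    "northeurope": {"gpt-5.4", "gpt-5.4-mini", "gpt-5.5"},
}


def choose_region(model, preferred="southafricanorth",
                  fallbacks=("uksouth", "northeurope"),
                  residency_required=False):
    """Return the region to deploy in, or None if no allowed region has the model."""
    if model in AVAILABILITY.get(preferred, set()):
        return preferred
    if residency_required:
        # Residency is non-negotiable: accept the lag rather than leave the country.
        return None
    for region in fallbacks:
        if model in AVAILABILITY.get(region, set()):
            return region
    return None


print(choose_region("gpt-5.4"))                           # in-country
print(choose_region("gpt-5.5"))                           # falls back
print(choose_region("gpt-5.5", residency_required=True))  # waits for SA North
```

Returning `None` when residency is required forces the caller to choose a different model or wait, instead of silently sending data to another region.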

Studio · OpenAI Platform direct

For studios without enterprise residency requirements, OpenAI Platform is simpler and cheaper than Azure. Tracing, evals, and prompt management live on the same Platform dashboard. New models and features land on Platform first and Azure second. The pragmatic SA path: start on Platform for prototypes and pilots; move to Azure SA North if a client requires it.

FX exposure mitigations

OpenAI is USD-billed. Same playbook as the Claude leaf: tier routing (GPT-5.4 mini or nano by default, GPT-5.5 only for hard cases), prompt caching for RAG workloads, and the Batch API for nightly jobs. For high-volume routing tasks, also consider a hybrid local + cloud setup where Gemma 3 running on Ollama handles the 60-80% of work that doesn't structurally need a frontier closed model.

07 · Connections