GPT is OpenAI's flagship model family. In mid-2026 the frontier is the GPT-5.5 flagship with reasoning built in — it decides when to "think" rather than making you pick a separate model. Below it, GPT-5.4 (plus mini and nano) covers cost-tiered production; o3 remains as a dedicated reasoning model; the older GPT-4.x line is legacy. ~1M context on the GPT-5.x flagship, native multimodal (vision + audio), leading function-calling reliability, hosted via OpenAI Platform and Azure OpenAI Service. The most-used closed frontier model family in the world by deployed count.
GPT (Generative Pre-trained Transformer) is the model family from OpenAI, the AI lab founded in 2015 and known to most people through ChatGPT. The first GPT shipped in 2018, and the family has kept a near-annual cadence of major releases since: GPT-2 (2019), GPT-3 (2020), GPT-3.5 (2022), GPT-4 (2023), GPT-4o (2024), and GPT-5 (2025). The o-series reasoning line (o1 in late 2024, o3 in early 2025) launched separately as the "thinking" complement, before that capability was folded into the GPT-5.x flagship.
Where Claude has historically led on instruction-following and Gemini on raw context length, GPT has historically led on three things: function-calling reliability (the most consistent structured tool dispatch in the field since 2023), multimodal reach (image, audio, video input across the lineup, plus DALL-E and image generation), and API ecosystem depth (more SDKs, more tool integrations, more community examples than any competing family). Those three properties make it the default choice when ecosystem maturity is the deciding factor.
Distribution: GPT is available via the OpenAI Platform directly, via Azure OpenAI Service (including in South Africa North for SA-resident calls), via ChatGPT (the consumer product), and through every major aggregator (OpenRouter, LiteLLM, the LangChain ecosystem). Microsoft co-developed the platform and is the second-largest deployment surface after OpenAI Platform itself.
OpenAI's naming has evolved. The current shape: the GPT-5.x line is the flagship — gpt-5.5 (and gpt-5.5-pro for highest precision) at the top, gpt-5.4 with mini and nano variants for cheaper, higher-volume work. Reasoning is now built into the GPT-5.x line (the model thinks when the task needs it), which has largely absorbed the standalone o-series; o3 remains as a dedicated reasoning model. The older GPT-4.x (4o, 4.1) and chat-only GPT-3.5/GPT-4 are legacy or deprecated — check the API model list before pinning anything older.
The lineup is now segmented within the GPT-5.x family rather than across separate "chat" and "reasoning" lines. GPT-5.5 is the broadest-capability default with reasoning built in; GPT-5.4 and its mini/nano variants sit below it on cost; o3 remains for dedicated reasoning; GPT-4.x is the cheap legacy floor. Most production stacks route across these price points.
GPT-5.5 is the frontier default, with reasoning built in: it allocates "thinking" when the task needs it, so you don't have to switch models. Best for agentic workflows, code generation, multimodal work, and long-context reasoning. gpt-5.5-pro is the highest-precision variant for expensive-but-important work. ~1M context.
GPT-5.4 and its mini and nano variants are the volume tier. gpt-5.4 is a strong general-purpose frontier model at roughly half GPT-5.5's price; mini is the price/performance sweet spot for production apps; nano is the budget option for routing, extraction, and high-volume backend tasks.
o3 is the dedicated reasoning model, the survivor of the o-series after reasoning was folded into GPT-5.x. It replaced the expensive o1 at a fraction of the cost. Reach for it when you want an explicit reasoning SKU; for most work, GPT-5.x's built-in reasoning is enough.
Function calling, structured outputs (JSON-schema), vision (image input), ~1M context on the flagship, streaming, prompt caching, Batch API. Realtime voice / audio is via the separate Realtime API; image generation is a separate model family.
OpenAI's o-series (o1 in late 2024, o3 in early 2025) pioneered explicit "think-before-answering" reasoning as a separate model line. By mid-2026 that capability is built into the GPT-5.x flagship — the model decides when to spend compute on reasoning, the same convergence Anthropic made with adaptive thinking and Google with Gemini's thinking models. o3 survives as a dedicated reasoning SKU, but you no longer have to route hard work to a separate model by default. The OpenAI Agents SDK still lets you pin a specific model per agent when you want that control.
Honest cross-family positioning. GPT's strengths sit on ecosystem reach, function-calling, and multimodal range; its weaknesses are around the "confidently wrong" failure mode and (for some teams) the OpenAI brand-risk profile. Claude leads on instruction-following; Gemini on context length and search grounding; open-weights on cost and residency.
| Family | Strengths | Watch out for |
|---|---|---|
| GPT (OpenAI) | Largest API ecosystem, best function-calling, broadest multimodal (vision/audio/image-gen), built-in reasoning on GPT-5.x | "Confidently wrong" failure mode more than Claude; brand-risk for some enterprise; Realtime / image-gen / transcription billed separately |
| Claude (Anthropic) | Best instruction-following, adaptive thinking, MCP-native, top tier (Fable 5) leads on long-horizon agentic work; Bedrock af-south-1 endpoint with in-country logs (inference itself routes globally via cross-Region inference — not strict residency) | Closed; USD billing; tighter rate limits than OpenAI on free tier |
| Gemini (Google) | Largest context window (1M+, Gemini 3.1 Pro reportedly to ~2M), native search grounding, voice via Live API, JHB Vertex region | Less consistent on instruction-following; weaker community / ecosystem than OpenAI |
| Llama (Meta) | Open weights, runs locally, largest community fine-tune ecosystem | Frontier gap; instruction-following lags closed frontier; Meta licence has commercial restrictions over 700M MAU |
| Gemma (Google) | Open weights, multimodal, frontier-lab safety tuning, runs locally | Smaller fine-tune ecosystem than Llama; not as code-strong as Qwen |
Most production agent failures aren't reasoning failures — they're tool-use failures. The model picks the wrong function, calls it twice, returns malformed arguments, or skips a required tool. GPT has had the most reliable function-calling in the field since 2023. Claude has caught up on quality but still has more variance on edge cases; Gemini lags both. For agents where structured tool dispatch is the load-bearing capability, GPT's function-calling consistency is a real production advantage.
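The failure modes listed above (wrong function, duplicate call, malformed arguments, skipped required tool) can all be checked mechanically before any tool runs. The following is a minimal sketch of such a guard, not part of any OpenAI SDK: the `TOOLS` registry, the tool names, and the `{"name", "arguments"}` call shape are hypothetical and chosen only for illustration.

```python
import json
from collections import Counter

# Hypothetical tool registry: tool name -> set of required argument names.
TOOLS = {
    "get_invoice": {"invoice_id"},
    "send_email": {"to", "subject", "body"},
}


def check_tool_calls(calls, required_tools=()):
    """Return a list of problems found in one batch of model tool calls.

    `calls` is a list of dicts shaped like {"name": str, "arguments": dict}.
    This shape is an assumption for the sketch, not a specific API format.
    """
    problems = []
    seen = Counter()
    for call in calls:
        name = call.get("name")
        args = call.get("arguments")
        if name not in TOOLS:
            problems.append(f"unknown tool: {name}")
            continue
        if not isinstance(args, dict):
            problems.append(f"malformed arguments for {name}")
            continue
        missing = TOOLS[name] - set(args)
        if missing:
            problems.append(f"{name} missing arguments: {sorted(missing)}")
        # Identical name + arguments seen twice counts as a duplicate call.
        key = (name, json.dumps(args, sort_keys=True))
        seen[key] += 1
        if seen[key] == 2:
            problems.append(f"duplicate call: {name}")
    called = {call.get("name") for call in calls}
    for tool in required_tools:
        if tool not in called:
            problems.append(f"required tool not called: {tool}")
    return problems
```

Running a check like this on every model turn turns silent tool-use errors into explicit, loggable ones, which makes it easier to compare tool-dispatch reliability across model families on your own traffic.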
OpenAI is USD-billed at frontier-tier rates for the GPT-5.x flagship; the mini/nano variants are dramatically cheaper. The production cost levers to wire in: tier routing (don't run everything on GPT-5.5), prompt caching (cached input billed at ~10% of standard on the GPT-5.x family), and Batch API (50% off for non-urgent workloads). Always check openai.com/api/pricing for current numbers — rates below are mid-2026.
List rates, per million tokens (input / output):
GPT-5.5: $5 / $30. The flagship with built-in reasoning. GPT-5.5 Pro is $30 / $180 for highest-precision, expensive-but-important work.
GPT-5.4: $2.50 / $15. A strong general-purpose frontier model at roughly half GPT-5.5's rate.
GPT-5.4 mini: $0.75 / $4.50. The price/performance pick for most production apps.
GPT-5.4 nano: $0.20 / $1.25. Budget tier for routing, extraction, and high-volume backend work.
o3: $2/M input. The dedicated reasoning model, an ~87% cut from the o1 it replaced.
Note: data-residency (regional-processing) endpoints carry a ~10% uplift for models released on or after 5 March 2026, which is relevant if you pin a region for POPIA.
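To make the cost levers concrete, here is a rough estimator that applies the list rates together with the ~10% cached-input rate and the 50% Batch API discount quoted in this section. It is a sketch under stated assumptions: the pairing of each rate with a model name follows the order of the tiers described earlier, the model keys are illustrative labels rather than confirmed API identifiers, and treating the caching and batch discounts as stackable is a simplification that should be checked against the current pricing page.

```python
# USD per million tokens (input, output), mid-2026 list rates as quoted in the text.
# The model-to-rate pairing is inferred from the tier descriptions (assumption).
RATES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}
CACHED_INPUT_FACTOR = 0.10  # cached input billed at ~10% of standard
BATCH_FACTOR = 0.50         # Batch API: 50% off


def estimate_cost(model, input_tokens, output_tokens, cached_fraction=0.0, batch=False):
    """Estimate the USD cost of a workload on one model tier.

    cached_fraction is the share of input tokens served from the prompt cache.
    Assumes the batch discount applies on top of caching (a simplification).
    """
    in_rate, out_rate = RATES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (
        fresh * in_rate
        + cached * in_rate * CACHED_INPUT_FACTOR
        + output_tokens * out_rate
    ) / 1_000_000
    return cost * BATCH_FACTOR if batch else cost
```

For example, one million input tokens and 100,000 output tokens on the flagship come to about $8.00 at list rates; if 80% of the input is cached, the same workload drops to about $4.40.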
For SA studios watching FX, all three levers matter. Build a GPT-5.4-mini or nano-driven router as the first agent in the chain. Most of the work (60-80%) doesn't need the GPT-5.5 flagship; route routine traffic to the cheaper tier. Use prompt caching for any RAG or long-system-prompt workload. Push nightly classification / extraction / enrichment to Batch API. This pattern can cut OpenAI costs by 60-80% versus running everything on GPT-5.5. The OpenAI Agents SDK's per-agent model selection makes this trivial — sub-agents on different tiers are a one-liner.
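The routing pattern described above can be sketched as a label-to-tier lookup plus a quick check of the blended saving. This is an illustrative sketch, not the Agents SDK API: the labels (`routine`, `bulk`, `hard`) are hypothetical outputs of a cheap classifier model, the model keys are illustrative, and the saving is computed on list input rates only, ignoring output tokens and the cost of the router call itself.

```python
# Hypothetical mapping from a classifier's difficulty label to a model tier.
ROUTES = {
    "routine": "gpt-5.4-mini",
    "bulk": "gpt-5.4-nano",
    "hard": "gpt-5.5",
}

# List input rates in USD per million tokens, as quoted in this section
# (model-to-rate pairing assumed from the tier descriptions).
INPUT_RATE = {"gpt-5.5": 5.00, "gpt-5.4-mini": 0.75, "gpt-5.4-nano": 0.20}


def route(label):
    """Pick a model for a task label; unknown labels fall back to the flagship."""
    return ROUTES.get(label, "gpt-5.5")


def blended_saving(share_by_model):
    """Fractional input-cost saving versus sending all traffic to gpt-5.5.

    share_by_model maps model name -> share of tokens, with shares summing to 1.
    """
    blended = sum(share * INPUT_RATE[model] for model, share in share_by_model.items())
    return 1 - blended / INPUT_RATE["gpt-5.5"]
```

With 80% of tokens on the mini tier and 20% on the flagship, `blended_saving({"gpt-5.4-mini": 0.8, "gpt-5.5": 0.2})` comes to roughly 0.68, which sits inside the 60-80% range quoted above; adding caching and batch discounts would push it further.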
Azure OpenAI Service in South Africa North (Johannesburg) hosts GPT models with SA data residency, Microsoft enterprise contracting, and POPIA-compliant data handling. For SA banks, insurers, and telcos with existing Microsoft / Azure relationships, this is the structurally clean answer for OpenAI workloads. One caveat: not every model lands in South Africa North at launch, and the newest GPT releases sometimes lag the US East regions by weeks or months. Either plan for a North Europe / UK South fallback, or accept the lag if residency is non-negotiable.
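The fallback decision described above can be written down as a small policy function. The sketch below assumes you maintain your own availability table (the one in the usage example is made up, not real Azure availability data); the region strings follow Azure's usual lowercase region naming, and the function itself is hypothetical rather than part of any Azure SDK.

```python
def choose_region(
    model,
    availability,
    residency_required,
    preferred="southafricanorth",
    fallbacks=("northeurope", "uksouth"),
):
    """Return the region to deploy `model` in, or None if it must wait.

    availability maps region name -> set of model names deployed there.
    When residency is required and the preferred region lacks the model,
    the answer is None: accept the lag rather than leave the country.
    """
    if model in availability.get(preferred, set()):
        return preferred
    if residency_required:
        return None
    for region in fallbacks:
        if model in availability.get(region, set()):
            return region
    return None


# Usage with a made-up availability table:
availability = {
    "southafricanorth": {"gpt-5.4"},
    "northeurope": {"gpt-5.4", "gpt-5.5"},
}
print(choose_region("gpt-5.5", availability, residency_required=False))
print(choose_region("gpt-5.5", availability, residency_required=True))
```

Keeping the residency flag explicit in code makes the trade-off auditable: a workload either stays in-country and waits, or is knowingly allowed to run in a fallback region.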
For studios without enterprise residency requirements, OpenAI Platform is simpler and cheaper than Azure. Tracing, evals, and prompt management all live on the same Platform dashboard, and new models and features land on Platform first and Azure second. The pragmatic SA path: start on Platform for prototypes and pilots, then move to Azure South Africa North if a client requires it.
OpenAI is USD-billed. The playbook is the same as in the Claude leaf: tier routing (GPT-5.4-mini or nano by default, GPT-5.5 only for hard cases), prompt caching for RAG workloads, and Batch API for nightly jobs. For high-volume routing tasks, also consider a hybrid local+cloud setup in which Gemma 3 running on Ollama handles the 60-80% of work that doesn't structurally need a closed frontier model.
langchain-openai. The right combination when you want explicit graph control on top of OpenAI.
South Africa North is the SA-residency-clean path for GPT workloads.