And a large language model is just a brain. A brain in a jar — no hands, no memory, no eyes, no way to check whether it was right. Everything useful an "AI" does is software wrapped around the model: the tools it can call, the context it's given, the loop that runs it, and the version control that lets its work compound. Get the software right and the model earns its keep. Get it wrong and the smartest model on the market produces an expensive demo.
There's a story that AI is a new kind of thing — a colleague, an oracle, an autonomous worker you hire. It sells well and it's wrong in the way that matters. An AI product is software that calls a model. The model is one component: a remarkable one, but a component. The reliability, the safety, the usefulness, the cost — all of those live in the software around it, the part teams have been building and disciplining for fifty years.
This isn't a downgrade of AI; it's a clarification of where the work is. The same engineering practices that make any software trustworthy — version control, code review, tests, deployment gates, observability — are exactly what make AI trustworthy. There is no separate "AI track" that gets to skip them. The teams who treat agents as software ship things that work; the teams who treat them as prompt-tweaking ship things that demo.
So this branch of the tree is the foundation under all the others. Before agents, before data, before any domain turns a model into work, there's the craft: how you write code, how you version it, how you review it, and how you ship it.
Mechanically, a large language model is a stateless function: you pass in text, it returns a probable continuation of that text, and it retains nothing from the call. That's the whole interface. Everything it appears to "do" beyond producing text is something the surrounding software arranged.
Nothing persists between requests. "Memory" is software re-sending the history, or writing notes to a file or database and reading them back.
A model can't send an email, query a database, or edit a file. It can only emit text. Tools (and the code that executes them) are what turn that text into action.
The model has no access to your codebase, your CRM, or today's date unless software fetches that context and puts it in the prompt.
The model predicts plausible text, not verified fact. Tests, types, schemas, and human review are the software that decides whether the output was actually right.
None of this diminishes the model. A brain is the hard part to build, and modern models are extraordinary at the thing they do. But a brain on its own doesn't file a tax return or fix a bug — it needs a body, senses, memory, and a way to learn from what worked. In software terms: tools, context, state, and version control.
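To make the "stateless function" point concrete, here is a minimal sketch in Python. It assumes a toy stand-in for the model (`fake_model`, hypothetical, not a real API) and a small `Conversation` wrapper, also hypothetical. The only thing it shows is that "memory" is the software re-sending the history on every call.

```python
# Hypothetical stand-in for a model call: a pure function of the text passed in.
# A real model is far more capable, but the interface has the same shape.
def fake_model(prompt: str) -> str:
    name = None
    for line in prompt.splitlines():
        if line.startswith("user: my name is "):
            name = line[len("user: my name is "):]
    if prompt.rstrip().endswith("user: what is my name?"):
        return f"Your name is {name}." if name else "I don't know your name."
    return "Noted."


class Conversation:
    """The 'memory' lives here, in ordinary software, not in the model."""

    def __init__(self, model):
        self.model = model
        self.history = []  # list of "role: text" lines

    def send(self, user_text: str) -> str:
        self.history.append(f"user: {user_text}")
        # Re-send the whole history every time; the model keeps nothing itself.
        reply = self.model("\n".join(self.history))
        self.history.append(f"assistant: {reply}")
        return reply
```

Called on its own with `"user: what is my name?"`, the stand-in has nothing to go on. Called through `Conversation` after the name was mentioned, it answers correctly, because the wrapper put the earlier turn back into the prompt.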
The industry word for the software around the model is the harness: the loop, the tools, the context management, the guardrails. It's the difference between a chatbot and an agent that does work.
A useful agent is a fairly ordinary piece of software with a model in the middle of the loop. It gathers context, asks the model what to do next, executes a tool the model chose, feeds the result back, and repeats until the job is done or a guardrail stops it. Every one of those steps — the tool definitions, the permission checks, the retries, the logging — is code you write and version like any other.
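The loop described above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not a reference implementation: `scripted_model` is a hypothetical stand-in that returns JSON actions, the `TOOLS` registry and the JSON action format are invented for the example, and the only guardrails shown are a permission check and a step limit.

```python
import json

# Hypothetical tool registry: plain functions the harness is willing to run.
TOOLS = {
    "add": lambda a, b: str(a + b),
}


def scripted_model(transcript):
    """Stand-in for a model call: requests one tool call, then finishes."""
    if not any(line.startswith("tool_result:") for line in transcript):
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    last_result = transcript[-1].split(": ", 1)[1]
    return json.dumps({"final": f"The sum is {last_result}"})


def run_agent(model, task, allowed, max_steps=5):
    """Gather context, ask the model, run the chosen tool, feed the result back."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action = json.loads(model(transcript))
        if "final" in action:
            return action["final"]
        name = action["tool"]
        # Permission check: the model proposes, the harness decides.
        if name not in allowed or name not in TOOLS:
            result = f"error: tool '{name}' is not permitted"
        else:
            result = TOOLS[name](**action["args"])
        transcript.append(f"tool_result: {result}")
    # Guardrail: stop instead of looping forever.
    raise RuntimeError("step limit reached")
```

With the stand-in above, `run_agent(scripted_model, "add 2 and 3", allowed={"add"})` returns `"The sum is 5"`. Everything except the model call is ordinary code that can be reviewed, tested, and versioned.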
The Agents domain states the bar directly: "agents are software, not prompt soup — code-first definitions, version control, unit tests, evaluation harnesses, CI/CD." Tools (function calling, MCP) are how the brain acts on the world. Agent Skills are reusable instructions shipped as version-controlled files the agent reads when relevant. Agents, tools, and skills are all software constructs. None of them is the model.
This is why "which model?" is rarely the question that decides whether a project works. Swap a good model for a slightly better one and a well-built harness gets a little better. Put the best model on the market inside a harness with no tests, no version control, and no observability, and you get something impressive that nobody can trust in production.
If software is the body, version control is its nervous system and memory — the record of every change, who made it, and why. It's also what lets work compound instead of drift, which is the principle the entire 2nth tree is built on.
Every change to code — or to a prompt, a skill, an agent definition, a finance model's rules — lands as a commit: a small, attributable, reversible unit of change. Branches let you try things without breaking what works. Pull requests put a human (or another agent) in the loop before a change ships. History means you can answer "what changed, and why?" months later. That audit trail is the difference between a system you can operate and one you just hope keeps working.
It matters doubly for AI. An agent that can write code is only safe if its work goes through the same gate everyone else's does: a branch, a diff a human can read, tests in CI, a reviewable pull request. Version control is what makes an autonomous change auditable and reversible — the safety net that lets you give an agent real leverage without giving it the keys.
The rest of the tree already lives this: finance consolidation rules are code in Git so a close is repeatable; business process models are reviewed in PRs and tested in CI; design decisions are a DESIGN.md committed to a repo. When knowledge is in version control, every node builds on the last instead of starting over. That's the 2ⁿ effect — capability that accumulates, not pilots that reset.
If AI is software, then deploying AI is a software project, and it succeeds or fails on software fundamentals. The uncomfortable, useful consequence: most of what separates an AI pilot that stalls from one that compounds has nothing to do with the model. It's whether the work is in version control, whether there are tests, whether changes are reviewed, whether you can see what the system did and roll it back.
Is the agent's behaviour defined in code and committed to a repo, or living in a chat window? Can you reproduce a result from last week? Is there a test that fails when the agent regresses? Does a change reach production through a reviewable pull request? Can you see what it did and undo it? A vendor who can answer these is selling engineering. A vendor who waves them away is selling a demo.
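One of those questions, "is there a test that fails when the agent regresses?", can be illustrated with a small sketch. The names here (`validate_invoice`, `GOLDEN_CASES`, `check_regressions`) and the invoice fields are hypothetical, chosen only for the example. The idea is a schema check on the output plus a set of golden cases committed alongside the agent's code.

```python
import json


def validate_invoice(raw):
    """Reject model output unless it is JSON with the expected fields and types."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("customer"), str) or not data["customer"]:
        raise ValueError("customer must be a non-empty string")
    total = data.get("total")
    if isinstance(total, bool) or not isinstance(total, (int, float)) or total < 0:
        raise ValueError("total must be a non-negative number")
    return data


# Hypothetical golden cases: an input and the structured output we expect.
GOLDEN_CASES = [
    ("Invoice for Acme, R1200", {"customer": "Acme", "total": 1200}),
]


def check_regressions(agent, cases=GOLDEN_CASES):
    """Return one message per failing case; an empty list means all passed."""
    failures = []
    for prompt, expected in cases:
        try:
            actual = validate_invoice(agent(prompt))
        except ValueError as exc:
            failures.append(f"{prompt!r}: invalid output ({exc})")
            continue
        if actual != expected:
            failures.append(f"{prompt!r}: expected {expected}, got {actual}")
    return failures
```

Run in CI, a non-empty result from `check_regressions` would block the change, in the same way a failing unit test blocks any other pull request.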
This is the honest pitch, and it's 2nth's: we don't sell magic. We bring the software discipline — version control, repositories, review, CI — that turns a capable model into a system you can actually run, audit, and improve.
For South African banks, insurers, and the public sector, "AI is just software" is the framing that survives a compliance review. POPIA, the FSCA, and internal audit all ask the same thing of any system touching regulated data: what did it do, who authorised it, and can you prove it? Version control answers that natively: every change attributable, every deployment reviewable, every result reproducible. An AI system built as disciplined software clears that bar; one built as a prompt in a chat window does not.
It also right-sizes the conversation. South African teams don't need to wait for a frontier breakthrough to get value from AI — they need the same software engineering they'd apply to any production system, pointed at a model. That's a capability most teams already have or can build, not a moonshot.