Strip the ritual and every agile ceremony solves one problem: moving context from one head to another. Agents are stateless — they need that transfer more than humans do, on every task. So nothing about planning is obsolete; it just compiled down from meetings into files. The modern backlog stops being a to-do list and becomes executable specification under version control: the spec is the ticket, the ticket is the prompt, and the operator's job is keeping the queue full of well-specified, independently shippable work.
Sprint planning scopes work before a line of code is written. Backlog refinement chunks a big idea into pieces one worker can hold in their head at once. A PRD gets the picture out of one brain and into a format someone else can execute. Look past the ceremony and agile has always been a set of context-transfer mechanisms — disciplines for getting what's in one person's head into another's before work starts. The practitioner argument that named this cleanly is Josh Owens' “Your AI Doesn't Need Better Prompts. It Needs a Sprint Planning Session.” (Feb 2026).
Here is the reframe the whole leaf turns on: an agent is stateless. It holds nothing between sessions, and it starts every task cold. Where a human teammate accumulates context over months and can coast on it, an agent needs the full transfer on every single task. So the planning disciplines aren't obsolete in the agent era — they're load-bearing in a way they never quite were for humans. What changes is the medium: the context that used to live in a two-hour meeting and three people's memory now has to live in files an agent can read. Planning didn't die. It compiled down from meetings into artifacts under version control.
A ceremony is a context-transfer protocol with a human runtime. Give it an agent runtime and the protocol survives — it just runs against committed text instead of a room full of people. The rest of this leaf is what each ceremony becomes once you write it down for a machine.
In spec-driven development (SDD), a versioned, structured specification — not the code, not the Jira card — is the source of truth, and code is generated and maintained against it. It exists to answer three specific failure modes of hand-a-model-a-prompt development.
Those three failure modes: intent drift (the code slowly stops matching what anyone actually asked for), context decay (the “why” behind a decision evaporates the moment the session ends), and unverifiable output (you can't tell whether the agent did the right thing because “the right thing” was never written down). A working ticket in this model carries what a good agent brief always carried: outcomes, scope boundaries, constraints, prior decisions, a task breakdown, and verification criteria. That is not new project-management theology — it's the same brief a competent contractor would demand before quoting, made executable.
| Tool | What it is | Since |
|---|---|---|
| GitHub Spec Kit | Open-source toolkit for spec-driven development against coding agents | Open-sourced Sept 2025 |
| AWS Kiro | Spec-first IDE that generates requirements, design, and tasks before code | 2025 |
| Claude Code plan mode | A read-only planning pass that produces an approved plan before any edit | 2025 |
| OpenSpec | Open convention for structured, versioned specifications in the repo | 2025 |
| BMAD-METHOD | Agentic agile framework wrapping planning roles around the build loop | 2025 |
For the requirement statements themselves, EARS notation (Easy Approach to Requirements Syntax) gives a small, testable grammar — “When <trigger>, the system shall <response>” — that turns a vague wish into something an agent can build against and a reviewer can check.
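A minimal sketch of how that grammar can be checked mechanically, assuming only the event-driven EARS pattern quoted above. The function name `parse_ears_event` and the sample sentences are illustrative, not part of any EARS tooling.

```python
import re
from typing import Optional

# Event-driven EARS pattern only: "When <trigger>, the <system> shall <response>."
# The other EARS patterns (ubiquitous, state-driven, optional, unwanted behaviour)
# are deliberately not handled in this sketch.
EARS_EVENT = re.compile(
    r"^When\s+(?P<trigger>.+?),\s+the\s+(?P<system>.+?)\s+shall\s+(?P<response>.+?)\.?$",
    re.IGNORECASE,
)


def parse_ears_event(requirement: str) -> Optional[dict]:
    """Return the trigger/system/response parts, or None if the sentence does not fit."""
    match = EARS_EVENT.match(requirement.strip())
    return match.groupdict() if match else None


# Hypothetical requirement sentences for illustration.
good = (
    "When an API key exceeds 5 requests per minute, "
    "the export endpoint shall return HTTP 429 with a Retry-After header."
)
vague = "The export endpoint should be protected from abuse."

print(parse_ears_event(good))   # a dict with trigger, system, response
print(parse_ears_event(vague))  # None: no trigger, no "shall"
```

A check like this only confirms the sentence has a trigger and a response. Whether the response is the right one is still a review question.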
Thoughtworks' Technology Radar Vol. 33 (2025) places spec-driven development in the Assess ring (worth a look, not yet a default) and warns explicitly against heavy up-front specification and big-bang releases as an antipattern. The discipline earns its keep when specs stay small, living, and close to the code. Push it toward exhaustive up-front documents and you've reinvented waterfall. It is also a genuinely poor fit for exploratory work, where you're still discovering what to build and a firm spec would just be a confident guess.
Once the ticket is a structured spec, the backlog stops being a human-only list and becomes data an agent can operate on — issues in GitHub, Linear, or Jira reached over MCP, or plain markdown work items committed alongside the code. What that changes in practice is concrete: agents cluster duplicates, flag stale tickets, draft sizing, and produce the planning pre-read. The planning meeting stops being “let's build a plan while everyone watches” and starts from a complete draft — the question in the room becomes “do we agree?” instead of “let's write this from scratch.”
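The triage pass described above can be sketched in a few lines. This is an illustration under stated assumptions: the `Ticket` shape, the 90-day staleness cutoff, the 0.8 similarity threshold, and every ticket except WORK-142 are hypothetical, and title similarity via the standard library's `difflib` stands in for whatever clustering an agent would really do.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher


@dataclass
class Ticket:
    key: str
    title: str
    updated: date


def stale(tickets, today, max_age_days=90):
    """Keys of tickets not touched within the cutoff (the cutoff is an assumption)."""
    cutoff = today - timedelta(days=max_age_days)
    return [t.key for t in tickets if t.updated < cutoff]


def likely_duplicates(tickets, threshold=0.8):
    """Pairs of tickets whose titles are near-identical; a draft for a human to confirm."""
    pairs = []
    for i, a in enumerate(tickets):
        for b in tickets[i + 1:]:
            score = SequenceMatcher(None, a.title.lower(), b.title.lower()).ratio()
            if score >= threshold:
                pairs.append((a.key, b.key))
    return pairs


# Hypothetical backlog for illustration.
backlog = [
    Ticket("WORK-142", "Add rate-limit to the /export endpoint", date(2026, 5, 20)),
    Ticket("WORK-187", "Add rate limit to /export endpoint", date(2026, 5, 28)),
    Ticket("WORK-090", "Upgrade Redis client library", date(2025, 11, 2)),
]

print("stale:", stale(backlog, today=date(2026, 6, 1)))
print("possible duplicates:", likely_duplicates(backlog))
```

The output is a pre-read, not a decision: nothing here closes or merges a ticket.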
AI is good at clustering and drafting. It is bad at deciding business priority, and pretending otherwise is how teams automate themselves into shipping the wrong thing efficiently. The division of labour that holds: the human ranks the backlog (priority is a judgement about the business, not a text-prediction task), and the engineer who ships the work owns the estimate. The agent prepares; the people decide.
    ## [WORK-142] Add rate-limit to the /export endpoint
    outcome: /export refuses more than 5 requests/min per API key, returning HTTP 429 with a Retry-After header.
    scope: api/export handler + shared limiter. No UI changes.
    constraints: Limiter state in existing Redis. No new deps.
    decisions: Per-key, not per-IP (keys already authenticated).
    verify:
      - Unit: 6th call within 60s returns 429 + Retry-After.
      - Unit: counter resets after the window.
      - CI: existing /export tests still pass.
    context: See software-for-agents.html §grounding; limiter at api/lib/limiter.ts.
That item is small enough for one agent session, states its own acceptance criteria, points at the live code to ground against, and can be verified without a human reading the agent's reasoning — only its diff and its passing tests. A backlog full of items shaped like this is the prompt library the operator runs against.
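Because the item is structured text, its completeness can be checked before an agent ever sees it. The sketch below assumes the work item is stored one `field:` per line with `verify` entries as bullet lines. That layout, the `REQUIRED` tuple, and the function names are assumptions for illustration, not a format the tools above prescribe.

```python
# Fields the sample work item carries; treating all six as required is an assumption.
REQUIRED = ("outcome", "scope", "constraints", "decisions", "verify", "context")


def parse_work_item(text: str) -> dict:
    """Parse a 'field: value' work item; lines without a known field extend the previous one."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            fields["title"] = line[3:]
            current = None
            continue
        name, sep, rest = line.partition(":")
        if sep and name in REQUIRED:
            current = name
            fields[current] = rest.strip()
        elif current is not None:
            # Continuation or bullet line, e.g. "- Unit: counter resets after the window."
            fields[current] = (fields[current] + " " + line).strip()
    return fields


def missing_fields(fields: dict) -> list:
    """Names of required fields that are absent or empty."""
    return [name for name in REQUIRED if not fields.get(name)]


ITEM = """\
## [WORK-142] Add rate-limit to the /export endpoint
outcome: /export refuses more than 5 requests/min per API key, returning HTTP 429 with a Retry-After header.
scope: api/export handler + shared limiter. No UI changes.
constraints: Limiter state in existing Redis. No new deps.
decisions: Per-key, not per-IP (keys already authenticated).
verify:
  - Unit: 6th call within 60s returns 429 + Retry-After.
  - Unit: counter resets after the window.
  - CI: existing /export tests still pass.
context: See software-for-agents.html §grounding; limiter at api/lib/limiter.ts.
"""

# A deliberately underspecified item (hypothetical) to show the failure case.
DRAFT = "## [WORK-143] Speed up exports\noutcome: Exports feel faster.\n"

print(missing_fields(parse_work_item(ITEM)))
print(missing_fields(parse_work_item(DRAFT)))
```

Run as a CI check on committed work items, something like this would keep underspecified tickets out of the operator's queue. It checks presence, not quality.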
The two-week sprint was calibrated to a human build loop. When spec-to-first-commit compresses to minutes, that unit stops being natural — you can cycle through several well-specified items in the time a planning meeting used to take. The habit is sticky, though: State of Agile data still shows roughly 59% of teams on two-week sprints, a share that has declined every year since 2022 (via Cadence, “How to plan a software development sprint in 2026”, June 2026). The mainstream alternatives are flow-based cycles (Linear-style continuous flow) and Shape Up appetites (fixed time, variable scope).
Story-point velocity is an estimate of how much a team can write in a cycle. When agents write most of the code, the constraint stops being how fast anyone types and shifts to spec quality and review capacity — and a metric calibrated on the old bottleneck goes haywire against the new one. What survives the shift is measurement of throughput and cycle-time on merged, verified work, and probabilistic forecasting off actual completion data, which beats ceremony-driven estimation precisely because it doesn't care who wrote the code, only what shipped and passed.
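One common way to forecast from completion data is a Monte Carlo simulation: resample past weekly throughput until the remaining backlog is exhausted, many times over, and read percentiles off the result. The sketch below is a minimal version under assumptions: the weekly counts are invented, and sampling whole weeks independently is a simplification.

```python
import random


def simulate_weeks(weekly_throughput, backlog_size, runs=10_000, seed=0):
    """Sorted list of simulated weeks-to-finish, resampling past weekly throughput."""
    if max(weekly_throughput) <= 0:
        raise ValueError("need at least one week with merged, verified work")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(runs):
        remaining, weeks = backlog_size, 0
        while remaining > 0:
            remaining -= rng.choice(weekly_throughput)
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()
    return outcomes


def percentile(sorted_values, p):
    """Value at roughly the p-th percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]


# Hypothetical history: items merged and verified per week over the last eight weeks.
history = [4, 7, 5, 9, 6, 3, 8, 6]
outcomes = simulate_weeks(history, backlog_size=40)

print("50% of runs finish within", percentile(outcomes, 50), "weeks")
print("85% of runs finish within", percentile(outcomes, 85), "weeks")
```

The forecast is a range with a confidence level rather than a single date, and it only counts work that actually merged.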
State the recalibration honestly, because someone will wave a chart: velocity numbers will move sharply, and stakeholder expectations need resetting. A bigger number is not a better forecast if review is the queue — code drafted but stuck behind a human reviewer isn't delivered, and counting it as velocity just moves the lie downstream. What to watch instead of raw velocity:
- Share of agent tasks that pass review and CI on the first attempt. The truest read on spec quality.
- How many finished-but-unreviewed PRs are waiting. The new bottleneck, made visible.
- How often agent changes clear the pipeline unaided. A falling rate means specs or tests are degrading.
- Share of merged work later reverted or reopened. Catches the “shipped fast, wrong anyway” failure.
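The four signals above reduce to simple counts over pull-request records. The sketch below assumes a hypothetical `AgentPR` record whose flags have already been derived from the tracker and CI system. The field names, the metric names, and the choice to compute the first-pass and rework rates over merged PRs only are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentPR:
    first_pass: bool         # passed CI and review on the first attempt
    ci_green_unaided: bool   # pipeline went green with no human commits
    awaiting_review: bool    # finished and green, but no reviewer has picked it up
    merged: bool
    reverted_or_reopened: bool = False


def share(flags) -> float:
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def backlog_health(prs) -> dict:
    merged = [p for p in prs if p.merged]
    return {
        "first_pass_rate": share(p.first_pass for p in merged),
        "review_queue": sum(p.awaiting_review for p in prs),
        "ci_unaided_rate": share(p.ci_green_unaided for p in prs),
        "rework_rate": share(p.reverted_or_reopened for p in merged),
    }


# Hypothetical week of agent-authored PRs.
prs = [
    AgentPR(first_pass=True, ci_green_unaided=True, awaiting_review=False, merged=True),
    AgentPR(first_pass=False, ci_green_unaided=True, awaiting_review=False, merged=True,
            reverted_or_reopened=True),
    AgentPR(first_pass=True, ci_green_unaided=True, awaiting_review=False, merged=True),
    AgentPR(first_pass=False, ci_green_unaided=True, awaiting_review=True, merged=False),
    AgentPR(first_pass=False, ci_green_unaided=False, awaiting_review=False, merged=True),
]

for name, value in backlog_health(prs).items():
    print(name, value)
```

Tracked week over week, the trend in these numbers matters more than any single reading.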
The throughline across this branch is the operator: one person commanding several agents in parallel, with human specialists in the loop for review and judgement. In that model, planning has a precise, humble job: keep the operator's queue full of well-specified, independently shippable work items, each small enough for one agent session and each gated by CI and PR review exactly as software-for-agents describes. Planning isn't running the build anymore; it's feeding it clean inputs.
Everything here rests on the machinery from software-for-agents: the repo as context, a CLAUDE.md that orients the agent, SKILL.md procedures it can load, and CI as the safety net that gates every change. A spec-driven backlog is only safe because each item lands as a branch, runs the full pipeline, and waits for a human to approve the diff. Take away the harness and “let agents work the backlog” becomes the reckless sentence it sounds like. With the harness, it's a controlled, auditable operation — the queue is full, and every item that clears it was reviewed and is reversible.
No leaf in this branch ships without the counter-case. Spec-driven, agent-operated planning is not free, and it is not always worth it.
The position exists and deserves a straight answer: if agents can produce a passing PR in minutes, isn't a human reviewer just a bottleneck slowing the machine down? No. The gate isn't there because agents can't type — it's there for risk, compliance, and accountability. Someone has to be answerable for what shipped, a regulator will ask who approved it, and “the model did” is not an answer a POPIA or FSCA review accepts. Review is not friction to be optimised away; it's where responsibility lives. The correct optimisation is making the diff easy to review — small, well-specified, well-tested — not removing the reviewer.
Same move as software-for-agents: this is leverage South African (SA) teams can afford. A spec-and-PR planning discipline buys frontier-grade agent output with hygiene rather than budget: you don't need a research team or a frontier bill, you need work items written well enough for an agent to execute and a pipeline that gates them. And the by-product is exactly what regulators ask for: the versioned spec plus the PR trail is the change-control audit artifact a POPIA or FSCA review wants (who changed what, why, who approved it, and how it was verified), produced automatically as you work, not assembled in a panic before an audit.
Also cited inline: “Spec-Driven Development: From Code to Contract in the Age of AI” (arXiv preprint, Feb 2026); Deloitte, State of AI 2026; Cadence, “How to plan a software development sprint in 2026” (June 2026); Josh Owens, “Your AI Doesn't Need Better Prompts. It Needs a Sprint Planning Session.” (Feb 2026). Linked here only where a stable primary URL resolves.