agents · Building Effective Agents · Skill Leaf

Most “agents” should be workflows.

Anthropic’s Building Effective Agents (Schluntz & Zhang, Dec 2024) is one of the most-cited practical guides to composing agentic systems. Its core move is a distinction most teams skip: workflows orchestrate LLMs through predefined code paths; agents let the LLM direct its own process. The advice that follows is consistently “use the simplest thing that works”, and most of the time that’s a workflow. This leaf maps the taxonomy to the rest of the tree.

Anthropic origin · Workflows vs agents · Augmented LLM · 5 patterns · Dec 2024

Workflows and agents are not the same thing.

Anthropic groups everything people loosely call “agentic” into two categories, and keeping them apart is the whole point of the article — they have different cost, predictability, and failure profiles.

| | Workflow | Agent |
|---|---|---|
| Who decides the steps | You do, in code: predefined paths | The model does, dynamically, at run time |
| Control flow | Fixed; LLMs & tools orchestrated through code | Open-ended; the LLM directs its own tool use |
| Predictability | High: same shape every run | Lower: the path varies with the task |
| Cost / latency | Bounded and known | Variable; can be many model calls |
| Best for | Well-defined tasks you can decompose | Open problems where steps can’t be predicted |

The guiding rule the article returns to again and again: find the simplest solution possible, and only increase complexity when it demonstrably improves outcomes. Agentic systems trade latency and cost for better performance on hard tasks — a trade worth making when the task warrants it, and a waste when a single well-crafted prompt or a fixed workflow would do. “Build an agent” is rarely the right first instinct.

On frameworks

Frameworks (LangGraph, Bedrock Agents, Rivet, Vellum, and others) make it easy to start, but they add abstraction layers that hide the underlying prompts and responses and make debugging harder. Anthropic’s recommendation: start by using the model APIs directly — many patterns need only a few lines of code — and if you do adopt a framework, make sure you understand what it does underneath. See agents/langgraph for where a framework genuinely earns its place.

The augmented LLM.

Every workflow and every agent is built from one unit: a model enhanced with three augmentations. Get this interface right before composing anything on top of it.

Augmentation 01

Retrieval

The model can pull in relevant external information on demand — documents, records, search results — instead of relying only on what’s in its weights or the prompt.

Augmentation 02

Tools

The model can call out to take actions and read live systems — function-calling, and increasingly MCP servers as the standard interface to them.

Augmentation 03

Memory

The model can persist and re-read state across steps and sessions — in the tree, that’s files under version control, not a black box.

Anthropic’s guidance on the building block is to tailor these augmentations to your specific use case and to give the model an easy, well-documented interface to them. That interface point is so important it returns as one of the three closing principles. MCP is the emerging standard way to wire the tools and retrieval into any model — covered in full at agents/mcp — and the memory augmentation is exactly the repo-as-memory discipline from software/software-for-agents.
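
As a rough illustration of that building block, the sketch below wraps a model call with the three augmentations. Everything here is an assumption made for illustration: `LLM` is a plain callable standing in for a real model API, the `TOOL <name> <argument>` reply convention is invented, and memory is a simple in-process list rather than files under version control.

```python
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in for a real model API call


@dataclass
class AugmentedLLM:
    llm: LLM
    retrieve: Callable[[str], list[str]]              # retrieval: query -> snippets
    tools: dict[str, Callable[[str], str]]            # tools: name -> function
    memory: list[str] = field(default_factory=list)   # memory: notes kept across calls

    def ask(self, question: str) -> str:
        context = "\n".join(self.retrieve(question))
        notes = "\n".join(self.memory)
        prompt = (
            f"Notes so far:\n{notes}\n\nContext:\n{context}\n\n"
            f"Tools: {', '.join(sorted(self.tools))}\n\nQuestion: {question}"
        )
        reply = self.llm(prompt)
        # Assumed convention: a reply of the form "TOOL <name> <argument>"
        # asks for one tool call before the final answer.
        if reply.startswith("TOOL "):
            _, name, arg = reply.split(" ", 2)
            result = self.tools[name](arg)
            reply = self.llm(f"{prompt}\n\nTool {name} returned: {result}")
        self.memory.append(f"Q: {question} | A: {reply}")
        return reply
```

A real implementation would use the provider's native tool-calling format or an MCP client instead of string parsing; the point of the sketch is only that retrieval, tools, and memory meet the model through one small, inspectable interface.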

Composable building blocks, simplest first.

Five workflow patterns cover the great majority of useful agentic systems. They’re ordered roughly by increasing flexibility — and increasing cost. Reach for the earliest one that solves your problem.

Pattern 01

Prompt chaining

Decompose a task into a fixed sequence of steps, each LLM call working on the previous call’s output. You can add programmatic “gates” between steps to check progress and bail early if something’s off.

Use when the task cleanly splits into fixed subtasks, and you’ll trade a little latency for higher accuracy per step. Example: write an outline, check it meets criteria, then write the full document from it.
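
The outline example can be sketched in a few lines. This is a minimal illustration, not the article's code: `LLM` is a hypothetical callable standing in for a model API, and the gate (at least three non-empty outline lines) is an arbitrary criterion chosen for the example.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in for a real model API call


def outline_passes_gate(outline: str, min_sections: int = 3) -> bool:
    """Programmatic gate: plain code, no model call."""
    sections = [line for line in outline.splitlines() if line.strip()]
    return len(sections) >= min_sections


def write_document(topic: str, llm: LLM) -> str:
    # Step 1: produce an outline.
    outline = llm(f"Write a one-line-per-section outline for: {topic}")
    # Gate: bail early instead of paying for the expensive step on bad input.
    if not outline_passes_gate(outline):
        raise ValueError("outline failed the gate; stopping before drafting")
    # Step 2: draft from the checked outline.
    return llm(f"Write the full document from this outline:\n{outline}")
```

The number of model calls is fixed at two, which is what makes the cost and the shape of each run predictable.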

Pattern 02

Routing

Classify an input, then direct it to a specialised follow-up — a different prompt, tool, or even a different model. Separation of concerns: each path is optimised for its category.

Use when there are distinct categories better handled separately and classification is accurate. Example: route support queries by type; send easy questions to a cheaper model and hard ones to a stronger one.
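
A minimal routing sketch, under stated assumptions: `LLM` is a hypothetical callable standing in for a model API, the labels are whatever keys the caller supplies, and each handler could be a different prompt, tool, or model.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in for a real model API call


def route(query: str, classifier: LLM, handlers: dict[str, LLM], default: str) -> str:
    labels = ", ".join(sorted(handlers))
    label = classifier(
        f"Classify this support query as one of: {labels}. "
        f"Reply with the label only.\n\n{query}"
    ).strip().lower()
    # An unrecognised label falls back to the default path instead of failing.
    handler = handlers.get(label, handlers[default])
    return handler(query)
```

The fallback matters because the pattern only works when classification is accurate; logging how often the default path is taken is one simple way to notice when it is not.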

Pattern 03

Parallelization

Run multiple LLM calls at once and aggregate the results. Two variants: sectioning (split into independent subtasks run in parallel) and voting (run the same task several times for diverse takes, then combine).

Use when subtasks parallelize for speed, or multiple attempts raise confidence. Example (sectioning): one model answers while another screens the same input for policy violations. Example (voting): several passes review code for vulnerabilities and you act on any hit.
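
Both variants can be sketched with the standard library's thread pool. The `LLM` callable, the ALLOW/BLOCK and VULNERABLE/CLEAN reply formats, and the function names are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in for a real model API call


def answer_with_guardrail(user_input: str, answerer: LLM, screener: LLM) -> str:
    """Sectioning: two independent calls on the same input, run concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        answer = pool.submit(answerer, user_input)
        verdict = pool.submit(screener, f"Reply ALLOW or BLOCK for this input:\n{user_input}")
        if verdict.result().strip().upper() != "ALLOW":
            return "Request declined by policy screen."
        return answer.result()


def any_reviewer_flags(code: str, reviewers: list[LLM]) -> bool:
    """Voting: several passes over the same code; act on any hit."""
    prompt = f"Reply VULNERABLE or CLEAN for this code:\n{code}"
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        verdicts = list(pool.map(lambda review: review(prompt), reviewers))
    return any(v.strip().upper() == "VULNERABLE" for v in verdicts)
```

In both functions the fan-out is fixed in code: the caller decides how many calls run and what each one is asked.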

Pattern 04

Orchestrator-workers

A central LLM dynamically breaks a task into subtasks, delegates them to worker LLMs, and synthesises their results. Unlike parallelization, the subtasks aren’t fixed in advance — the orchestrator decides them based on the input.

Use when you can’t predict the subtasks ahead of time. Example: a coding change that touches an unknown set of files; a research task gathering from multiple sources the orchestrator picks on the fly.
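
A minimal sketch of the pattern, assuming a hypothetical `LLM` callable and an invented one-subtask-per-line plan format. The cap on subtasks is an added safeguard for the example, not something the article prescribes.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in for a real model API call


def orchestrate(task: str, orchestrator: LLM, worker: LLM, max_subtasks: int = 5) -> str:
    # The orchestrator, not the code, decides what the subtasks are.
    plan = orchestrator(f"List the subtasks needed for this task, one per line:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()][:max_subtasks]
    # Each worker sees the overall task plus its own slice.
    results = [worker(f"Task: {task}\nYour subtask: {s}") for s in subtasks]
    joined = "\n".join(f"- {s}: {r}" for s, r in zip(subtasks, results))
    # The orchestrator synthesises the workers' results into one answer.
    return orchestrator(f"Combine these results into one answer for: {task}\n{joined}")
```

The workers run sequentially here for clarity; they could equally be fanned out concurrently once the plan is known.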

Pattern 05

Evaluator-optimizer

One LLM generates a response; a second evaluates it and returns feedback; the first revises — in a loop — until the evaluation passes. A built-in critic.

Use when you have clear evaluation criteria and iterative refinement measurably helps. Example: literary translation where a critic catches nuance; multi-round research that’s refined against an explicit rubric.
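
The loop can be sketched as follows, assuming a hypothetical `LLM` callable and an invented convention that the evaluator replies exactly `PASS` when satisfied. The `max_rounds` cap is an assumption added so the loop always terminates.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in for a real model API call


def refine(task: str, generator: LLM, evaluator: LLM, max_rounds: int = 3) -> str:
    draft = generator(task)
    for _ in range(max_rounds):
        feedback = evaluator(f"Task: {task}\nDraft: {draft}\nReply PASS or give feedback.")
        if feedback.strip().upper() == "PASS":
            break
        # Feed the critique back to the generator and try again.
        draft = generator(
            f"Task: {task}\nPrevious draft: {draft}\nFeedback: {feedback}\nRevise."
        )
    # If the rounds run out, the last revision is returned without a final check.
    return draft
```

Whether the extra rounds are worth it is an empirical question: the pattern pays off only when the evaluator's criteria are clear enough that its feedback reliably improves the draft.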

The pair people confuse: parallelization vs orchestrator-workers

Both fan work out to multiple LLM calls. The difference is who picks the subtasks. In parallelization the subtasks are predefined by you in code — the structure is fixed. In orchestrator-workers the subtasks are decided at run time by the orchestrator model. If you know the breakdown in advance, parallelize; if the breakdown depends on the input, orchestrate.

When the path can’t be drawn in advance.

An agent, in the article’s strict sense, starts from a command or a conversation with a human, then plans and operates independently — choosing its own tools and steps, gaining ground truth from the environment at each step (a tool result, a test run, code execution), and looping until it judges the task done. It can pause for human feedback at checkpoints or when it hits a blocker. Crucially, it should include stopping conditions — a maximum number of iterations, say — so it stays under control.

Agents suit open-ended problems where you genuinely can’t predict the number or shape of steps. The cost is real: more model calls, higher latency, and compounding errors if a wrong early step poisons the rest. That’s why the autonomy has to sit inside a harness — sandboxing, gated changes, tests, reversibility — which is precisely the repo-as-safety-net argument the Software branch makes.
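
Stripped to its core, the loop described above is short. The sketch below is illustrative only: `LLM` and `Tool` are hypothetical callables, the `DONE <answer>` and `<tool_name> <argument>` reply formats are invented, and a real harness would add sandboxing and review around the tool calls.

```python
from typing import Callable

LLM = Callable[[str], str]   # hypothetical stand-in for a real model API call
Tool = Callable[[str], str]  # hypothetical tool: argument in, observation out


def run_agent(goal: str, llm: LLM, tools: dict[str, Tool], max_steps: int = 10) -> str:
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):  # stopping condition: a hard cap on iterations
        decision = llm("\n".join(transcript)).strip()
        # Assumed reply format: "DONE <answer>" or "<tool_name> <argument>".
        if decision.startswith("DONE"):
            return decision[len("DONE"):].strip()
        name, _, arg = decision.partition(" ")
        if name not in tools:
            observation = f"error: unknown tool {name!r}"
        else:
            observation = tools[name](arg)  # ground truth from the environment
        transcript.append(f"Action: {decision}\nObservation: {observation}")
    raise RuntimeError(f"stopped after {max_steps} steps without finishing")
```

Unlike the workflow patterns, nothing in the code fixes how many tool calls happen or in what order; only `max_steps` bounds the run.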

Agents in practice

The two worked examples in the article are a customer-support agent (a chatbot given tools to look up orders and issue refunds, with clear success and stop conditions) and coding agents that resolve SWE-bench issues end-to-end: reading a repo, editing across files, running tests, and iterating until green. Both work because the environment gives honest feedback at every step and the changes are verifiable. agents/coding-cli-agents is this category, productised.

What “effective” actually means.

Whatever you build, the article closes on three principles — and a warning not to over-engineer.

Principle 01

Simplicity

Keep the design as simple as the task allows. Add a pattern, a tool, or autonomy only when it earns its place by improving outcomes you can measure.

Principle 02

Transparency

Show the agent’s planning steps explicitly. If you can see what it intends to do, you can catch a bad plan before it executes — and debug when it goes wrong.

Principle 03

A crafted ACI

The agent-computer interface (your tools’ names, docs, and formats) deserves the same care as a human UI. Thorough tool documentation and testing are most of the battle.

The agent-computer interface (ACI) is the sleeper of the three. The article spends an appendix on it: put yourself in the model’s shoes, give it enough tokens to think, keep tool formats close to text the model has seen naturally, name and document tools well, and test them exhaustively — even “poka-yoke” them so a tool is hard to misuse. In the tree, that’s the discipline behind agents/skills and agents/mcp: a capability is only as good as the interface the model meets it through.
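
One way to picture a poka-yoked tool, in the spirit of the absolute-path fix the article describes: the tool documents its one accepted input shape and rejects anything else with an error the model can act on. The function name and the in-memory `files` dict (a stand-in for a real filesystem) are assumptions for illustration.

```python
from pathlib import PurePosixPath


def read_file_tool(path: str, files: dict[str, str]) -> str:
    """Return the contents of one file.

    path: an absolute POSIX path such as "/repo/src/app.py".
          Relative paths are rejected so the result never depends on
          whatever the current working directory happens to be.
    files: hypothetical in-memory stand-in for the filesystem.
    """
    if not PurePosixPath(path).is_absolute():
        # The error says what was wrong and how to fix it.
        return f"error: path must be absolute (start with '/'), got {path!r}"
    if path not in files:
        return f"error: no such file {path!r}"
    return files[path]
```

Returning a descriptive error string rather than raising keeps the failure inside the conversation, where the model can read it and retry with a corrected argument.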

When NOT to build an agent

If a single prompt with good examples solves it — do that. If a fixed workflow solves it — do that, and enjoy the predictability and bounded cost. Reserve full agent autonomy for open-ended problems where the value of flexibility outweighs the cost and unpredictability, and where you have the harness (tests, sandboxing, review, reversibility) to contain it. The goal is the task done reliably, not the most sophisticated architecture.

Most South African (SA) automations are workflows, not agents.

Match the pattern to the budget

For SA teams costing AI work in rands against USD-billed tokens, the workflow-first discipline is also a cost discipline. A SARS-submission helper, a POPIA-assessment drafter, an invoice-to-ledger step — these are prompt chains and routers, not autonomous agents, and they should be priced and built that way: bounded calls, predictable cost, auditable paths. Save agent autonomy — with its variable token spend and compounding-error risk — for the genuinely open-ended jobs that justify it, and wrap those in the repo harness so every change is reviewable and reversible. Choosing the simplest pattern that works is the difference between an AI line item you can forecast and one you can’t.
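
The forecasting point can be made concrete with a small worst-case calculation. This is a sketch under loud assumptions: every number in the usage lines is a placeholder, not a real token price or exchange rate, and the function name and parameters are invented for the example.

```python
def max_workflow_cost_zar(
    calls: int,
    max_input_tokens: int,
    max_output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
    zar_per_usd: float,
) -> float:
    """Upper bound on one run of a fixed workflow, in rands."""
    usd_per_call = (
        max_input_tokens * usd_per_million_input
        + max_output_tokens * usd_per_million_output
    ) / 1_000_000
    return calls * usd_per_call * zar_per_usd


# Placeholder figures only: substitute your provider's current prices
# and the exchange rate you budget against.
ceiling = max_workflow_cost_zar(
    calls=3,
    max_input_tokens=2000,
    max_output_tokens=500,
    usd_per_million_input=3.0,
    usd_per_million_output=15.0,
    zar_per_usd=18.0,
)
print(f"Worst case per run: R{ceiling:.3f}")
```

The calculation only works because `calls` is a constant. For an agent the number of calls is itself decided at run time, so the same formula gives a ceiling only if the harness enforces a step limit.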

Where this links in the tree.

Primary source first.