Loop Engineering

Chapter One

Origins & Definitions

Reading: Osmani, Loop Engineering, intro through "the five pieces"; Thakker, Loop Engineering: 2026 Guide, "the tweet" through "cron job with a hat on."

1.1 The tweet, the quote, the ladder

On June 8, 2026, Peter Steinberger — creator of OpenClaw, now at OpenAI — posted that you shouldn't be prompting coding agents anymore; you should be designing the loops that prompt them. Two sentences, 6.5 million views, and a timeline full of people asking, in Matthew Berman's words, "wtf is a loop?"

Boris Cherny, who created Claude Code, had given the clean definition on stage days earlier: he doesn't prompt Claude anymore — loops run continuously, prompt Claude, and figure out what to do, and his job is writing the loops. The receipt behind the rhetoric: in late 2025 he reported that all of his Claude Code contributions over a thirty-day stretch — 259 merged pull requests — were written by Claude Code itself, and that he had deleted his IDE in November and never reopened it.

Cherny describes the shift as a three-rung ladder. At the first rung you write code by hand with AI suggestions: a typist. At the second you run five or ten agent sessions and prompt each one: a prompt operator. At the third you write loops, and agents read GitHub, Slack, and trackers to decide what to build: a loop engineer. The nuance the "prompting is dead" crowd skips is that nobody claims engineers are obsolete. Someone still decides what to build and talks to users. The job moved from writing the code to writing the thing that writes the code.

1.2 The lineage

"Loop" hides at least five different things, which is why the discourse talked past itself. The lineage, oldest to newest:

Era	Pattern	What it added
2022	ReAct	The academic while-loop: reason, call a tool, observe, repeat — a human watching throughout.
2023	AutoGPT	Goal-driven self-prompting, famous for spinning forever; it seeded years of skepticism.
2025	Ralph	Huntley's bash one-liner: `while :; do cat PROMPT.md | claude; done`. The innovation was discipline, not orchestration: context resets to fixed anchor files each pass, progress lives on disk and in git, one unit of work per iteration. It built an entire programming language for about $297.
Spring 2026	Productized ralph	Claude Code and Codex both shipped `/goal`: run until a verifiable stop condition holds.
2026	Orchestration	A supervising loop dispatches and checks many agents in parallel — scheduled, worktree-isolated, git-backed. Yegge's open-source Gas Town coordinates twenty to thirty instances under a "Mayor" agent.

Figure 1. Five generations of agent loops. Single-agent ralph is the baseline; multi-agent orchestration is the new layer.

The best skeptic line of the cycle — cron jobs have funny re-branding right now — is half right. The scheduling layer is cron. What cron never had is the part in the middle: a fixed script runs the same branches every tick, while a loop runs a model that reads current state, decides the next action, verifies, retries, and dispatches other agents. A loop is cron plus a decision-maker, and the interesting engineering is everything wrapped around the decision so it doesn't run off a cliff.

1.3 Anatomy: two nested loops

Two loops are nested here, and conflating them causes most of the confusion. The inner loop ships in the product: prompt, evaluate, tool calls, observe results, repeat — until the model produces a response with no tool calls, or a limit fires. Each round trip is one turn. You don't build this.

The outer loop is yours. On a trigger — a schedule, a PR comment, a CI failure — it prompts the agent; reads what the agent produced; decides whether the goal is met, ideally via something other than the agent itself; and if not done, prompts again or escalates to a human. Loop engineering is the design of that outer loop. Osmani places it one floor above harness engineering: the harness, but on a timer, spawning helpers, feeding itself.

Figure 2. Two nested loops. The inner cycle is the product's; the outer one — trigger, prompt, read, verify, report — is the loop you engineer.
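
The outer loop described above can be sketched in a few lines. This is a minimal illustration, not any product's API: `run_agent`, `verify`, and `escalate` are hypothetical callables you would supply, standing in for "prompt the agent", "check the result with something other than the agent", and "hand off to a human".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TickResult:
    done: bool        # True if the independent check passed
    attempts: int     # how many times the agent was prompted
    escalated: bool   # True if the loop gave up and asked a human


def outer_loop_tick(
    prompt: str,
    run_agent: Callable[[str], str],   # hypothetical: prompts the agent, returns its output
    verify: Callable[[str], bool],     # hypothetical: independent check, not the agent
    escalate: Callable[[str], None],   # hypothetical: notifies a human
    max_attempts: int = 3,
) -> TickResult:
    """One trigger's worth of work: prompt, read, verify, then retry or escalate."""
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = run_agent(prompt)
        if verify(output):
            return TickResult(done=True, attempts=attempt, escalated=False)
        # Feed the failed output back so the next prompt is not a blind retry.
        prompt = f"{prompt}\n\nPrevious attempt failed verification:\n{output}"
    escalate(output)
    return TickResult(done=False, attempts=max_attempts, escalated=True)
```

The inner loop (the agent's own tool-calling cycle) is hidden entirely inside `run_agent`, which is the point: the code you own is only the wrapper around it.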

Check your understanding

Take a recurring task in your own work. Is it a cron job or a loop — does it need a decision-maker in the middle, or just a schedule? If you can't say which, re-read §1.2.

Chapter Two

Specification

Reading: Osmani, "the five pieces" through "what one loop looks like"; Thakker, "the loop contract."

2.1 The loop contract

Before building anything, write the loop as six lines. If you can't fill all six, the loop isn't ready to run unattended.

TRIGGER  → every 15m, on PR comment, on CI failure
SCOPE    → open PRs authored by me, repo X only
ACTION   → run tests, fix lint, respond to review
BUDGET   → max 3 sub-agents per tick, 50k tokens
STOP     → all PRs green, or 10 iterations, or $5 spent
REPORT   → post summary to Slack #eng-bots

The six-line loop contract.

That is the difference between a task repeated and an engineered loop: the trigger is explicit, the blast radius is bounded, the spend is capped, the exit is defined, and a human hears about it without having to check.
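
One way to make the "can't fill all six" test mechanical is to hold the contract as data. The sketch below is an assumption about how you might encode it; the class name `LoopContract` and its methods are illustrative and not part of any tool. The field values are the six lines from the contract above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LoopContract:
    trigger: str
    scope: str
    action: str
    budget: str
    stop: str
    report: str

    def missing(self) -> list[str]:
        """Return the names of contract lines that are empty or whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready(self) -> bool:
        """A loop is ready to run unattended only when all six lines are filled."""
        return not self.missing()


pr_babysitter = LoopContract(
    trigger="every 15m, on PR comment, on CI failure",
    scope="open PRs authored by me, repo X only",
    action="run tests, fix lint, respond to review",
    budget="max 3 sub-agents per tick, 50k tokens",
    stop="all PRs green, or 10 iterations, or $5 spent",
    report="post summary to Slack #eng-bots",
)
```

A launcher could refuse to schedule any loop whose `ready()` is false, which turns the six-line rule from advice into a gate.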

2.2 Six building blocks

A year ago a loop was a pile of bash you maintained alone. Now the pieces ship inside the products, and the same six primitives exist in both Claude Code and the Codex app — so you design the loop once and it works in either.

Primitive	Job in the loop	In Claude Code
Automations	The heartbeat: discovery and triage on a schedule	`/loop`, `/goal`, cron tasks, hooks, GitHub Actions
Worktrees	Isolation, so parallel agents don't collide on files	`git worktree`; `--worktree`; `isolation: worktree`
Skills	Project knowledge written down once, not guessed every run	`SKILL.md` folders, invoked by name or description match
Connectors	Reach into real tools: trackers, databases, Slack	MCP servers; plugins bundle connectors and skills
Sub-agents	Split the maker from the checker	`.claude/agents/`; agent teams
State	The spine: what's done and what's next, outside any conversation	Markdown progress files, `AGENTS.md`, or a board via MCP

Two of these deserve emphasis. Skills are where intent stops costing you over and over: without them the loop re-derives your whole project from zero every cycle; with them it compounds. Steinberger's rule: anything you do twice becomes a skill; anything hard becomes a skill afterward, so next time is free. A loop calling sharp, named, tested skills is a system that compounds; a loop with none is a `while true` around a stranger.

And state on disk sounds too dumb to matter, but it is the trick every long-running agent depends on: the model forgets everything between runs. The agent forgets; the repo doesn't.
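
A minimal sketch of state on disk, assuming a Markdown progress file with `- [ ]` and `- [x]` checkboxes. The file layout and the helper names `next_item` and `mark_done` are illustrative choices, not a format any tool requires.

```python
from pathlib import Path
from typing import Optional

OPEN, DONE = "- [ ] ", "- [x] "


def next_item(state_file: Path) -> Optional[str]:
    """Return the first unchecked item, or None when nothing is left."""
    for line in state_file.read_text().splitlines():
        if line.startswith(OPEN):
            return line[len(OPEN):]
    return None


def mark_done(state_file: Path, item: str) -> None:
    """Check off one item in place so the next run starts after it."""
    lines = state_file.read_text().splitlines()
    for i, line in enumerate(lines):
        if line == OPEN + item:
            lines[i] = DONE + item
            break
    state_file.write_text("\n".join(lines) + "\n")
```

Because the file lives in the repository, a fresh agent with no conversation history can call `next_item` and resume exactly where the previous run stopped.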

Practice

Write a complete six-line contract for one recurring task in your work — code or otherwise. Each line should be specific enough that someone else could implement it without asking you anything.

Chapter Three

Primitives & Verification

Reading: Thakker, "/loop" and "verification"; Anthropic, Prompting Claude Fable 5, verification and progress-claims sections.

3.1 `/loop` and `/goal`

The two in-session primitives are easily confused. /loop re-runs on a cadence: give it an interval and a prompt, and it re-runs that prompt at that interval. Omit the interval and Claude picks a dynamic delay, from one minute to an hour, based on what it observed: short waits while a build finishes, long waits when nothing is pending. A loop.md in your project replaces the default maintenance prompt; Esc stops a waiting loop.

/goal runs until a condition is true. You state a verifiable end state — all tests in test/auth pass and lint is clean — and after every turn a separate model checks whether it's met. The agent that wrote the code is not the one grading it: the maker/checker split applied to the stop condition itself.

Cherny's canonical starter is worth reading twice. It does not ask Claude to fix one thing; it asks Claude to maintain all pull requests indefinitely, dispatching isolated sub-agents as comments arrive:

/loop babysit all my PRs. Auto-fix build issues, and when comments come in, use a
worktree agent to fix them.

Cherny's canonical /loop. Variants: /loop 5m /babysit, /loop 30m /slack-feedback.

Note the pattern in the variants: the loop invokes skills by name, keeping the recurring thing maintainable instead of pasting a wall of instructions into a schedule nobody updates.

3.2 Dynamic workflows: structure, not recurrence

A third primitive, announced in late May 2026 and easily conflated with loops, sits on a different axis entirely. Mention the word "workflow" in a prompt — or enable the ultracode setting — and Claude writes a JavaScript orchestration script for the task; a separate runtime then executes that script in the background, fanning out tens to hundreds of subagents, each with a clean context window and one focused job, and cross-checks the results before a single coordinated answer returns to your session.

The distinction that matters is who holds the plan. In a loop, control stays with the model, tick by tick. In a workflow, the plan becomes code: the iteration, branching, and fan-out run deterministically, which is why stage ordering holds even across hundreds of agents, and why the classic single-context failure (review fifty files, quietly stop at thirty-five, declare victory) disappears. The standard shape also bakes in the golden rule of §3.3 per task: an implementer, a layer of independent verifiers, then a fixer, before everything fans back in.
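
To make "the plan becomes code" concrete, here is a sketch of the implementer, verifiers, fixer shape. It is written in Python for consistency with the rest of this guide, whereas the scripts Claude writes are described above as JavaScript; it shows the structure only and does not reproduce any generated script or runtime API. `implement`, `verifiers`, and `fix` are hypothetical stand-ins for sub-agent calls.

```python
import asyncio
from typing import Awaitable, Callable

Agent = Callable[[str], Awaitable[str]]            # hypothetical: one sub-agent call
Verifier = Callable[[str, str], Awaitable[bool]]   # hypothetical: (task, draft) -> passed?


async def run_task(task: str, implement: Agent, verifiers: list[Verifier], fix: Agent) -> str:
    """Implementer, then independent verifiers in parallel, then a fixer if any object."""
    draft = await implement(task)
    verdicts = await asyncio.gather(*(v(task, draft) for v in verifiers))
    if not all(verdicts):
        draft = await fix(draft)
    return draft


async def workflow(
    tasks: list[str], implement: Agent, verifiers: list[Verifier], fix: Agent
) -> dict[str, str]:
    """Fan out over every task; iteration is plain code, so no task is silently skipped."""
    results = await asyncio.gather(*(run_task(t, implement, verifiers, fix) for t in tasks))
    return dict(zip(tasks, results))
```

The fifty-files failure cannot occur here for a structural reason: the list of tasks is iterated by the interpreter, not remembered by a model.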

The official decision rule for choosing between primitives: if the plan fits in two or three steps Claude can hold in its head, use subagents and skills; once the plan is code-shaped and repeatable across hundreds of independent operations — repo-wide bug hunts with a verification pass, large migrations, a plan stress-tested by adversarial reviewers — reach for a workflow. They cost more tokens than an ordinary session, since every agent pays its own setup overhead, so scope a small test before scaling. And the two compose naturally: recurrence outside, structure inside — a morning loop whose tick launches a workflow.

Availability: Claude Code v2.1.154+, paid plans; on by default for Max, Team, and Enterprise, enabled from /config on Pro. See Anthropic's workflows documentation.

3.3 The golden rule: something says no

The sharpest reply in Steinberger's thread: designing the loop is half of it — the other half is putting something in the loop that can say no. A test, a type check, a real error. A loop with nothing pushing back is the agent agreeing with itself on repeat. An open loop, where the agent writes until it declares itself done, is a demo. A closed loop, where tests and lint run after each write and the results feed back, is what ships. A review loop, where a background reviewer feeds findings back while context is still fresh, is best for long sessions.

Figure 3. The maker/checker split. The same separation governs /goal's stop condition: a separate model decides whether the loop is done.
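
A closed loop can be sketched as an agent write followed by a real command whose exit code decides the outcome. This is a minimal illustration under stated assumptions: `write` is a hypothetical callable that asks the agent to edit given the last failure output, and `check_cmd` is whatever test, lint, or type-check command your project already has.

```python
import subprocess
from typing import Callable, Sequence


def run_check(cmd: Sequence[str]) -> tuple[bool, str]:
    """Run a real check (tests, linter, type checker); exit code 0 means pass."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def closed_loop(
    write: Callable[[str], None],   # hypothetical: asks the agent to edit, given feedback
    check_cmd: Sequence[str],
    max_rounds: int = 5,
) -> bool:
    """Alternate agent writes with an external check; stop when the check passes."""
    feedback = ""
    for _ in range(max_rounds):
        write(feedback)
        passed, output = run_check(check_cmd)
        if passed:
            return True
        feedback = output  # the failure output is what pushes back on the agent
    return False
```

The agent never gets to declare success here; only the exit code of `check_cmd` can end the loop early.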

The structural version of the rule: the generator never grades its own work. A model evaluating its own output is far too generous. Anthropic's harness guidance found it much more tractable to tune a standalone skeptical evaluator than to make a generator self-critical, and Lance Martin's Fable 5 experiments showed verifier sub-agents in fresh, independent contexts consistently beating self-critique — a rubric-driven verifier loop improved a training pipeline roughly six times more than the prior model generation managed on the same task.

Where the environment can't grade automatically — writing, analysis, design — substitute a written rubric. Score outcomes, not effort. Make every criterion falsifiable. And if the loop revises its own rubric, require a logged reason, so you can audit the drift.
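
A written rubric with falsifiable criteria can be as small as a list of named predicates. The sketch below is one possible encoding, with illustrative names (`Criterion`, `Rubric`, `revise`); the predicates themselves would be whatever checks fit your output.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Criterion:
    name: str
    check: Callable[[str], bool]   # falsifiable: True or False on the output itself


@dataclass
class Rubric:
    criteria: list[Criterion]
    revisions: list[str] = field(default_factory=list)  # audit log of rubric changes

    def score(self, output: str) -> dict[str, bool]:
        """Grade the outcome only; effort or attempt count never enters."""
        return {c.name: c.check(output) for c in self.criteria}

    def passed(self, output: str) -> bool:
        return all(self.score(output).values())

    def revise(self, criterion: Criterion, reason: str) -> None:
        """Adding a criterion requires a logged reason so drift can be audited."""
        if not reason.strip():
            raise ValueError("rubric revisions need a logged reason")
        self.criteria.append(criterion)
        self.revisions.append(f"added {criterion.name!r}: {reason}")
```

Refusing an empty `reason` is the design choice that matters: the loop may tighten its own rubric, but never silently.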

Before reporting progress, audit each claim against a tool result from this session.
Only report work you can point to evidence for; if something is not yet verified, say so
explicitly. Report outcomes faithfully: if tests fail, say so with the output; if a step
was skipped, say that; when something is done and verified, state it plainly without
hedging.

Anthropic's evidence-audit instruction, from the official Fable 5 guide.

Lab · about fifteen minutes

On a sandbox repository: /loop 10m Review PR #123. If CI is failing, fix it. If there are unresolved review comments, address them in a worktree and push. If everything is green and approved, stop. Watch two ticks; confirm the loop reads state before acting.

Chapter Four

Context, Anchors & Memory

Reading: Huntley, Ralph; Anthropic, How the agent loop works, context window and compaction; the Fable 5 guide's memory section.

4.1 Anchor files

Every loop tick starts an agent cold. Anchor files are intent written down on the outside, read fresh each iteration. VISION.md is the north star — what you're building, the constraints, what done looks like at the project level; it is Steinberger's anchor of choice. CLAUDE.md or AGENTS.md holds the operating rules per tick: the stack, the commands, the guardrails, the we-don't-do-it-this-way-because-of-that-one-incident knowledge. PROMPT.md or loop.md is the prompt the loop pipes in each iteration. And the tests and type checks are the thing that says no when the agent is wrong.

4.2 Context over time

Within a long run, context only grows — until the harness auto-compacts, summarizing older history, lossily. Instructions buried in the opening prompt may not survive compaction; CLAUDE.md is re-injected on every request and does. So standing rules go in CLAUDE.md, and you tell the compactor what to preserve:

# Summary instructions

When summarizing this conversation, always preserve:
- The current task objective and acceptance criteria
- File paths that have been read or modified
- Test results and error messages
- Decisions made and the reasoning behind them

A summary-instructions section for CLAUDE.md, from the Agent SDK docs.

Sub-agents are the other context lever: each starts fresh and returns only its final summary to the parent, so the orchestrator's context grows by a paragraph, not a transcript. Scope each one's tools to the minimum it needs, and run routine ones at low effort.

4.3 Memory: the loop that learns

Anchor files hold what is always true; memory holds what the loop learned. A directory of Markdown notes read at start and written at end is enough:

Store one lesson per file with a one-line summary at the top. Record corrections and
confirmed approaches alike, including why they mattered. Don't save what the repo or
chat history already records; update an existing note rather than creating a duplicate;
delete notes that turn out to be wrong.

Anthropic's memory rules, from the official Fable 5 guide.
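
The memory rules above map onto a handful of file operations. The following is a sketch under assumptions: the `memory/` directory layout, the slug-based filenames, and the function names are illustrative, not a format the guide prescribes.

```python
from pathlib import Path


def save_lesson(memory_dir: Path, slug: str, summary: str, detail: str) -> Path:
    """Write one lesson per file, summary on line one; the same slug updates in place."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    note = memory_dir / f"{slug}.md"
    note.write_text(f"{summary}\n\n{detail}\n")
    return note


def load_summaries(memory_dir: Path) -> dict[str, str]:
    """Read only the first line of each note: cheap to load at the start of a run."""
    summaries = {}
    for note in sorted(memory_dir.glob("*.md")):
        lines = note.read_text().splitlines()
        summaries[note.stem] = lines[0] if lines else ""
    return summaries


def forget(memory_dir: Path, slug: str) -> None:
    """Delete a note that turned out to be wrong."""
    (memory_dir / f"{slug}.md").unlink(missing_ok=True)
```

Keying notes by slug is what enforces "update an existing note rather than creating a duplicate": writing the same slug twice overwrites the earlier note.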

This is what turns a task loop into a continual-learning loop: fail, investigate, verify the fix, distill the lesson into memory, and consult a human only when stuck. On continual-learning benchmarks, this memory-plus-verifier structure is where Fable 5's largest gains over prior models showed up. The loop gets measurably better at your recurring work across runs — which, more than any single trick, is the durable advantage the loop-engineering crowd is chasing.

Lab · about an hour

Build a ralph loop with guardrails. The prompt: read specs/TODO.md for the next unchecked item; implement exactly that item; run the tests; commit and mark done on green; if tests fail twice with the same error, write BLOCKED and exit; exit after one item either way. Wrap it in a bash loop capped at ten iterations that breaks on BLOCKED or an empty TODO. Run it only in an isolated worktree or container — the permissions-skipping flag is named the way it is for a reason.

Chapter Five

Guardrails & Cost

Reading: Thakker, "guardrails"; Agent SDK docs on turns & budget, permission modes, and hooks.

5.1 Three hard stops

Once the model writes code for almost nothing, cost moves to the loop running it. The cautionary receipt of June 2026: Uber reportedly capped engineers at $1,500 per person per tool per month after burning its annual AI budget in four months. Without guardrails you get infinite loops and billing surprises orders of magnitude over budget. Every serious write-up converges on three hard stops.

First, a maximum iteration count: /goal tracks turns natively, the SDK has max_turns, and a bare ralph loop has no ceiling unless you add one — every loop gets one. Second, no-progress detection: stop when the same error, an empty diff, or the same failing test appears several times in a row. Huntley tunes ralph prompts "like a guitar" off these failure patterns; prompt iteration is part of loop engineering. Third, a dollar ceiling: max_budget_usd in the SDK, or a budget line in the contract — set before you sleep, not after the invoice.
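
For a hand-rolled loop, the three hard stops fit in one small guard object. This is a sketch, not the SDK's mechanism: the class `HardStops`, its defaults, and the stop-reason strings are all illustrative. It assumes the caller exits on success before recording, so only unsuccessful outcomes reach `record`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HardStops:
    max_iterations: int = 10
    max_repeats: int = 3      # identical outcomes in a row before giving up
    max_usd: float = 5.0
    iterations: int = 0
    spent_usd: float = 0.0
    _recent: list[str] = field(default_factory=list)

    def record(self, outcome: str, cost_usd: float) -> Optional[str]:
        """Record one iteration; return the reason to stop, or None to continue.

        `outcome` is any fingerprint of the result: an error message, a failing
        test name, or the empty string for an empty diff.
        """
        self.iterations += 1
        self.spent_usd += cost_usd
        self._recent = (self._recent + [outcome])[-self.max_repeats:]
        if self.spent_usd >= self.max_usd:
            return "budget"
        if len(self._recent) == self.max_repeats and len(set(self._recent)) == 1:
            return "no_progress"
        if self.iterations >= self.max_iterations:
            return "max_iterations"
        return None
```

A driver would call `record` once per pass and break on any non-`None` return, logging the reason so the morning report says which stop fired.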

5.2 The control surface

Knob	What it does
`max_turns`, `max_budget_usd`	Hard caps; the loop exits with resumable `error_max_turns` or `error_max_budget_usd` result subtypes.
`effort`	Reasoning depth per turn, settable per sub-agent: explorers low, verifiers high.
`permission_mode`, `allowed_tools`	What runs without asking. Scope rules like `Bash(npm *)` narrow further; reserve `bypassPermissions` for sandboxes.
Hooks	`PreToolUse` blocks dangerous calls, `Stop` validates results, `PreCompact` archives transcripts. They run in your process, cost no context, and enforce where prompts merely steer.

A Fable 5 specific: check stop_reason on exit. A value of refusal means a safety classifier fired, and the right move is falling back to Opus 4.8 — not retrying blindly.

Lab · a programmatic loop

Build the SDK version, with both caps in place:

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage

GOAL = """Goal: all tests in the repo pass and CHANGELOG.md describes each fix.
Done means: `npm test` exits 0, verified by running it, and the changelog entry
exists. Check memory/ for lessons from prior runs before starting; write new
lessons there when finished."""

async def overnight():
    async for msg in query(
        prompt=GOAL,
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Edit", "Write", "Bash", "Glob", "Grep"],
            setting_sources=["project"],   # loads CLAUDE.md w/ standing rules
            max_turns=80,                  # hard stop 1
            max_budget_usd=15.0,           # hard stop 2
            effort="high",
        ),
    ):
        if isinstance(msg, ResultMessage):
            print(msg.subtype, msg.session_id)
            if msg.subtype == "success":
                print(msg.result)
            elif msg.subtype == "error_max_turns":
                print("On track but capped — resume session to continue.")
            elif msg.subtype == "error_max_budget_usd":
                print("Budget cap reached; review spend before resuming.")

asyncio.run(overnight())

An overnight loop with both hard stops, via the Agent SDK.

After it runs, note what it spent, what each hard stop would have caught, and what you would cap differently next time.

Chapter Six

Risk & Loops Beyond Code

Reading: Osmani, "what the loop still does not do for you" through the close.

6.1 What the loop doesn't do for you

Osmani's closing argument is the most honest part of the corpus: the loop changes the work, it doesn't delete you from it — and three problems get sharper as the loop gets better, not easier.

Verification is still on you. A loop running unattended is also a loop making mistakes unattended; the verifier split makes "it's done" mean something, but even then done is a claim, not a proof, and your job is to ship work you confirmed. Comprehension debt grows with throughput: the faster the loop ships code you didn't write, the wider the gap between what exists and what you understand — unless you read what the loop made. And cognitive surrender is the comfortable posture: when the loop runs itself, it's tempting to stop having an opinion and take whatever comes back. Designing loops with judgment is the cure; designing them to avoid thinking is the accelerant. Same action, opposite results.

One ceiling no tool removes is what Osmani calls the orchestration tax: worktrees eliminate file collisions between parallel agents, but your review bandwidth — not the tooling — decides how many loops you can responsibly run. Two people can build the identical loop and get opposite outcomes: one moves faster on work they understand deeply; the other avoids understanding the work at all. The loop doesn't know the difference. You do.

6.2 Loops beyond code

The discourse is coding-centric because that's where feedback is cheapest — tests say no for free. But the loop contract transfers to any work with checkable output. A research loop drafts, then a verifier fetches every cited source and confirms it exists and supports the claim, fixing or flagging until the citation rubric passes. A data-pipeline loop ingests, validates against a schema, produces the report, and has a fresh-context checker independently re-derive the headline numbers and diff them. A document-QA loop generates, scores against a written rubric, and revises, so the human sees only versions that passed. A monitoring loop checks sources on a schedule, diffs against memory of the last run, and pings a human only when something material changed.
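
The monitoring loop is the easiest of these to sketch, because its checker is a diff. The example below assumes numeric readings keyed by name and a JSON snapshot file; the function name `check_sources`, the snapshot format, and the 10% relative threshold are all illustrative choices.

```python
import json
from pathlib import Path
from typing import Optional


def check_sources(
    current: dict[str, float],
    snapshot: Path,
    threshold: float = 0.10,
) -> dict[str, tuple[Optional[float], float]]:
    """Diff current readings against the last run; return only material changes.

    A change is material if the key is new or moved by more than `threshold`
    (relative). The snapshot is overwritten so the next run diffs against this one.
    """
    previous = json.loads(snapshot.read_text()) if snapshot.exists() else {}
    changed = {}
    for key, value in current.items():
        old = previous.get(key)
        if old is None or abs(value - old) > threshold * abs(old):
            changed[key] = (old, value)
    snapshot.write_text(json.dumps(current))
    return changed
```

An empty return value means the loop stays silent; a human is pinged only when the dictionary is non-empty.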

The common thread: the human moves from operator to editor — auditing outcomes instead of steering iterations. That role shift, not any single trick, is what loop engineering names.

6.3 Capstone pattern: full orchestration

One loop shape Osmani keeps reusing, assembled from all six blocks. A scheduled automation calls a triage skill that reads yesterday's CI failures, open issues, and recent commits, and writes findings to a state file. For each finding worth doing, the loop opens an isolated worktree and dispatches a builder sub-agent; a second sub-agent reviews the draft against the project's skills and tests. Connectors open the pull request and update the ticket; anything the loop can't handle lands in a triage inbox for a human. The state file is the spine — tomorrow's run picks up where today stopped. Designed once; never prompted step-by-step again.

Figure 4. The capstone orchestration loop, assembled from all six building blocks — one isolated worktree per finding. Designed once; never prompted step-by-step again.

Practice · put it together

Build a loop for the contract you wrote in Chapter Two, using the capstone pattern as a template. When it runs, audit yourself against §6.1: which of the three risks is your design most exposed to, and what specifically guards against it?

Loop Engineering · June 2026

Origins & Definitions

1.1 The tweet, the quote, the ladder

1.2 The lineage

1.3 Anatomy: two nested loops

Specification

2.1 The loop contract

2.2 Six building blocks

Primitives & Verification

3.1 `/loop` and `/goal`

3.2 Dynamic workflows: structure, not recurrence

3.3 The golden rule: something says no

Context, Anchors & Memory

4.1 Anchor files

4.2 Context over time

4.3 Memory: the loop that learns

Guardrails & Cost

5.1 Three hard stops

5.2 The control surface

Risk & Loops Beyond Code

6.1 What the loop doesn't do for you

6.2 Loops beyond code

6.3 Capstone pattern: full orchestration

Glossary
