Loop Engineering
June 2026

Designing autonomous AI workflows: from prompting agents to authoring the systems that prompt them.

In June 2026, two viral statements — Peter Steinberger's "you should be designing loops that prompt your agents" and Boris Cherny's "my job is to write loops" — marked a shift in how practitioners work with AI agents. This guide treats the shift as an engineering discipline rather than a hot take. It draws on Anthropic's Agent SDK documentation and Fable 5 guide, and on practitioner writing by Addy Osmani, Yash Thakker, and Geoffrey Huntley. Labs reference Claude Code; the patterns transfer to any platform with scheduling, sub-agents, and skills.

In Plain Terms

Until recently, getting useful work out of an AI meant having a conversation with it. You asked for something, read what came back, asked for the next thing. The AI did the typing, but you were the engine — nothing happened unless you were sitting there pushing it forward, one request at a time.

A loop replaces you as that engine. It's a standing arrangement, written down once: here is the job, here is how to tell whether it was done correctly, here is how much time and money it's allowed to spend, and here is when to stop and ask a human. A small program then hands that job to the AI on a schedule — every morning, every fifteen minutes, whenever something changes — checks the result, and either keeps going or taps you on the shoulder.

The closest everyday analogy is delegating to a capable new employee. You wouldn't just say "handle it" and walk away. You'd write down the task and what good looks like, have someone other than them review the work, give them a budget, and tell them when to escalate. Every chapter of this guide is one of those instincts, made mechanical: the written instructions (Chapter 2), the independent reviewer (Chapter 3), the budget and the stopping rules (Chapter 5), the judgment about what you should never stop checking yourself (Chapter 6).

Why this is suddenly everywhere: the newest models can finally work unsupervised for hours without losing the thread, so an arrangement that used to fall apart after thirty minutes now holds for a working day. "Loop engineering" is the name that stuck for designing those arrangements well — and your role shifts with it, from the person doing every step to the person who wrote the system and audits what it produces.

Chapter One

Origins & Definitions


Reading: Osmani, Loop Engineering, intro through "the five pieces"; Thakker, Loop Engineering: 2026 Guide, "the tweet" through "cron job with a hat on."

1.1 The tweet, the quote, the ladder

On June 8, 2026, Peter Steinberger — creator of OpenClaw, now at OpenAI — posted that you shouldn't be prompting coding agents anymore; you should be designing the loops that prompt them. Two sentences, 6.5 million views, and a timeline full of people asking, in Matthew Berman's words, "wtf is a loop?"

Boris Cherny, who created Claude Code, had given the clean definition on stage days earlier: he doesn't prompt Claude anymore — loops run continuously, prompt Claude, and figure out what to do, and his job is writing the loops. The receipt behind the rhetoric: in late 2025 he reported that all of his Claude Code contributions over a thirty-day stretch — 259 merged pull requests — were written by Claude Code itself, and that he had deleted his IDE in November and never reopened it.

Cherny describes the shift as a three-rung ladder. At the first rung you write code by hand with AI suggestions: a typist. At the second you run five or ten agent sessions and prompt each one: a prompt operator. At the third you write loops, and agents read GitHub, Slack, and trackers to decide what to build: a loop engineer. The nuance the "prompting is dead" crowd skips is that nobody claims engineers are obsolete. Someone still decides what to build and talks to users. The job moved from writing the code to writing the thing that writes the code.

1.2 The lineage

"Loop" hides at least five different things, which is why the discourse talked past itself. The ladder, oldest to newest:

| Era | Pattern | What it added |
| --- | --- | --- |
| 2022 | ReAct | The academic while-loop: reason, call a tool, observe, repeat, with a human watching throughout. |
| 2023 | AutoGPT | Goal-driven self-prompting, famous for spinning forever; it seeded years of skepticism. |
| 2025 | Ralph | Huntley's bash one-liner: while :; do cat PROMPT.md \| claude; done. The innovation was discipline, not orchestration: context resets to fixed anchor files each pass, progress lives on disk and in git, one unit of work per iteration. It built an entire programming language for about $297. |
| Spring 2026 | Productized ralph | Claude Code and Codex both shipped /goal: run until a verifiable stop condition holds. |
| 2026 | Orchestration | A supervising loop dispatches and checks many agents in parallel: scheduled, worktree-isolated, git-backed. Yegge's open-source Gas Town coordinates twenty to thirty instances under a "Mayor" agent. |

The best skeptic line of the cycle, "cron jobs have funny re-branding right now," is half right. The scheduling layer is cron. What cron never had is the part in the middle: a fixed script runs the same branches every tick, while a loop runs a model that reads current state, decides the next action, verifies, retries, and dispatches other agents. A loop is cron plus a decision-maker, and the interesting engineering is everything wrapped around the decision so it doesn't run off a cliff.

1.3 Anatomy: two nested loops

Two loops are nested here, and conflating them causes most of the confusion. The inner loop ships in the product: the model reads the prompt, evaluates it, calls tools, observes the results, and repeats until it produces a response with no tool calls or a limit fires. Each round trip is one turn. You don't build this.

The outer loop is yours. On a trigger — a schedule, a PR comment, a CI failure — it prompts the agent; reads what the agent produced; decides whether the goal is met, ideally via something other than the agent itself; and if not done, prompts again or escalates to a human. Loop engineering is the design of that outer loop. Osmani places it one floor above harness engineering: the harness, but on a timer, spawning helpers, feeding itself.
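The outer loop just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not any product's implementation: `outer_loop`, `Outcome`, `run_agent`, `goal_met`, and `max_attempts` are hypothetical names, and the agent is a stand-in callable so the control flow can be read and tested on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    status: str        # "done" or "escalated"
    attempts: int
    last_output: str


def outer_loop(
    run_agent: Callable[[str], str],    # hypothetical: wraps one full agent run (the inner loop)
    goal_met: Callable[[str], bool],    # hypothetical: a check that is NOT the agent itself
    base_prompt: str,
    max_attempts: int = 3,
) -> Outcome:
    """Prompt the agent, check the result independently, then re-prompt or escalate."""
    prompt = base_prompt
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = run_agent(prompt)
        if goal_met(output):
            return Outcome("done", attempt, output)
        # Feed the failed result back so the next attempt has something to push against.
        prompt = f"{base_prompt}\n\nPrevious attempt did not pass the check:\n{output}"
    # Out of attempts: hand the problem to a human instead of looping forever.
    return Outcome("escalated", max_attempts, output)


if __name__ == "__main__":
    replies = iter(["tests: 2 failed", "tests: all passed"])
    result = outer_loop(lambda p: next(replies), lambda out: "all passed" in out, "Fix the failing tests.")
    print(result.status, result.attempts)  # prints: done 2
```

The trigger (schedule, PR comment, CI failure) sits outside this function; whatever fires it simply calls `outer_loop` once per tick.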

Check your understanding

Take a recurring task in your own work. Is it a cron job or a loop — does it need a decision-maker in the middle, or just a schedule? If you can't say which, re-read §1.2.

Chapter Two

Specification


Reading: Osmani, "the five pieces" through "what one loop looks like"; Thakker, "the loop contract."

2.1 The loop contract

Before building anything, write the loop as six lines. If you can't fill all six, the loop isn't ready to run unattended.

TRIGGER  → every 15m, on PR comment, on CI failure
SCOPE    → open PRs authored by me, repo X only
ACTION   → run tests, fix lint, respond to review
BUDGET   → max 3 sub-agents per tick, 50k tokens
STOP     → all PRs green, or 10 iterations, or $5 spent
REPORT   → post summary to Slack #eng-bots
The six-line loop contract.

That is the difference between a task repeated and an engineered loop: the trigger is explicit, the blast radius is bounded, the spend is capped, the exit is defined, and a human hears about it without having to check.
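One way to make the "all six or it doesn't run" rule mechanical is to hold the contract in a small data structure and refuse to start while any line is blank. The sketch below is illustrative only: the `LoopContract` class and its `missing` and `ready` methods are hypothetical names, and the field values are the ones from the example contract above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class LoopContract:
    trigger: str = ""
    scope: str = ""
    action: str = ""
    budget: str = ""
    stop: str = ""
    report: str = ""

    def missing(self) -> List[str]:
        """Names of contract lines that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready(self) -> bool:
        """True only when all six lines are filled in."""
        return not self.missing()


if __name__ == "__main__":
    draft = LoopContract(
        trigger="every 15m, on PR comment, on CI failure",
        scope="open PRs authored by me, repo X only",
        action="run tests, fix lint, respond to review",
        budget="max 3 sub-agents per tick, 50k tokens",
        report="post summary to Slack #eng-bots",
    )
    print(draft.ready(), draft.missing())  # prints: False ['stop']
```

A blank check only proves the line exists; whether each line is specific enough for someone else to implement is still a human judgment.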

2.2 Six building blocks

A year ago a loop was a pile of bash you maintained alone. Now the pieces ship inside the products, and the same six primitives exist in both Claude Code and the Codex app — so you design the loop once and it works in either.

| Primitive | Job in the loop | In Claude Code |
| --- | --- | --- |
| Automations | The heartbeat: discovery and triage on a schedule | /loop, /goal, cron tasks, hooks, GitHub Actions |
| Worktrees | Isolation, so parallel agents don't collide on files | git worktree; --worktree; isolation: worktree |
| Skills | Project knowledge written down once, not guessed every run | SKILL.md folders, invoked by name or description match |
| Connectors | Reach into real tools: trackers, databases, Slack | MCP servers; plugins bundle connectors and skills |
| Sub-agents | Split the maker from the checker | .claude/agents/; agent teams |
| State | The spine: what's done and what's next, outside any conversation | Markdown progress files, AGENTS.md, or a board via MCP |

Two of these deserve emphasis. Skills are where intent stops costing you over and over: without them the loop re-derives your whole project from zero every cycle; with them it compounds. Steinberger's rule: anything you do twice becomes a skill, and anything hard becomes a skill afterward, so the next time is free. A loop calling sharp, named, tested skills is a system that compounds; a loop with none is a while true around a stranger.

State on disk sounds too dumb to matter, but it is the trick every long-running agent depends on, because the model forgets everything between runs. The agent forgets; the repo doesn't.
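A Markdown progress file can be as plain as a checklist that each run reads and ticks. The following is a minimal sketch, assuming a hypothetical checklist format of `- [ ]` and `- [x]` lines; the helper names `next_item` and `mark_done` are invented for illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Matches an unchecked Markdown task line such as "- [ ] fix lint".
UNCHECKED = re.compile(r"^- \[ \] (.+)$", re.MULTILINE)


def next_item(state_file: Path) -> Optional[str]:
    """Return the first unchecked item, or None when everything is done."""
    match = UNCHECKED.search(state_file.read_text(encoding="utf-8"))
    return match.group(1).strip() if match else None


def mark_done(state_file: Path, item: str) -> None:
    """Tick exactly one item so the next run starts from the one after it."""
    text = state_file.read_text(encoding="utf-8")
    updated = text.replace(f"- [ ] {item}", f"- [x] {item}", 1)
    state_file.write_text(updated, encoding="utf-8")
```

Because the file lives in the repository, committing it alongside the code gives every cold-started run the same answer to "what's done and what's next".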

Practice

Write a complete six-line contract for one recurring task in your work — code or otherwise. Each line should be specific enough that someone else could implement it without asking you anything.

Chapter Three

Primitives & Verification


Reading: Thakker, "/loop" and "verification"; Anthropic, Prompting Claude Fable 5, verification and progress-claims sections.

3.1 /loop and /goal

The two in-session primitives are easily confused. /loop re-runs on a cadence: give it an interval and a prompt, and it re-runs that prompt at each interval. Omit the interval and Claude picks a dynamic delay, from one minute to an hour, based on what it observed: short waits while a build finishes, long waits when nothing is pending. A loop.md in your project replaces the default maintenance prompt; Esc stops a waiting loop.

/goal runs until a condition is true. You state a verifiable end state — all tests in test/auth pass and lint is clean — and after every turn a separate model checks whether it's met. The agent that wrote the code is not the one grading it: the maker/checker split applied to the stop condition itself.

Cherny's canonical starter is worth reading twice. It asks Claude to maintain all pull requests indefinitely, dispatching isolated sub-agents as comments arrive — not to fix one thing:

/loop babysit all my PRs. Auto-fix build issues, and when comments come in, use a
worktree agent to fix them.
Cherny's canonical /loop. Variants: /loop 5m /babysit, /loop 30m /slack-feedback.

Note the pattern in the variants: the loop invokes skills by name, keeping the recurring thing maintainable instead of pasting a wall of instructions into a schedule nobody updates.

3.2 Dynamic workflows: structure, not recurrence

A third primitive, announced in late May 2026 and easily conflated with loops, sits on a different axis entirely. Mention the word "workflow" in a prompt — or enable the ultracode setting — and Claude writes a JavaScript orchestration script for the task; a separate runtime then executes that script in the background, fanning out tens to hundreds of subagents, each with a clean context window and one focused job, and cross-checks the results before a single coordinated answer returns to your session.

The distinction that matters is who holds the plan. In a loop, control stays with the model, tick by tick. In a workflow, the plan becomes code: the iteration, branching, and fan-out run deterministically. That is why stage ordering holds even across hundreds of agents, and why the classic single-context failure (review fifty files, quietly stop at thirty-five, declare victory) disappears. The standard shape also bakes in the golden rule of §3.3 per task: an implementer, a layer of independent verifiers, then a fixer, before everything fans back in.
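The implementer, verifiers, fixer shape is easier to see as code. Real workflows are JavaScript scripts that Claude writes and a separate runtime executes; the Python below is only an analogy for the deterministic fan-out and fan-in, with plain functions standing in for sub-agents. `run_task`, `run_workflow`, and every parameter name are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Implement = Callable[[str], str]
Verify = Callable[[str, str], List[str]]      # returns findings; an empty list means "passes"
Fix = Callable[[str, str, List[str]], str]


def run_task(task: str, implement: Implement, verifiers: Sequence[Verify], fix: Fix) -> str:
    """One task's fixed pipeline: implement, verify independently, then fix if needed."""
    draft = implement(task)
    findings = [msg for verify in verifiers for msg in verify(task, draft)]
    return fix(task, draft, findings) if findings else draft


def run_workflow(
    tasks: Sequence[str],
    implement: Implement,
    verifiers: Sequence[Verify],
    fix: Fix,
    max_workers: int = 8,
) -> List[str]:
    """Fan out every task, then fan the results back in, in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The iteration lives in code, so no task can be silently skipped.
        return list(pool.map(lambda t: run_task(t, implement, verifiers, fix), tasks))
```

The point of the sketch is the guarantee, not the concurrency: fifty inputs always produce fifty outputs, because the loop over tasks is ordinary code and not a plan held in a context window.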

The official decision rule for choosing between primitives: if the plan fits in two or three steps Claude can hold in its head, use subagents and skills; once the plan is code-shaped and repeatable across hundreds of independent operations — repo-wide bug hunts with a verification pass, large migrations, a plan stress-tested by adversarial reviewers — reach for a workflow. They cost more tokens than an ordinary session, since every agent pays its own setup overhead, so scope a small test before scaling. And the two compose naturally: recurrence outside, structure inside — a morning loop whose tick launches a workflow.

Availability: Claude Code v2.1.154+, paid plans; on by default for Max, Team, and Enterprise, enabled from /config on Pro. See Anthropic's workflows documentation.

3.3 The golden rule: something says no

The sharpest reply in Steinberger's thread: designing the loop is half of it — the other half is putting something in the loop that can say no. A test, a type check, a real error. A loop with nothing pushing back is the agent agreeing with itself on repeat. An open loop, where the agent writes until it declares itself done, is a demo. A closed loop, where tests and lint run after each write and the results feed back, is what ships. A review loop, where a background reviewer feeds findings back while context is still fresh, is best for long sessions.

The structural version of the rule: the generator never grades its own work. A model evaluating its own output is far too generous. Anthropic's harness guidance found it much more tractable to tune a standalone skeptical evaluator than to make a generator self-critical, and Lance Martin's Fable 5 experiments showed verifier sub-agents in fresh, independent contexts consistently beating self-critique — a rubric-driven verifier loop improved a training pipeline roughly six times more than the prior model generation managed on the same task.

Where the environment can't grade automatically — writing, analysis, design — substitute a written rubric. Score outcomes, not effort. Make every criterion falsifiable. And if the loop revises its own rubric, require a logged reason, so you can audit the drift.
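A written rubric with those three properties can be sketched as data plus predicates. This is an illustrative sketch, not Anthropic's format: `Criterion`, `Rubric`, `score`, and `revise` are hypothetical names, and each criterion is a concrete check on the output that can return False.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Criterion:
    name: str
    passes: Callable[[str], bool]   # falsifiable: a concrete check on the outcome


@dataclass
class Rubric:
    criteria: List[Criterion]
    revisions: List[str] = field(default_factory=list)   # audit trail of rubric changes

    def score(self, work: str) -> Dict[str, bool]:
        """Score the finished work itself, not the effort that went into it."""
        return {c.name: c.passes(work) for c in self.criteria}

    def revise(self, criterion: Criterion, reason: str) -> None:
        """Add a criterion, but only with a logged reason so drift can be audited."""
        if not reason.strip():
            raise ValueError("a rubric revision needs a logged reason")
        self.criteria.append(criterion)
        self.revisions.append(f"added '{criterion.name}': {reason}")
```

In practice the predicates for writing or analysis would often be a verifier sub-agent's judgment and not a string check; the structure (named criteria, pass or fail, logged revisions) stays the same.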

Before reporting progress, audit each claim against a tool result from this session.
Only report work you can point to evidence for; if something is not yet verified, say so
explicitly. Report outcomes faithfully: if tests fail, say so with the output; if a step
was skipped, say that; when something is done and verified, state it plainly without
hedging.
Anthropic's evidence-audit instruction, from the official Fable 5 guide.

Lab · about fifteen minutes

On a sandbox repository: /loop 10m Review PR #123. If CI is failing, fix it. If there are unresolved review comments, address them in a worktree and push. If everything is green and approved, stop. Watch two ticks; confirm the loop reads state before acting.

Chapter Four

Context, Anchors & Memory


Reading: Huntley, Ralph; Anthropic, How the agent loop works, context window and compaction; the Fable 5 guide's memory section.

4.1 Anchor files

Every loop tick starts an agent cold. Anchor files are intent written down on the outside, read fresh each iteration. VISION.md is the north star — what you're building, the constraints, what done looks like at the project level; it is Steinberger's anchor of choice. CLAUDE.md or AGENTS.md holds the operating rules per tick: the stack, the commands, the guardrails, the we-don't-do-it-this-way-because-of-that-one-incident knowledge. PROMPT.md or loop.md is the prompt the loop pipes in each iteration. And the tests and type checks are the thing that says no when the agent is wrong.

4.2 Context over time

Within a long run, context only grows until the harness auto-compacts, lossily summarizing older history. Instructions buried in the opening prompt may not survive compaction; CLAUDE.md is re-injected on every request and does. So standing rules go in CLAUDE.md, and you tell the compactor what to preserve:

# Summary instructions

When summarizing this conversation, always preserve:
- The current task objective and acceptance criteria
- File paths that have been read or modified
- Test results and error messages
- Decisions made and the reasoning behind them
A summary-instructions section for CLAUDE.md, from the Agent SDK docs.

Sub-agents are the other context lever: each starts fresh and returns only its final summary to the parent, so the orchestrator's context grows by a paragraph, not a transcript. Scope each one's tools to the minimum it needs, and run routine ones at low effort.

4.3 Memory: the loop that learns

Anchor files hold what is always true; memory holds what the loop learned. A directory of Markdown notes read at start and written at end is enough:

Store one lesson per file with a one-line summary at the top. Record corrections and
confirmed approaches alike, including why they mattered. Don't save what the repo or
chat history already records; update an existing note rather than creating a duplicate;
delete notes that turn out to be wrong.
Anthropic's memory rules, from the official Fable 5 guide.

This is what turns a task loop into a continual-learning loop: fail, investigate, verify the fix, distill the lesson into memory, and consult a human only when stuck. On continual-learning benchmarks, this memory-plus-verifier structure is where Fable 5's largest gains over prior models showed up. The loop gets measurably better at your recurring work across runs — which, more than any single trick, is the durable advantage the loop-engineering crowd is chasing.
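The memory rules quoted above map onto a handful of file operations. The sketch below is one possible layout, assuming a hypothetical directory of `.md` notes whose first line is the summary; `save_lesson`, `load_summaries`, and `forget` are invented helper names, and in a real loop the agent itself would typically do the reading and writing.

```python
from pathlib import Path
from typing import Dict


def save_lesson(memory_dir: Path, slug: str, summary: str, detail: str) -> Path:
    """One lesson per file; reusing a slug updates the note instead of duplicating it."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    note = memory_dir / f"{slug}.md"
    # The first line is the one-line summary; the detail says why the lesson mattered.
    note.write_text(f"{summary.strip()}\n\n{detail.strip()}\n", encoding="utf-8")
    return note


def load_summaries(memory_dir: Path) -> Dict[str, str]:
    """Read only the top line of each note, cheap enough to do at the start of every run."""
    summaries = {}
    for note in sorted(memory_dir.glob("*.md")):
        lines = note.read_text(encoding="utf-8").splitlines()
        summaries[note.stem] = lines[0] if lines else ""
    return summaries


def forget(memory_dir: Path, slug: str) -> None:
    """Delete a note that turned out to be wrong."""
    (memory_dir / f"{slug}.md").unlink(missing_ok=True)
```

Keying notes by slug is the design choice doing the work here: it makes "update, don't duplicate" the default behavior and keeps the start-of-run read down to one line per lesson.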

Lab · about an hour

Build a ralph loop with guardrails. The prompt: read specs/TODO.md for the next unchecked item; implement exactly that item; run the tests; commit and mark done on green; if tests fail twice with the same error, write BLOCKED and exit; exit after one item either way. Wrap it in a bash loop capped at ten iterations that breaks on BLOCKED or an empty TODO. Run it only in an isolated worktree or container — the permissions-skipping flag is named the way it is for a reason.

Chapter Five

Guardrails & Cost


Reading: Thakker, "guardrails"; Agent SDK docs on turns & budget, permission modes, and hooks.

5.1 Three hard stops

Once the model writes code for almost nothing, cost moves to the loop running it. The cautionary receipt of June 2026: Uber reportedly capped engineers at $1,500 per person per tool per month after burning its annual AI budget in four months. Without guardrails you get infinite loops and billing surprises orders of magnitude over budget. Every serious write-up converges on three hard stops.

First, a maximum iteration count: /goal tracks turns natively, the SDK has max_turns, and a bare ralph loop has no ceiling unless you add one — every loop gets one. Second, no-progress detection: stop when the same error, an empty diff, or the same failing test appears several times in a row. Huntley tunes ralph prompts "like a guitar" off these failure patterns; prompt iteration is part of loop engineering. Third, a dollar ceiling: max_budget_usd in the SDK, or a budget line in the contract — set before you sleep, not after the invoice.
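For a hand-rolled loop, the three stops fit in one small guard object that every tick reports to. This is a minimal sketch with hypothetical names (`LoopGuard`, `record`, `max_repeats`); the `outcome` string is whatever fingerprint you choose for a tick, such as the failing test name, the error text, or a hash of the diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoopGuard:
    max_iterations: int = 10
    max_repeats: int = 3          # identical outcomes in a row that count as "stuck"
    max_usd: float = 5.0
    iterations: int = 0
    spent_usd: float = 0.0
    recent: List[str] = field(default_factory=list)

    def record(self, outcome: str, cost_usd: float) -> Optional[str]:
        """Log one tick; return a stop reason, or None to keep going."""
        self.iterations += 1
        self.spent_usd += cost_usd
        # Keep only the last `max_repeats` outcomes for the no-progress check.
        self.recent = (self.recent + [outcome])[-self.max_repeats:]
        if self.spent_usd >= self.max_usd:
            return "budget"
        if len(self.recent) == self.max_repeats and len(set(self.recent)) == 1:
            return "no_progress"
        if self.iterations >= self.max_iterations:
            return "max_iterations"
        return None


if __name__ == "__main__":
    guard = LoopGuard()
    for error in ["TypeError in auth.py"] * 5:
        reason = guard.record(error, cost_usd=0.40)
        if reason:
            print(reason, guard.iterations)  # prints: no_progress 3
            break
```

The defaults mirror the example contract's "10 iterations, or $5 spent"; the repeat threshold of three is an arbitrary starting point to tune against your own failure patterns.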

5.2 The control surface

| Knob | What it does |
| --- | --- |
| max_turns, max_budget_usd | Hard caps; the loop exits with resumable error_max_turns or error_max_budget_usd result subtypes. |
| effort | Reasoning depth per turn, settable per sub-agent: explorers low, verifiers high. |
| permission_mode, allowed_tools | What runs without asking. Scope rules like Bash(npm *) narrow further; reserve bypassPermissions for sandboxes. |
| Hooks | PreToolUse blocks dangerous calls, Stop validates results, PreCompact archives transcripts. They run in your process, cost no context, and enforce where prompts merely steer. |
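The decision a PreToolUse hook makes can be written as an ordinary function and tested without any agent running. The sketch below covers only that decision logic; wiring it into the SDK's hook interface is not shown because the hook signature is not given in this guide. The deny rules, `DENIED_PROGRAMS`, `FORCE_FLAGS`, and `deny_reason` are all hypothetical examples.

```python
import shlex
from typing import Optional

# Hypothetical deny rules for illustration; a real list would be project-specific.
DENIED_PROGRAMS = {"sudo", "shutdown", "mkfs"}
FORCE_FLAGS = {"--force", "-f"}


def deny_reason(command: str) -> Optional[str]:
    """Return why a shell command should be blocked, or None to allow it."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: refuse what cannot be parsed.
        return "unparseable command"
    if not tokens:
        return None
    program, args = tokens[0], tokens[1:]
    if program in DENIED_PROGRAMS:
        return f"program '{program}' is not allowed"
    if program == "git" and "push" in args and FORCE_FLAGS.intersection(args):
        return "force-push is not allowed"
    return None
```

This inspects only the first program on the line, so compound commands joined with `&&` or `;` would need real shell parsing. Treat it as a sketch of where enforcement lives, and keep the sandbox as the actual boundary.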

A Fable 5 specific: check stop_reason on exit. A value of refusal means a safety classifier fired, and the right move is falling back to Opus 4.8 — not retrying blindly.

Lab · a programmatic loop

Build the SDK version, with both caps in place:

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage

GOAL = """Goal: all tests in the repo pass and CHANGELOG.md describes each fix.
Done means: `npm test` exits 0, verified by running it, and the changelog entry
exists. Check memory/ for lessons from prior runs before starting; write new
lessons there when finished."""

async def overnight():
    async for msg in query(
        prompt=GOAL,
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Edit", "Write", "Bash", "Glob", "Grep"],
            setting_sources=["project"],   # loads CLAUDE.md w/ standing rules
            max_turns=80,                  # hard stop 1
            max_budget_usd=15.0,           # hard stop 2
            effort="high",
        ),
    ):
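        if isinstance(msg, ResultMessage):
            print(msg.subtype, msg.session_id)
            if msg.subtype == "success":
                print(msg.result)
            elif msg.subtype == "error_max_turns":
                # hard stop 1 fired; the session is resumable
                print("Turn cap reached; resume the session to continue.")
            elif msg.subtype == "error_max_budget_usd":
                # hard stop 2 fired; check the spend before resuming
                print("Budget cap reached; review spend before resuming.")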

asyncio.run(overnight())
An overnight loop with both hard stops, via the Agent SDK.

After it runs, note what it spent, what each hard stop would have caught, and what you would cap differently next time.

Chapter Six

Risk & Loops Beyond Code


Reading: Osmani, "what the loop still does not do for you" through the close.

6.1 What the loop doesn't do for you

Osmani's closing argument is the most honest part of the corpus: the loop changes the work, it doesn't delete you from it — and three problems get sharper as the loop gets better, not easier.

Verification is still on you. A loop running unattended is also a loop making mistakes unattended; the verifier split makes "it's done" mean something, but even then done is a claim, not a proof, and your job is to ship work you confirmed. Comprehension debt grows with throughput: the faster the loop ships code you didn't write, the wider the gap between what exists and what you understand — unless you read what the loop made. And cognitive surrender is the comfortable posture: when the loop runs itself, it's tempting to stop having an opinion and take whatever comes back. Designing loops with judgment is the cure; designing them to avoid thinking is the accelerant. Same action, opposite results.

One ceiling no tool removes is what Osmani calls the orchestration tax: worktrees eliminate file collisions between parallel agents, but your review bandwidth — not the tooling — decides how many loops you can responsibly run. Two people can build the identical loop and get opposite outcomes: one moves faster on work they understand deeply; the other avoids understanding the work at all. The loop doesn't know the difference. You do.

6.2 Loops beyond code

The discourse is coding-centric because that's where feedback is cheapest — tests say no for free. But the loop contract transfers to any work with checkable output. A research loop drafts, then a verifier fetches every cited source and confirms it exists and supports the claim, fixing or flagging until the citation rubric passes. A data-pipeline loop ingests, validates against a schema, produces the report, and has a fresh-context checker independently re-derive the headline numbers and diff them. A document-QA loop generates, scores against a written rubric, and revises, so the human sees only versions that passed. A monitoring loop checks sources on a schedule, diffs against memory of the last run, and pings a human only when something material changed.
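The monitoring loop's "diff against memory of the last run" step is the most mechanical of the four, and can be sketched directly. The names `fingerprint` and `changed_sources` and the JSON memory file are assumptions for illustration. Exact hashing treats any byte change as a change; deciding whether a change is material is the part a model or rubric would handle.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def fingerprint(content: str) -> str:
    """Stable digest of one source's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def changed_sources(current: Dict[str, str], memory_file: Path) -> List[str]:
    """Compare this run's sources to the last run's fingerprints; return what changed."""
    previous = {}
    if memory_file.exists():
        previous = json.loads(memory_file.read_text(encoding="utf-8"))
    now = {name: fingerprint(text) for name, text in current.items()}
    changed = sorted(name for name, digest in now.items() if previous.get(name) != digest)
    # Persist this run's fingerprints as the memory the next tick diffs against.
    memory_file.write_text(json.dumps(now, indent=2), encoding="utf-8")
    return changed
```

On the first run every source counts as changed, since there is no memory yet; after that, an empty list means the loop stays quiet and nobody gets pinged.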

The common thread: the human moves from operator to editor — auditing outcomes instead of steering iterations. That role shift, not any single trick, is what loop engineering names.

6.3 Capstone pattern: full orchestration

One loop shape Osmani keeps reusing, assembled from all six blocks. A scheduled automation calls a triage skill that reads yesterday's CI failures, open issues, and recent commits, and writes findings to a state file. For each finding worth doing, the loop opens an isolated worktree and dispatches a builder sub-agent; a second sub-agent reviews the draft against the project's skills and tests. Connectors open the pull request and update the ticket; anything the loop can't handle lands in a triage inbox for a human. The state file is the spine — tomorrow's run picks up where today stopped. Designed once; never prompted step-by-step again.

Practice · put it together

Build a loop for the contract you wrote in Chapter Two, using the capstone pattern as a template. When it runs, audit yourself against §6.1: which of the three risks is your design most exposed to, and what specifically guards against it?

Reference

Glossary


Agent
An AI model connected to tools — file access, a terminal, the web — so it can act, not just answer.
Harness
The environment a single agent runs inside: its tools, permissions, context, and limits. A loop is a harness on a timer.
Turn
One round trip inside the agent's inner loop: the model requests tool calls, they run, and the results come back. Turns repeat until the model answers without requesting tools.
Sub-agent
A helper agent spawned with its own fresh context. Used to parallelize work and, crucially, to check work its parent produced.
Worktree
A separate working copy of a git repository on its own branch. Gives each parallel agent its own checkout so edits never collide.
Skill
A folder of written instructions (a SKILL.md plus optional scripts) the agent loads by name or by matching the task — project knowledge written once instead of re-explained every session.
Connector / MCP
A standard interface (Model Context Protocol) that lets an agent reach real tools: issue trackers, databases, Slack, staging APIs.
Dynamic workflow
A JavaScript orchestration script Claude writes for one large task, executed by a runtime that fans out many clean-context subagents in a guaranteed order. Structure within a run, where a loop is recurrence across runs.
Anchor file
A file read fresh at every iteration — VISION.md, CLAUDE.md, PROMPT.md — holding the intent and rules a cold-started agent would otherwise guess.
Compaction
The harness summarizing older conversation history to free context space. Lossy: instructions that must survive belong in anchor files, not the opening prompt.
Rubric
A written, falsifiable standard a verifier scores work against, used where the environment can't grade automatically.
Ralph loop
Geoffrey Huntley's minimal pattern: a bash loop piping a fixed prompt into the agent forever, with all progress kept on disk and in git.
Effort
A per-call dial for how much reasoning the model applies — low for routine errands, high for verification and hard problems.
Hook
A callback in your own process that fires at fixed points in the loop (before a tool runs, when the agent stops). Hooks enforce; prompts merely steer.

Sources

Addy Osmani, Loop Engineering (June 7, 2026) · Yash Thakker, Loop Engineering: 2026 Guide (June 9, 2026) · Anthropic, How the agent loop works and Prompting Claude Fable 5 · Geoffrey Huntley, Ralph (July 2025) · Yao et al., ReAct (2022).

An unofficial study text. Commentary synthesized in original wording; templates and commands are functional snippets from the cited sources. Researched and drafted by Urania, an AI research system — edited, verified, and signed by Zach Rossmiller.