What a skill is
Reading: Anthropic, Equipping agents for the real world with Agent Skills; the Agent Skills specification and the standard's repository.
1.1 A folder, a file, one job
A skill is a directory with one required file in it: SKILL.md. That file has two parts — a short block of metadata at the top (in YAML, between two lines of three dashes) and a Markdown body underneath with the actual instructions. Everything else in the folder is optional: a scripts/ directory for code the agent can run, a references/ directory for longer documentation, an assets/ directory for templates or data. That is the whole structure, and it is the same whether the skill teaches legal-review steps, a data-cleaning pipeline, or how to format a slide deck.
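To make the two-part layout concrete, here is a minimal Python sketch that splits a SKILL.md into its metadata block and its Markdown body. The function name `split_skill_md` and the sample text are illustrative assumptions, not part of the specification, and the metadata is left as raw text rather than parsed as YAML.

```python
def split_skill_md(text: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter, body).

    Assumes the file opens with a line of three dashes and that the
    metadata block is closed by the next such line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return frontmatter, body
    raise ValueError("frontmatter is missing its closing '---' line")


# Hypothetical sample file, used only to exercise the function.
example = """---
name: house-style-memo
description: Draft internal memos in the house voice.
---
# House-Style Memo

Write the memo in the organization's house style.
"""

frontmatter, body = split_skill_md(example)
print(frontmatter)           # the two metadata lines
print(body.splitlines()[0])  # the first line of the Markdown body
```

A real loader would hand the frontmatter to a YAML parser; the split itself is the only structure every skill shares.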
The framing the standard's authors keep returning to is onboarding a capable new hire. A new colleague does not need you to re-explain the company from scratch; they need a written guide to the specific way you do a specific thing, and they pull it out when that thing comes up. A skill is that guide, made machine-readable.
1.2 The lineage: a feature, then a standard
Anthropic introduced Agent Skills in October 2025 as a way to give Claude repeatable, packaged expertise. The more consequential move came on December 18, 2025, when the company published the SKILL.md format as an open standard at agentskills.io and put the specification and a validator in a public repository — the same playbook that turned the Model Context Protocol into shared infrastructure rather than one vendor's API. Within the same day, GitHub Copilot shipped support; OpenAI's Codex and Google's Gemini CLI followed in the weeks after.
That history is why this guide treats skills as a format, not a Claude feature. A skill you write to the spec is portable by design: spec-compliant tools ignore frontmatter keys they don't recognize, so the file degrades gracefully rather than breaking. The caution worth carrying is that portability is about structure, not identical behavior — where a skill's folder lives, how reliably it triggers, and what a script is allowed to do differ from tool to tool. The phrase practitioners use is a shared language, not identical speech.
1.3 Where it runs
The discovery locations and invocation styles vary, but the SKILL.md inside is the same file. A useful tell of how serious the interoperability is: Copilot will automatically pick up skills you already placed in a .claude/skills/ directory, and several tools honor a neutral .agents/skills/ path precisely so one folder can serve many agents.
| Tool | Reads skills from | Invocation |
|---|---|---|
| Claude Code | .claude/skills/, ~/.claude/skills/ | implicit on match |
| OpenAI Codex | .agents/skills/, ~/.codex/skills/ | implicit, or $skill |
| Gemini CLI | .gemini/skills/, .agents/skills/ | activate_skill, with consent |
| GitHub Copilot | .github/skills/, also reads .claude/skills/ | implicit; /skills |
| VS Code, Cursor | repo and per-user skill directories | implicit on match |
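As a rough illustration of the table, the sketch below walks a repository and reports which tools would find each skill folder. It uses only the project-level paths listed above; the function name and the mapping are assumptions made for illustration, and each tool's real discovery rules (per-user paths, precedence, consent prompts) are more involved.

```python
from pathlib import Path

# Project-level locations copied from the table; per-user paths are omitted.
SKILL_DIRS = {
    ".claude/skills": ["Claude Code", "GitHub Copilot"],
    ".agents/skills": ["OpenAI Codex", "Gemini CLI"],
    ".gemini/skills": ["Gemini CLI"],
    ".github/skills": ["GitHub Copilot"],
}


def find_skills(repo: Path) -> dict[str, list[str]]:
    """Map each skill folder found under repo to the tools that would read it."""
    found = {}
    for rel, tools in SKILL_DIRS.items():
        base = repo / rel
        if not base.is_dir():
            continue
        for child in sorted(base.iterdir()):
            # A folder counts as a skill only if it contains SKILL.md.
            if (child / "SKILL.md").is_file():
                found[child.relative_to(repo).as_posix()] = tools
    return found


if __name__ == "__main__":
    for folder, tools in find_skills(Path(".")).items():
        print(f"{folder}: {', '.join(tools)}")
```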
Name one task you re-explain to an AI tool more than once a week. Is the thing you keep repeating what you want (a one-off ask), how to do it your way (a procedure), or access to some live system? Hold your answer — only the middle one is a skill, and Chapter 5 is about telling them apart cleanly.
Progressive disclosure
Reading: The Agent Skills specification, the loading model; Anthropic, Equipping agents for the real world with Agent Skills; Gao et al., SkillReducer (arXiv 2026) on skills at scale.
2.1 The three tiers
Progressive disclosure is the idea that makes skills scale, and the specification defines it in three stages. At startup, the agent loads only each skill's name and description — roughly a hundred tokens apiece. That tiny listing is all it holds for every installed skill at once. When a task matches a description, the agent reads that one skill's full SKILL.md body into context; the spec recommends keeping the body under about 5,000 tokens and 500 lines. Only if the instructions then point to something deeper — a file in references/, a script in scripts/ — does that content load, and a script contributes only its output, not its source, to the context.
The analogy in Anthropic's write-up is a well-organized manual: a table of contents first, then the relevant chapter, then the appendix only if you need it. The practical consequence is large. Because the deep content stays on disk until summoned, the total material a skill can bundle is effectively unbounded, while its cost when idle stays near zero.
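The three stages can be modeled in a few lines. This is a toy sketch, not how any particular agent is implemented: the `Skill` and `Context` classes are hypothetical, and the "context window" is just a list that records what has been loaded.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str   # tier 1: always in context
    body: str          # tier 2: loaded when the description matches a task
    references: dict[str, str] = field(default_factory=dict)  # tier 3: on demand


class Context:
    """A toy context window that records what has been loaded."""

    def __init__(self, skills: list[Skill]) -> None:
        self.skills = {s.name: s for s in skills}
        # Tier 1: only the name and description of every installed skill.
        self.loaded = [f"{s.name}: {s.description}" for s in skills]

    def activate(self, name: str) -> None:
        # Tier 2: the full body of the one matching skill.
        self.loaded.append(self.skills[name].body)

    def read_reference(self, name: str, path: str) -> None:
        # Tier 3: a bundled file, read only when the body points to it.
        self.loaded.append(self.skills[name].references[path])


memo = Skill(
    name="house-style-memo",
    description="Draft internal memos in the house voice.",
    body="For long memos, read references/voice.md first.",
    references={"references/voice.md": "Full style guide text."},
)
ctx = Context([memo])
print(len(ctx.loaded))  # 1: just the listing entry
ctx.activate("house-style-memo")
ctx.read_reference("house-style-memo", "references/voice.md")
print(len(ctx.loaded))  # 3: listing, body, reference
```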
2.2 The economics, and why lean wins
This is the difference from a long system prompt, which pays its full token cost on every turn whether or not it is relevant. With skills, the agent can be aware of a large library for a small, fixed cost and pay for depth only on use.
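A back-of-the-envelope calculation shows the size of the gap. The sketch below uses the two figures quoted earlier (about 100 tokens per listing entry, and a body at the recommended 5,000-token ceiling); the session sizes are made-up inputs, and the model is deliberately simplified: it counts a body once per activation and ignores caching and any carry-over between turns.

```python
IDLE_TOKENS_PER_SKILL = 100    # name + description, the rough figure quoted above
BODY_TOKENS_PER_SKILL = 5_000  # assumed: every body sits at the recommended ceiling


def system_prompt_cost(num_skills: int, turns: int) -> int:
    """All instructions live in the system prompt: every body, every turn."""
    return num_skills * BODY_TOKENS_PER_SKILL * turns


def skills_cost(num_skills: int, turns: int, activations: int) -> int:
    """The listing is paid every turn; a body is counted once per activation."""
    listing = num_skills * IDLE_TOKENS_PER_SKILL * turns
    return listing + activations * BODY_TOKENS_PER_SKILL


# Hypothetical session: 50 skills installed, 20 turns, 2 skills actually used.
print(system_prompt_cost(50, 20))  # 5000000
print(skills_cost(50, 20, 2))      # 110000
```

Under these assumptions the skills layout costs about 2% of the monolithic prompt, and nearly all of that is the fixed listing.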
There is now measurement behind the design. A 2026 study analyzed 55,315 publicly available skills and found the libraries already sprawling: about a quarter had no usable routing description at all, and more than half of body content was non-actionable filler. When the authors compressed descriptions and restructured bodies — pushing detail down into on-demand references — they cut body size by around 39% on average and improved task quality by about 2.8%. They named it a less-is-more effect: trimming non-essential content removes distraction from the context window. The lesson for an author is direct. A lean SKILL.md is not just cheaper; it often works better, because every token in the body competes for the model's attention.
In your own words, where do the instructions for a skill physically live when the agent is idle, when a matching task arrives, and when the instructions cite a reference file? If you can place each in the right tier, you understand the one idea this whole format is built on.
Writing the SKILL.md
Reading: The Agent Skills specification, frontmatter fields; Anthropic's skill-creator on writing descriptions.
3.1 The frontmatter, and the only two required fields
Two fields are mandatory: name and description. The name must be at most 64 characters, use only lowercase letters, numbers, and hyphens, and match the folder name; if the two differ, the skill silently fails to load. The description must be at most 1,024 characters and must say two things: what the skill does and when to use it. Optional fields exist (license, metadata, compatibility, and an experimental allowed-tools field), but most skills need none of them. One quiet trap: avoid angle brackets in the frontmatter, since < and > can be read as stray instructions.
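The rules in this paragraph are mechanical enough to check in code. The following is a minimal sketch covering only the constraints stated here; `check_frontmatter` is a hypothetical helper, not the specification's validator, and the published validator remains the authoritative check.

```python
import re

# Lowercase letters, digits, and hyphens only, as described above.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def check_frontmatter(name: str, description: str, folder_name: str) -> list[str]:
    """Return a list of problems; an empty list means the basics pass."""
    problems = []
    if not name or len(name) > 64:
        problems.append("name must be 1 to 64 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name may use only lowercase letters, numbers, and hyphens")
    if name != folder_name:
        problems.append("name must match the folder name")
    if not description or len(description) > 1024:
        problems.append("description must be 1 to 1,024 characters")
    if "<" in name + description or ">" in name + description:
        problems.append("avoid angle brackets in the frontmatter")
    return problems


print(check_frontmatter(
    name="House_Memo",
    description="Formats text",
    folder_name="house-style-memo",
))
```

The sample call reports two problems (the character set and the folder mismatch). It cannot flag the weak description, which is a judgment about content rather than format.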
Here is the first cut of our build-along skill, a writer for house-style memos. Notice how little the body needs to be.
    ---
    name: house-style-memo
    description: >-
      Draft internal memos in the house voice and structure. Use whenever
      someone asks to write, draft, format, or clean up a memo, announcement,
      or internal update. Prefer this over a generic draft every time.
    ---

    # House-Style Memo

    Write the memo in the organization's house style, not a generic one.

    ## Structure

    1. Subject line in sentence case.
    2. A two-sentence summary first: the decision, then the reason.
    3. Body under short, sentence-case headings.
    4. A "next steps" list with one owner per item.

    ## Voice

    - Plain, direct, active. No throat-clearing, no filler openers.
    - For anything longer than a page, read references/voice.md first.

3.2 The description is the whole ballgame
At startup the agent sees only names and descriptions — the body that does the real work is invisible until the description earns a match. So the description is the highest-leverage text in the file, and the most common reason a perfectly good skill never fires is a vague description, not weak instructions. Most authors over-invest in the body and under-invest in the one line that controls triggering.
A good description names the artifacts and the moments. "Formats text" is dead on arrival; "Draft internal memos in the house voice; use when someone asks to write, draft, or format a memo, announcement, or internal update" gives the agent concrete hooks. Put all the "when to use" guidance here, not in the body — the body isn't read until after the decision is made. Anthropic's own skill-creator goes a step further and advises writing descriptions a little bit "pushy," because models tend to undertrigger skills — to skip a skill that would have helped. Erring toward eager triggering, then narrowing if it fires too often, is the right direction to be wrong in.
Write the description line for a skill you'd actually use. Then hand it to a colleague with no other context and ask: from this one sentence, can you say exactly what it does and name three requests that should trigger it? If they hesitate on either, the description — not your future instructions — is what needs the work.
Bundling depth
Reading: The Agent Skills specification, resources and the 500-line guidance; Li et al., SkillsBench (arXiv 2026), on focused versus bloated skills.
4.1 Three kinds of extra, and what each is for
Once the body is lean, depth goes into the subdirectories. The three conventional ones do different jobs. references/ holds documentation the agent reads only when the body points it there — the full style guide, an API's edge cases, a domain glossary; keep each reference file small and focused, since the agent loads whole files. assets/ holds things the agent uses but doesn't read for meaning — a template to copy, a logo, a data file. scripts/ holds code the agent runs for anything deterministic: a calculation, a format conversion, a validation pass. The win with a script is twofold — it is reliable in a way prose instructions never are, and only its output enters the context, not its source.
    house-style-memo/
    ├── SKILL.md               # structure + voice rules, kept lean
    ├── references/
    │   └── voice.md           # the full style guide, loaded only when cited
    ├── assets/
    │   └── memo-template.md   # a skeleton the agent copies to start
    └── scripts/
        └── lint_memo.py       # checks heading case + section order; prints fixes

The body's job becomes pointing at this depth at the right moment, rather than containing it:

    ## When the memo is long or formal

    Read references/voice.md for the full guide before drafting, then start
    from assets/memo-template.md. After drafting, run scripts/lint_memo.py
    on the file and apply what it reports.

4.2 Split for focus, not just for size
The spec's rule of thumb — keep SKILL.md under 500 lines and push the rest into references — is partly about the token budget, but the deeper reason is attention. The benchmarking work on skills (SkillsBench) found that focused skills of two or three modules consistently outperform one comprehensive document, and that skills a model generates for itself, with no human curation, gave no benefit on average — the value is in specific, human-grounded procedure. So the instinct to cram everything into one comprehensive file is exactly backwards. Split when a section is only relevant sometimes (it becomes a conditional reference), when it serves a distinct sub-domain (it becomes its own file), or when it is long enough to dilute the rules that always apply.
The maker/checker discipline from loop engineering applies here in miniature: a scripts/ validator that can fail the draft is something in the skill that can say no. Prose that merely asks the agent to "double-check the headings" is a suggestion; a script that checks them is a gate.
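A gate of that kind can be very small. Here is one possible shape for the lint_memo.py named in the folder listing, written as a sketch: the sentence-case test is a rough heuristic (proper nouns will be flagged), and the rule that "Next steps" must be the last heading is an assumption drawn from the build-along skill's structure, not a requirement of the format.

```python
import re
import sys

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def is_sentence_case(heading: str) -> bool:
    """Heuristic: first word capitalized, later words lowercase.

    All-caps tokens such as acronyms or "Q3" are allowed.
    """
    words = heading.split()
    if not words:
        return False
    first = words[0]
    if first[0].isalpha() and not first[0].isupper():
        return False
    return all(w[0].islower() or w.isupper() or not w[0].isalpha() for w in words[1:])


def lint(text: str) -> list[str]:
    """Return suggested fixes for heading case and section order."""
    problems = []
    headings = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = HEADING.match(line)
        if not match:
            continue
        title = match.group(2)
        headings.append(title)
        if not is_sentence_case(title):
            problems.append(f"line {number}: use sentence case in heading '{title}'")
    lowered = [h.lower() for h in headings]
    if "next steps" not in lowered:
        problems.append("add a 'Next steps' section")
    elif lowered[-1] != "next steps":
        problems.append("move 'Next steps' to the end")
    return problems


def main(argv: list[str]) -> int:
    with open(argv[1], encoding="utf-8") as handle:
        problems = lint(handle.read())
    for problem in problems:
        print(problem)
    # A non-zero exit code is what lets the script fail the draft.
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

Only the printed fixes reach the agent's context; the source above stays on disk.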
Take the skill you've been sketching and draw its folder. For each rule in your head, decide: does it apply every time (it belongs in the body), only sometimes (a reference file), or deterministically checkable (a script)? If everything lands in the body, you haven't yet found the skill's shape — something almost always wants to move down a tier.
Skill, prompt, or server?
Reading: Synthesis across the platform docs (Codex, Gemini CLI, GitHub); practitioner framing in Skills vs MCP (LlamaIndex).
5.1 The question is what's missing
Skills, prompts, tools, connectors, and subagents get argued about as rivals; they are layers. The clean way to choose is to ask one question about the gap in front of you: what does the agent lack? If it lacks know-how — a procedure, a standard, the right way to do a recurring job — that is a skill. If it lacks a one-time instruction for a one-off, that is just a prompt. If it lacks the ability to do one discrete action, that is a tool. If it lacks live access to a system or current data, that is a connector, most often an MCP server. If it lacks a clean, separate context for parallel or isolated work, that is a subagent.
5.2 Skill versus prompt, and skill versus server
The line that trips people most is skill versus prompt, because a skill is instructions — it can look like a prompt you saved. The differences are real and they are the reason to bother. A prompt is paid for in full every time and lives only in the conversation; a skill loads progressively and persists as a versioned file. A prompt has no triggering mechanism; a skill activates itself off its description. A prompt is plain text; a skill can bundle scripts and references and run them. And a prompt is bound to one chat in one tool; a skill written to the open format travels. If the thing recurs and has a right way to do it, it has outgrown being a prompt.
Skill versus server is the easier line: a connector gives access, a skill gives technique. MCP lets the agent reach your calendar; a skill tells it how your team books a meeting. They are complementary, and the strongest setups use both — a skill that, partway through, instructs the agent to call an MCP tool. So the honest answer to "skill or server?" is usually "the skill describes the how, the server provides the reach." A full mechanism-by-mechanism comparison is a subject of its own; for choosing, the missing-piece question is enough.
| Mechanism | What it adds | Reach for it when | Where it lives |
|---|---|---|---|
| Prompt | a one-time instruction | the need is a genuine one-off | the conversation |
| Skill | the how: procedure and standards | the task recurs and has a right way | a portable folder |
| Tool | a single action | the agent must do one discrete thing | the harness |
| MCP / connector | live access to a system | the agent needs current data or to act | a server |
| Subagent | a fresh, isolated context | work needs parallelism or isolation | a spawned agent |
Take three things you might want an agent to do — say, "summarize this thread," "always write our release notes this way," and "check our live deploy status." Route each with the missing-piece question. If you can name prompt, skill, and connector respectively and say why, you have the chapter.
Authoring well and shipping
Reading: Anthropic's skill-creator and the Skill Creator plugin; GitHub, managing skills with the CLI and its install warning; security research from Snyk, Silverfort, and Koi via The Hacker News.
6.1 Test the trigger first, then the work
A skill has two failure modes, and they need separate tests. The first is that it never fires — a description problem. The second is that it fires and does the job badly — a body problem. Test them in that order. To check triggering, describe the task the way a user actually would, without naming the skill, and confirm it activates; to check the work, force it explicitly and read the output. The cheapest reliable loop is to iterate on one genuinely hard example until the skill nails it, then generalize from what worked.
    # 1 · force the trigger once, by naming the skill:
    Use the house-style-memo skill to draft a memo on the Q3 office move.

    # 2 · then prove it fires on its own; describe the task, don't name it:
    Draft a short internal note telling the team about the Q3 office move.

    # 3 · validate the file against the open spec before sharing:
    skills-ref validate ./house-style-memo

    # 4 · install it for another agent, or publish for the team:
    gh skill install your-org/skills house-style-memo
For anything you'll share widely, the work is worth automating. Anthropic's skill-creator is itself a skill, with four modes — Create, Eval, Improve, and Benchmark — backed by small agents that run the skill against example prompts, grade the outputs, compare two versions blind, and suggest fixes. That turns "it feels better now" into a measured before-and-after, which is the only honest way to know a revision helped.
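The triggering half of that loop can be sketched without committing to any particular agent. In the example below, `Router` is an assumed interface (a function from a prompt to the name of the skill that activated, or None), and `keyword_router` is a toy stand-in so the sketch runs; a real test would replace it with a call into the tool under test.

```python
from typing import Callable, Optional

# Assumed interface: prompt in, name of the activated skill (or None) out.
Router = Callable[[str], Optional[str]]


def trigger_report(
    router: Router,
    skill: str,
    should_fire: list[str],
    should_not_fire: list[str],
) -> dict[str, list[str]]:
    """List prompts where the skill failed to fire or fired when it should not."""
    # Trigger prompts must describe the task without naming the skill.
    named = [p for p in should_fire if skill in p]
    if named:
        raise ValueError(f"trigger prompts must not name the skill: {named}")
    missed = [p for p in should_fire if router(p) != skill]
    false_alarms = [p for p in should_not_fire if router(p) == skill]
    return {"missed": missed, "false_alarms": false_alarms}


def keyword_router(prompt: str) -> Optional[str]:
    """Toy stand-in for a real agent: fires on a few memo-like phrases."""
    phrases = ("memo", "announcement", "internal note", "internal update")
    hit = any(phrase in prompt.lower() for phrase in phrases)
    return "house-style-memo" if hit else None


report = trigger_report(
    keyword_router,
    "house-style-memo",
    should_fire=["Draft a short internal note telling the team about the Q3 office move."],
    should_not_fire=["Summarize this thread."],
)
print(report)  # {'missed': [], 'false_alarms': []}
```

Entries under "missed" point to a description that is too vague; entries under "false_alarms" point to one that is too eager.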
6.2 Distribution, and the trust it demands
Skills are shared the way code is — repositories, registries, an install command. GitHub's gh skill (in version 2.90.0 and later) discovers, installs, validates, and publishes skills across hosts, and writes provenance metadata into the frontmatter so updates can be checked against the source. That convenience is also the risk. A skill is loaded automatically and can run scripts with your permissions; installing one from a stranger is closer to running their program than reading their note. GitHub says it plainly: skills are not verified and may contain prompt injections, hidden instructions, or malicious scripts.
The cautionary receipt arrived fast. ClawHub, the open marketplace for the OpenClaw assistant, became the year's clearest supply-chain lesson. In an early-February 2026 audit, Koi Security found 341 of 2,857 listed skills malicious — about one in eight — with most belonging to a single campaign that disguised credential-stealing malware as ordinary trading and utility skills. A separate Snyk scan of nearly 4,000 skills found about 7% of them leaking secrets through the agent's context — instructions that have the model handle API keys or card numbers in plaintext, so its own logs become the leak. Silverfort then showed the trust signals themselves could be gamed: by exploiting the ranking system, researchers pushed a planted skill to the top of its category, where it was executed thousands of times across dozens of cities within a week. The registry's defenses — account-age limits, community reporting, malware scanning — came after the damage, which is the usual order.
The mitigations are not exotic, just disciplined: read the full SKILL.md and any scripts before installing, prefer curated or private sources over open ones, pin a version rather than tracking a moving target, run a validator, and keep credentials out of the environments where untrusted skills can run. The same instinct that makes a good skill author — keep it scoped, write down exactly what it does — is what makes a safe skill consumer.
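Part of that pre-install reading can be given a first pass by a script. The sketch below is a crude heuristic, not a security scanner: the pattern list is an illustrative assumption, it will miss anything obfuscated, and a clean result does not replace reading the files yourself.

```python
import re
import tempfile
from pathlib import Path

# Illustrative patterns only; a real audit needs far more than string matching.
SUSPICIOUS = {
    "reads environment variables": re.compile(
        r"os\.environ|getenv|\$\{?[A-Z_]*(KEY|TOKEN|SECRET)"
    ),
    "makes network calls": re.compile(r"\b(curl|wget)\b|requests\.|urllib|https?://"),
    "decodes or evaluates hidden code": re.compile(r"base64|\beval\(|\bexec\("),
}


def audit_skill(folder: Path) -> list[str]:
    """Flag files in a skill folder that deserve a closer read before installing."""
    findings = []
    if not (folder / "SKILL.md").is_file():
        findings.append("missing SKILL.md")
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        text = path.read_text(encoding="utf-8", errors="replace")
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(text):
                findings.append(f"{path.relative_to(folder).as_posix()}: {label}")
    return findings


# Demo on a throwaway folder containing a script that touches a credential.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "demo-skill"
    (skill / "scripts").mkdir(parents=True)
    (skill / "SKILL.md").write_text("---\nname: demo-skill\n---\n", encoding="utf-8")
    (skill / "scripts" / "run.py").write_text(
        "import os\nprint(os.environ['API_KEY'])\n", encoding="utf-8"
    )
    for finding in audit_skill(skill):
        print(finding)  # scripts/run.py: reads environment variables
```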
6.3 What you can now do
You can write a SKILL.md that triggers, explain why progressive disclosure rewards a lean body, split depth into references and scripts, and decide, by asking what the agent actually lacks, when a skill is the right tool and when a prompt or a server is. The build-along skill is small on purpose; the discipline scales straight up to a legal-review skill, a data-pipeline skill, a release-notes skill. The format is the same folder either way, and now it is yours, not a vendor's.
Finish your skill. Write the lean SKILL.md with a sharp, slightly-pushy description; move one rule into references/ and one check into scripts/; test triggering by describing the task without naming the skill; then validate it and decide where it should live. Last, write one sentence on the trust question: if a teammate installed this from you, what would you want them to inspect first?