Skills, on Their Own
June 2026

How to write a SKILL.md an agent will actually use — progressive disclosure, the description that does the triggering, and when a skill beats a prompt or a server.

Agent Skills began as an Anthropic feature in October 2025 and became an open standard that December, so this guide is deliberately not Claude-only: the same SKILL.md file now loads, unmodified, in Claude Code, OpenAI's Codex, Google's Gemini CLI, and GitHub Copilot. It draws on the published specification at agentskills.io, Anthropic's engineering write-up and its skill-creator, the platform documentation from OpenAI, Google, and GitHub, recent measurement of skills at scale, and the 2026 security research into skill supply chains. Every file and command here is a working snippet you can adapt. You'll build one small skill, a house-style memo writer, from an empty folder, and leave able to decide when a skill is the wrong tool.

In Plain Terms

If you have ever pasted the same long instructions into an AI tool for the third time — your team's report format, the way you like commit messages written, the rules for a kind of document you make often — you have already felt the problem skills solve. The instructions are good. Re-typing them every time, and hoping you remembered all of them, is the waste.

A skill is those instructions, written down once, in a folder. At the center of the folder is a single file called SKILL.md: a short header that says what the skill is for, followed by the instructions themselves. You put the folder where your AI tool can see it, and from then on the tool reads those instructions on its own, but only when the task in front of it actually calls for them.

That last part is the clever bit, and it has a name: progressive disclosure. The tool does not hold every skill's full instructions in its head at all times — that would crowd out everything else. Instead it keeps only a one-line summary of each skill on hand, like the spines of books on a shelf. When a request comes in, it glances at the spines, pulls down only the book it needs, and reads further into that book only as the job requires. So you can have dozens of skills available and pay almost nothing for the ones you are not using.

A skill is not the only way to extend an AI tool, and knowing the difference is half of using them well. A plain prompt is a one-time instruction that vanishes when the chat ends. A connector (the most common kind is built on MCP, the Model Context Protocol) gives the tool live access to a system, such as your calendar, a database, or a ticket tracker, so it can fetch current information or take an action. A skill is neither of those: it is know-how. It tells the tool how to do a job the way you want it done, every time, and it can carry along scripts and reference files to help. The rule of thumb: reach for a skill when the same task keeps coming back and there is a right way to do it.

Two things make skills more than a tidy filing system. First, the format is an open standard, not one company's feature, so a skill you write once works across many different tools — you own the file, rather than renting the capability inside one vendor's product. Second, a skill can carry runnable code and is loaded automatically, which is powerful and is also the catch: installing a stranger's skill is closer to running a stranger's program than to reading a note. The last chapter is about taking that seriously.

Chapter One

What a skill is


Reading: Anthropic, Equipping agents for the real world with Agent Skills; the Agent Skills specification and the standard's repository.

1.1 A folder, a file, one job

A skill is a directory with one required file in it: SKILL.md. That file has two parts — a short block of metadata at the top (in YAML, between two lines of three dashes) and a Markdown body underneath with the actual instructions. Everything else in the folder is optional: a scripts/ directory for code the agent can run, a references/ directory for longer documentation, an assets/ directory for templates or data. That is the whole structure, and it is the same whether the skill teaches legal-review steps, a data-cleaning pipeline, or how to format a slide deck.

The framing the standard's authors keep returning to is onboarding a capable new hire. A new colleague does not need you to re-explain the company from scratch; they need a written guide to the specific way you do a specific thing, and they pull it out when that thing comes up. A skill is that guide, made machine-readable.
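If you prefer to start from a script rather than by hand, the layout described in 1.1 is easy to create programmatically. The sketch below is a minimal illustration, not part of any tool: scaffold_skill is a hypothetical helper, and it assumes a one-line description containing nothing YAML treats specially (such as a leading quote, or a colon followed by a space).

```python
from pathlib import Path


def scaffold_skill(parent, name, description,
                   extras=("scripts", "references", "assets")):
    """Create parent/name/SKILL.md plus optional subfolders; return the skill dir.

    Hypothetical helper for illustration; only SKILL.md is required by the format.
    """
    skill_dir = Path(parent) / name  # the folder name must equal the skill name
    skill_dir.mkdir(parents=True, exist_ok=False)
    frontmatter = f"---\nname: {name}\ndescription: {description}\n---\n"
    body = f"\n# {name}\n\nInstructions go here.\n"
    (skill_dir / "SKILL.md").write_text(frontmatter + body, encoding="utf-8")
    for sub in extras:
        (skill_dir / sub).mkdir()  # stays empty until you have something to put there
    return skill_dir
```

Calling scaffold_skill(".claude/skills", "house-style-memo", "Draft internal memos in the house voice.") would give you the empty shell that Chapter 3 fills in; pass extras=() if you want only the required file.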

1.2 The lineage: a feature, then a standard

Anthropic introduced Agent Skills in October 2025 as a way to give Claude repeatable, packaged expertise. The more consequential move came on December 18, 2025, when the company published the SKILL.md format as an open standard at agentskills.io and put the specification and a validator in a public repository. It was the same playbook that turned the Model Context Protocol into shared infrastructure rather than one vendor's API. GitHub Copilot shipped support the same day; OpenAI's Codex and Google's Gemini CLI followed in the weeks after.

That history is why this guide treats skills as a format, not a Claude feature. A skill you write to the spec is portable by design: spec-compliant tools ignore frontmatter keys they don't recognize, so the file degrades gracefully rather than breaking. The caution worth carrying is that portability is about structure, not identical behavior: where a skill's folder lives, how reliably it triggers, and what a script is allowed to do all differ from tool to tool. The phrase practitioners use is "a shared language, not identical speech."

1.3 Where it runs

The discovery locations and invocation styles vary, but the SKILL.md inside is the same file. A useful tell of how serious the interoperability is: Copilot will automatically pick up skills you already placed in a .claude/skills/ directory, and several tools honor a neutral .agents/skills/ path precisely so one folder can serve many agents.

| Tool | Reads skills from | Invocation |
|---|---|---|
| Claude Code | .claude/skills/, ~/.claude/skills/ | implicit on match |
| OpenAI Codex | .agents/skills/, ~/.codex/skills/ | implicit, or $skill |
| Gemini CLI | .gemini/skills/, .agents/skills/ | activate_skill, with consent |
| GitHub Copilot | .github/skills/, also reads .claude/skills/ | implicit; /skills |
| VS Code, Cursor | repo and per-user skill directories | implicit on match |
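To see what one shared folder buys you, here is a sketch of how a harness might scan several of the locations in the table for SKILL.md files. The search order, and the rule that the first root wins when two roots hold a skill with the same name, are assumptions for illustration; each tool documents its own precedence.

```python
from pathlib import Path

# Assumed search roots, taken from the table above. The order (project folders
# before the user folder) is an illustrative precedence, not a documented rule.
DEFAULT_ROOTS = (".agents/skills", ".claude/skills", "~/.claude/skills")


def discover_skills(roots=DEFAULT_ROOTS, base="."):
    """Map each skill name (its folder name) to the first SKILL.md found for it."""
    found = {}
    for root in roots:
        path = Path(root).expanduser()
        if not path.is_absolute():
            path = Path(base) / path  # project-level roots are relative to the repo
        if not path.is_dir():
            continue
        for skill_md in sorted(path.glob("*/SKILL.md")):
            # The spec requires name == folder name, so the folder is a safe key.
            found.setdefault(skill_md.parent.name, skill_md)  # first root wins
    return found
```

A single .agents/skills/ folder in a repository is picked up by every root list that includes it, which is the point of the neutral path.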
Check your understanding

Name one task you re-explain to an AI tool more than once a week. Is the thing you keep repeating what you want (a one-off ask), how to do it your way (a procedure), or access to some live system? Hold your answer — only the middle one is a skill, and Chapter 5 is about telling them apart cleanly.

Chapter Two

Progressive disclosure


Reading: The Agent Skills specification, the loading model; Anthropic, Equipping agents for the real world with Agent Skills; Gao et al., SkillReducer (arXiv 2026) on skills at scale.

2.1 The three tiers

Progressive disclosure is the idea that makes skills scale, and the specification defines it in three stages. At startup, the agent loads only each skill's name and description — roughly a hundred tokens apiece. That tiny listing is all it holds for every installed skill at once. When a task matches a description, the agent reads that one skill's full SKILL.md body into context; the spec recommends keeping the body under about 5,000 tokens and 500 lines. Only if the instructions then point to something deeper — a file in references/, a script in scripts/ — does that content load, and a script contributes only its output, not its source, to the context.

The analogy in Anthropic's write-up is a well-organized manual: a table of contents first, then the relevant chapter, then the appendix only if you need it. The practical consequence is large. Because the deep content stays on disk until summoned, the total material a skill can bundle is effectively unbounded, while its cost when idle stays near zero.
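The three tiers map onto three operations. In this minimal sketch, SkillLibrary and parse_frontmatter are hypothetical names, and the parser handles only simple key: value lines plus a folded (>-) value; a real tool would use a proper YAML parser. The catalog keeps only names and descriptions, bodies are read on a match, and resources only when asked for.

```python
from pathlib import Path


def parse_frontmatter(text):
    """Split SKILL.md text into (metadata dict, body).

    Handles key: value lines and folded (>-) values only; an assumption-laden
    stand-in for a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    end = lines.index("---", 1)  # raises ValueError if the block never closes
    meta, key = {}, None
    for line in lines[1:end]:
        if line[:1] in (" ", "\t") and key:
            # Continuation of a folded value: join with a single space.
            meta[key] = (meta[key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            meta[key] = "" if value == ">-" else value
    return meta, "\n".join(lines[end + 1:])


class SkillLibrary:
    """Progressive disclosure in three calls, one per tier."""

    def __init__(self, root):
        self.root = Path(root)
        # Tier 1: only names and descriptions are kept for every skill.
        self.catalog = {}
        for skill_md in sorted(self.root.glob("*/SKILL.md")):
            meta, _ = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
            self.catalog[meta["name"]] = meta["description"]

    def load_body(self, name):
        # Tier 2: the full instructions, read only after a description matches.
        text = (self.root / name / "SKILL.md").read_text(encoding="utf-8")
        return parse_frontmatter(text)[1]

    def load_resource(self, name, relpath):
        # Tier 3: a file the body points to, such as references/voice.md.
        return (self.root / name / relpath).read_text(encoding="utf-8")
```

The design choice worth noticing is that nothing from tier 2 or tier 3 is held until a method asks for it, which is why an idle library stays cheap.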

2.2 The economics, and why lean wins

This is the difference from a long system prompt, which pays its full token cost on every turn whether or not it is relevant. With skills, the agent can be aware of a large library for a small, fixed cost and pay for depth only on use.
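The fixed-cost claim is easy to check with arithmetic. Using the specification's rough figures as assumptions (about 100 tokens of metadata per skill, bodies up to about 5,000 tokens), this sketch compares a library of skills with the same instructions pasted into a system prompt. The function names are hypothetical.

```python
# Rough per-skill figures from the spec, used here as assumptions.
META_TOKENS = 100     # name + description, always in context
BODY_TOKENS = 5_000   # recommended ceiling for a SKILL.md body


def skill_library_tokens(n_skills, active, meta=META_TOKENS, body=BODY_TOKENS):
    """Context cost on one turn: every catalog entry plus the matched bodies."""
    return n_skills * meta + active * body


def system_prompt_tokens(n_skills, body=BODY_TOKENS):
    """The same instructions pasted into a system prompt: all paid every turn."""
    return n_skills * body


for active in (0, 1, 3):
    print(f"{active} active: {skill_library_tokens(50, active):,} tokens "
          f"vs {system_prompt_tokens(50):,} inlined")
```

With 50 skills and one active, the library costs 10,000 tokens on that turn; pasting all 50 bodies into a system prompt would cost 250,000 on every turn. Real bodies usually sit well under the ceiling, so treat both numbers as upper bounds.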

There is now measurement behind the design. A 2026 study analyzed 55,315 publicly available skills and found the libraries already sprawling: about a quarter had no usable routing description at all, and more than half of body content was non-actionable filler. When the authors compressed descriptions and restructured bodies — pushing detail down into on-demand references — they cut body size by around 39% on average and improved task quality by about 2.8%. They named it a less-is-more effect: trimming non-essential content removes distraction from the context window. The lesson for an author is direct. A lean SKILL.md is not just cheaper; it often works better, because every token in the body competes for the model's attention.

Check your understanding

In your own words, where do the instructions for a skill physically live when the agent is idle, when a matching task arrives, and when the instructions cite a reference file? If you can place each in the right tier, you understand the one idea this whole format is built on.

Chapter Three

Writing the SKILL.md


Reading: The Agent Skills specification, frontmatter fields; Anthropic's skill-creator on writing descriptions.

3.1 The frontmatter, and the only two required fields

Two fields are mandatory: name and description. The name must be at most 64 characters, use only lowercase letters, numbers, and hyphens, and match the folder name; on a mismatch the skill silently fails to load. The description must be at most 1,024 characters and must say two things: what the skill does and when to use it. Optional fields exist (license, metadata, a compatibility field, an experimental allowed-tools), but most skills need none of them. One quiet trap: avoid angle brackets in frontmatter values such as the description, since < and > can be read as stray instructions. YAML syntax such as the >- folding marker used below is not part of any value and is fine.

Here is the first cut of our build-along skill, a writer for house-style memos. Notice how little the body needs to be.

```markdown
---
name: house-style-memo
description: >-
  Draft internal memos in the house voice and structure. Use whenever
  someone asks to write, draft, format, or clean up a memo, announcement,
  or internal update; prefer this over a generic draft every time.
---

# House-Style Memo

Write the memo in the organization's house style, not a generic one.

## Structure
1. Subject line in sentence case.
2. A two-sentence summary first: the decision, then the reason.
3. Body under short, sentence-case headings.
4. A "next steps" list with one owner per item.

## Voice
- Plain, direct, active. No throat-clearing, no filler openers.
- For anything longer than a page, read references/voice.md first.
```
house-style-memo/SKILL.md: the first cut. The name matches the folder; the description carries the trigger.
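Before reaching for a validator, it helps to see the rules from 3.1 as code. This is a hedged sketch, not the reference validator: validate_frontmatter is a hypothetical helper that checks only the rules stated in 3.1 (the published spec adds a few more, such as restrictions on where hyphens may appear), and it takes already-parsed fields rather than reading the file.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def validate_frontmatter(meta, folder_name):
    """Return a list of problems with parsed frontmatter; empty means it passes."""
    problems = []
    name = meta.get("name", "")
    description = meta.get("description", "")

    if not name:
        problems.append("missing required field: name")
    elif len(name) > 64:
        problems.append("name is longer than 64 characters")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append("name may use only lowercase letters, numbers, and hyphens")
    if name and name != folder_name:
        problems.append(f"name {name!r} does not match folder {folder_name!r}")

    if not description:
        problems.append("missing required field: description")
    elif len(description) > 1024:
        problems.append("description is longer than 1,024 characters")

    # Angle brackets inside values can be misread as markup or instructions.
    for key, value in meta.items():
        if isinstance(value, str) and ("<" in value or ">" in value):
            problems.append(f"angle bracket in the {key!r} value")
    return problems
```

For the first cut, with the folded description joined into one string, the function returns an empty list; a name like House_Memo in a folder called house-memo fails twice, once on characters and once on the folder match.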

3.2 The description is the whole ballgame

At startup the agent sees only names and descriptions — the body that does the real work is invisible until the description earns a match. So the description is the highest-leverage text in the file, and the most common reason a perfectly good skill never fires is a vague description, not weak instructions. Most authors over-invest in the body and under-invest in the one line that controls triggering.

A good description names the artifacts and the moments. "Formats text" is dead on arrival; "Draft internal memos in the house voice; use when someone asks to write, draft, or format a memo, announcement, or internal update" gives the agent concrete hooks. Put all the "when to use" guidance here, not in the body — the body isn't read until after the decision is made. Anthropic's own skill-creator goes a step further and advises writing descriptions a little bit "pushy," because models tend to undertrigger skills — to skip a skill that would have helped. Erring toward eager triggering, then narrowing if it fires too often, is the right direction to be wrong in.
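Real agents decide on a match with the model's judgment, not by counting words, so the sketch below is only a toy. It makes the point about concrete hooks visible: a description that names artifacts and moments shares more vocabulary with how people actually phrase requests. hook_overlap and its stopword list are illustrative assumptions.

```python
import re

# Illustrative stopword list; the exact choice barely matters for the point.
STOPWORDS = {"a", "an", "the", "to", "or", "and", "in", "of", "on", "for",
             "about", "use", "when", "someone", "asks", "please"}


def words(text):
    """Lowercase content words, with punctuation and stopwords dropped."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def hook_overlap(description, request):
    """Count content words a description shares with a user's request (toy proxy)."""
    return len(words(description) & words(request))


vague = "Formats text."
sharp = ("Draft internal memos in the house voice; use when someone asks to "
         "write, draft, or format a memo, announcement, or internal update")
request = "Draft an internal memo about the Q3 office move"
print(hook_overlap(vague, request), hook_overlap(sharp, request))
```

Against that request the vague description shares nothing and the sharp one shares three words (draft, internal, memo). The model is far smarter than this, but it still needs the hooks to be there.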

Practice

Write the description line for a skill you'd actually use. Then hand it to a colleague with no other context and ask: from this one sentence, can you say exactly what it does and name three requests that should trigger it? If they hesitate on either, the description — not your future instructions — is what needs the work.

Chapter Four

Bundling depth


Reading: The Agent Skills specification, resources and the 500-line guidance; Li et al., SkillsBench (arXiv 2026), on focused versus bloated skills.

4.1 Three kinds of extra, and what each is for

Once the body is lean, depth goes into the subdirectories. The three conventional ones do different jobs. references/ holds documentation the agent reads only when the body points it there — the full style guide, an API's edge cases, a domain glossary; keep each reference file small and focused, since the agent loads whole files. assets/ holds things the agent uses but doesn't read for meaning — a template to copy, a logo, a data file. scripts/ holds code the agent runs for anything deterministic: a calculation, a format conversion, a validation pass. The win with a script is twofold — it is reliable in a way prose instructions never are, and only its output enters the context, not its source.

```text
house-style-memo/
├── SKILL.md             # structure + voice rules, kept lean
├── references/
│   └── voice.md         # the full style guide, loaded only when cited
├── assets/
│   └── memo-template.md # a skeleton the agent copies to start
└── scripts/
    └── lint_memo.py     # checks heading case + section order; prints fixes
```
The same skill, grown. The body stays small; depth moves into folders loaded on demand.
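The tree names a scripts/lint_memo.py without showing it. Here is one hypothetical version, a sketch under stated assumptions: the memo is Markdown, the first # heading is the subject line, and "sentence case" is approximated as "no capitalized words after the first, except acronyms such as Q3." That heuristic will flag proper nouns, so treat its output as suggestions to review.

```python
import sys


def is_sentence_case(heading):
    """Heuristic: first word capitalized, later words lowercase unless all caps."""
    words = heading.split()
    if not words:
        return False
    first = words[0][0]
    if first.isalpha() and not first.isupper():
        return False
    # Later words stay lowercase unless they are acronyms such as Q3 or API.
    return all(not w[0].isupper() or w.isupper() for w in words[1:])


def lint_memo(text):
    """Return a list of suggested fixes for heading case and section order."""
    fixes = []
    headings = [(i, line.lstrip("#").strip())
                for i, line in enumerate(text.splitlines()) if line.startswith("#")]
    for index, title in headings:
        if not is_sentence_case(title):
            fixes.append(f"line {index + 1}: use sentence case for heading {title!r}")
    titles = [title.lower() for _, title in headings]
    if "next steps" not in titles:
        fixes.append("add a 'Next steps' section with one owner per item")
    elif titles[-1] != "next steps":
        fixes.append("move 'Next steps' to the end of the memo")
    return fixes


def main(argv):
    with open(argv[0], encoding="utf-8") as f:
        fixes = lint_memo(f.read())
    for fix in fixes:
        print(fix)
    return 1 if fixes else 0  # a non-zero exit is what makes this a gate


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Run it as python scripts/lint_memo.py memo.md. It prints one suggested fix per line and exits with status 1 when it finds anything, so a harness can treat a failing run as a draft that is not done yet.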

The body's job becomes pointing at this depth at the right moment, rather than containing it:

```markdown
## When the memo is long or formal
Read references/voice.md for the full guide before drafting, then
start from assets/memo-template.md. After drafting, run
scripts/lint_memo.py on the file and apply what it reports.
```
An excerpt from the body that reaches for depth conditionally.
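How a script's output, and only its output, reaches the model depends on the tool; each agent runs scripts through its own shell or execution tool. As a rough sketch of the idea, assuming a Python script and a hypothetical harness function run_skill_script, the agent's side amounts to running the file and keeping what it prints.

```python
import subprocess
import sys


def run_skill_script(script, *args, timeout=60):
    """Run a bundled Python script; return (exit code, text the agent would see).

    Hypothetical harness helper: the script's source is never read into context.
    """
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True, text=True, timeout=timeout, check=False,
    )
    seen = result.stdout
    if result.returncode != 0:
        seen += result.stderr  # surface the error text so the agent can react
    return result.returncode, seen
```

For a lint script, a zero exit code with empty output means nothing to fix; anything else is the report the body told the agent to apply.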

4.2 Split for focus, not just for size

The spec's rule of thumb — keep SKILL.md under 500 lines and push the rest into references — is partly about the token budget, but the deeper reason is attention. The benchmarking work on skills (SkillsBench) found that focused skills of two or three modules consistently outperform one comprehensive document, and that skills a model generates for itself, with no human curation, gave no benefit on average — the value is in specific, human-grounded procedure. So the instinct to cram everything into one comprehensive file is exactly backwards. Split when a section is only relevant sometimes (it becomes a conditional reference), when it serves a distinct sub-domain (it becomes its own file), or when it is long enough to dilute the rules that always apply.
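One way to find split candidates is to measure them. The sketch below counts lines per ## section of a SKILL.md body and flags any section that takes more than a given share of the whole. The 30% share is an arbitrary assumption for illustration; the 500-line limit is the spec's guidance. section_sizes and split_candidates are hypothetical helpers.

```python
def section_sizes(body):
    """Line count per '## ' section; lines before the first one count as preamble."""
    current = "(preamble)"
    sizes = {current: 0}
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sizes[current] = 0  # a repeated heading name restarts its count
        sizes[current] += 1     # the heading line counts toward its own section
    return sizes


def split_candidates(body, limit=500, share=0.3):
    """Return (over the line budget?, sections taking more than `share` of the body)."""
    sizes = section_sizes(body)
    total = sum(sizes.values())
    heavy = [name for name, n in sizes.items()
             if name != "(preamble)" and total and n / total > share]
    return total > limit, heavy
```

A flagged section is a prompt to ask whether it applies only sometimes or serves a distinct sub-domain, not an instruction to move it automatically.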

The maker/checker discipline from loop engineering applies here in miniature: a scripts/ validator that can fail the draft is something in the skill that can say no. Prose that merely asks the agent to "double-check the headings" is a suggestion; a script that checks them is a gate.

Practice

Take the skill you've been sketching and draw its folder. For each rule in your head, decide: does it apply every time (it belongs in the body), only sometimes (a reference file), or deterministically checkable (a script)? If everything lands in the body, you haven't yet found the skill's shape — something almost always wants to move down a tier.

Chapter Five

Skill, prompt, or server?


Reading: Synthesis across the platform docs (Codex, Gemini CLI, GitHub); practitioner framing in Skills vs MCP (LlamaIndex).

5.1 The question is what's missing

Skills, prompts, tools, connectors, and subagents get argued about as rivals; they are layers. The clean way to choose is to ask one question about the gap in front of you: what does the agent lack? If it lacks know-how — a procedure, a standard, the right way to do a recurring job — that is a skill. If it lacks a one-time instruction for a one-off, that is just a prompt. If it lacks the ability to do one discrete action, that is a tool. If it lacks live access to a system or current data, that is a connector, most often an MCP server. If it lacks a clean, separate context for parallel or isolated work, that is a subagent.
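Once you have named the gap, the missing-piece question is a lookup; naming the gap stays a human judgment. This small sketch only records the mapping from 5.1, with hypothetical names, so the routing can be written down explicitly, for instance in a review checklist.

```python
from enum import Enum


class Missing(Enum):
    """What the agent lacks, in the terms of 5.1."""
    KNOW_HOW = "a procedure or standard for a recurring job"
    ONE_OFF = "a one-time instruction"
    ACTION = "the ability to do one discrete action"
    ACCESS = "live access to a system or current data"
    ISOLATION = "a clean, separate context"


ROUTE = {
    Missing.KNOW_HOW: "skill",
    Missing.ONE_OFF: "prompt",
    Missing.ACTION: "tool",
    Missing.ACCESS: "connector (often an MCP server)",
    Missing.ISOLATION: "subagent",
}


def route(missing: Missing) -> str:
    """Return the mechanism that fills the named gap."""
    return ROUTE[missing]
```

Keeping the mapping as an enum makes it hard to add a gap without also deciding which mechanism fills it.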

5.2 Skill versus prompt, and skill versus server

The line that trips people most is skill versus prompt, because a skill is instructions — it can look like a prompt you saved. The differences are real and they are the reason to bother. A prompt is paid for in full every time and lives only in the conversation; a skill loads progressively and persists as a versioned file. A prompt has no triggering mechanism; a skill activates itself off its description. A prompt is plain text; a skill can bundle scripts and references and run them. And a prompt is bound to one chat in one tool; a skill written to the open format travels. If the thing recurs and has a right way to do it, it has outgrown being a prompt.

Skill versus server is the easier line: a connector gives access, a skill gives technique. MCP lets the agent reach your calendar; a skill tells it how your team books a meeting. They are complementary, and the strongest setups use both — a skill that, partway through, instructs the agent to call an MCP tool. So the honest answer to "skill or server?" is usually "the skill describes the how, the server provides the reach." A full mechanism-by-mechanism comparison is a subject of its own; for choosing, the missing-piece question is enough.

| Mechanism | What it adds | Reach for it when | Where it lives |
|---|---|---|---|
| Prompt | a one-time instruction | the need is a genuine one-off | the conversation |
| Skill | the how: procedure and standards | the task recurs and has a right way | a portable folder |
| Tool | a single action | the agent must do one discrete thing | the harness |
| MCP / connector | live access to a system | the agent needs current data or to act | a server |
| Subagent | a fresh, isolated context | work needs parallelism or isolation | a spawned agent |
Check your understanding

Take three things you might want an agent to do — say, "summarize this thread," "always write our release notes this way," and "check our live deploy status." Route each with the missing-piece question. If you can name prompt, skill, and connector respectively and say why, you have the chapter.

Chapter Six

Authoring well and shipping


Reading: Anthropic's skill-creator and the Skill Creator plugin; GitHub, managing skills with the CLI and its install warning; security research from Snyk, Silverfort, and Koi via The Hacker News.

6.1 Test the trigger first, then the work

A skill has two failure modes, and they need separate tests. The first is that it never fires — a description problem. The second is that it fires and does the job badly — a body problem. Test them in that order. To check triggering, describe the task the way a user actually would, without naming the skill, and confirm it activates; to check the work, force it explicitly and read the output. The cheapest reliable loop is to iterate on one genuinely hard example until the skill nails it, then generalize from what worked.

```text
# 1 · force the trigger once, by naming the skill:
Use the house-style-memo skill to draft a memo on the Q3 office move.

# 2 · then prove it fires on its own; describe the task, don't name it:
Draft a short internal note telling the team about the Q3 office move.

# 3 · validate the file against the open spec before sharing:
skills-ref validate ./house-style-memo

# 4 · install it for another agent, or publish for the team:
gh skill install your-org/skills house-style-memo
```
A four-step shake-out for the build-along skill, from explicit trigger to publish.
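Step 2 is worth repeating across many phrasings, including ones that should not fire. This sketch tallies the results. The activated argument is a function you would supply, for example by running your agent on the prompt and checking its transcript for the skill being loaded; no tool exposes exactly this interface, so treat it as an assumption.

```python
from typing import Callable, Iterable


def trigger_report(activated: Callable[[str], bool],
                   should_fire: Iterable[str],
                   should_not_fire: Iterable[str]) -> dict:
    """Tally how often a skill fires on prompts that do and do not call for it."""
    should_fire, should_not_fire = list(should_fire), list(should_not_fire)
    missed = [p for p in should_fire if not activated(p)]
    false_alarms = [p for p in should_not_fire if activated(p)]
    rate = 1 - len(missed) / len(should_fire) if should_fire else None
    return {"trigger_rate": rate, "missed": missed, "false_alarms": false_alarms}


# Stand-in detector for illustration only; replace with a real agent run.
def naive_detector(prompt):
    return "memo" in prompt.lower()


report = trigger_report(
    naive_detector,
    should_fire=["Draft a memo on the Q3 move",
                 "Write a short internal note about the Q3 office move"],
    should_not_fire=["Summarize this thread"],
)
```

Misses point at a description that is too narrow; false alarms point at one that is too broad. Keeping the two lists apart keeps the trigger problem separate from the quality of the work.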

For anything you'll share widely, the work is worth automating. Anthropic's skill-creator is itself a skill, with four modes — Create, Eval, Improve, and Benchmark — backed by small agents that run the skill against example prompts, grade the outputs, compare two versions blind, and suggest fixes. That turns "it feels better now" into a measured before-and-after, which is the only honest way to know a revision helped.
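The blind comparison is the part that keeps a before-and-after honest. The skill-creator does it with grading agents; this minimal sketch shows only the blinding logic, with judge standing in for whatever grader you use, a person or a model. blind_compare is a hypothetical helper, not the skill-creator's code.

```python
import random
from typing import Callable, Sequence


def blind_compare(outputs_a: Sequence[str], outputs_b: Sequence[str],
                  judge: Callable[[str, str], int], seed: int = 0) -> dict:
    """Count wins for version a and version b without the judge seeing labels.

    judge(first, second) returns 0 to prefer first, 1 to prefer second.
    """
    if len(outputs_a) != len(outputs_b):
        raise ValueError("need one output from each version per test prompt")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    wins = {"a": 0, "b": 0}
    for out_a, out_b in zip(outputs_a, outputs_b):
        pair = [("a", out_a), ("b", out_b)]
        rng.shuffle(pair)  # hide which version appears first
        choice = judge(pair[0][1], pair[1][1])
        wins[pair[choice][0]] += 1  # unblind only after the judge decides
    return wins
```

Feed it one output per test prompt from the old and new versions of a skill; a lopsided tally is evidence a revision helped, where "it feels better" is not.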

6.2 Distribution, and the trust it demands

Skills are shared the way code is — repositories, registries, an install command. GitHub's gh skill (in version 2.90.0 and later) discovers, installs, validates, and publishes skills across hosts, and writes provenance metadata into the frontmatter so updates can be checked against the source. That convenience is also the risk. A skill is loaded automatically and can run scripts with your permissions; installing one from a stranger is closer to running their program than reading their note. GitHub says it plainly: skills are not verified and may contain prompt injections, hidden instructions, or malicious scripts.

The cautionary receipt arrived fast. ClawHub, the open marketplace for the OpenClaw assistant, became the year's clearest supply-chain lesson. In an early-February 2026 audit, Koi Security found 341 of 2,857 listed skills malicious — about one in eight — with most belonging to a single campaign that disguised credential-stealing malware as ordinary trading and utility skills. A separate Snyk scan of nearly 4,000 skills found about 7% of them leaking secrets through the agent's context — instructions that have the model handle API keys or card numbers in plaintext, so its own logs become the leak. Silverfort then showed the trust signals themselves could be gamed: by exploiting the ranking system, researchers pushed a planted skill to the top of its category, where it was executed thousands of times across dozens of cities within a week. The registry's defenses — account-age limits, community reporting, malware scanning — came after the damage, which is the usual order.

The mitigations are not exotic, just disciplined: read the full SKILL.md and any scripts before installing, prefer curated or private sources over open ones, pin a version rather than tracking a moving target, run a validator, and keep credentials out of the environments where untrusted skills can run. The same instinct that makes a good skill author — keep it scoped, write down exactly what it does — is what makes a safe skill consumer.
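Two of those mitigations, reading everything and pinning what you read, can be supported with a small script. This sketch lists every file in a skill so none is skipped during review, flags a few illustrative red-flag strings, and computes a SHA-256 digest over the folder that you can record as a pin. The pattern list is an assumption and deliberately short; a clean scan proves nothing, and this digest is separate from whatever provenance metadata gh skill writes.

```python
import hashlib
from pathlib import Path

# Illustrative, non-exhaustive strings worth a human look. A clean scan proves nothing.
RED_FLAGS = ("curl ", "wget ", "| sh", "| bash", "base64", "eval(", "exec(",
             "os.environ", "API_KEY")


def audit_skill(skill_dir):
    """List every file, flag red-flag strings, and hash the folder for pinning."""
    root = Path(skill_dir)
    entries = sorted((p.relative_to(root).as_posix(), p)
                     for p in root.rglob("*") if p.is_file())
    digest = hashlib.sha256()
    files, flags = [], []
    for rel, path in entries:
        data = path.read_bytes()
        # Path, a separator, then a fixed-length file hash: unambiguous framing.
        digest.update(rel.encode("utf-8") + b"\0" + hashlib.sha256(data).digest())
        files.append(rel)
        text = data.decode("utf-8", errors="replace")
        flags.extend((rel, pattern) for pattern in RED_FLAGS if pattern in text)
    return {"files": files, "flags": flags, "sha256": digest.hexdigest()}


def matches_pin(skill_dir, pinned_sha256):
    """True only if file names and contents match what you reviewed and pinned."""
    return audit_skill(skill_dir)["sha256"] == pinned_sha256
```

Record the digest after reviewing a skill, and check matches_pin before each use; any change to a file, including a quiet update from upstream, breaks the match and sends you back to reading.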

6.3 What you can now do

You can now write a SKILL.md that triggers, explain why progressive disclosure rewards a lean body, split depth into references and scripts, and decide, by asking what the agent actually lacks, when a skill is the right tool and when a prompt or a server is. The build-along skill is small on purpose; the discipline scales straight up to a legal-review skill, a data-pipeline skill, a release-notes skill. The format is the same folder either way, and now it is yours, not a vendor's.

Practice · put it together

Finish your skill. Write the lean SKILL.md with a sharp, slightly-pushy description; move one rule into references/ and one check into scripts/; test triggering by describing the task without naming the skill; then validate it and decide where it should live. Last, write one sentence on the trust question: if a teammate installed this from you, what would you want them to inspect first?

Reference

Glossary


Agent Skill
A folder of instructions, and optional scripts and resources, that an agent loads on demand to do a specific task in a repeatable way.
SKILL.md
The one required file in a skill: YAML frontmatter (metadata) followed by a Markdown body of instructions.
Progressive disclosure
The three-tier loading model — metadata always, body on a match, resources on demand — that lets an agent keep many skills ready at near-zero idle cost.
Description
The frontmatter field that states what a skill does and when to use it. Apart from the name, it is the only text the agent sees before a skill loads, so it alone decides whether the skill triggers.
Reference file
A document in references/ the agent reads only when the body points to it — depth kept out of the always-loaded body.
Bundled script
Code in scripts/ the agent runs for deterministic work; only its output, not its source, enters the context.
Open standard
The vendor-neutral SKILL.md specification published at agentskills.io in December 2025, so one skill works across many agents.
MCP
The Model Context Protocol: connectors that give an agent live access to systems and data. Complementary to skills: MCP provides access, where a skill provides technique.
Undertriggering
An agent failing to use a relevant skill, usually because the description is too vague; the reason authors are advised to write descriptions assertively.
Skill registry
A marketplace or repository for discovering and installing shared skills. Convenient, and a supply-chain risk when uploads are unvetted.
Prompt injection
Hidden instructions planted in a skill's text or scripts that hijack the agent on load — the central threat in skill distribution.
Sources

Anthropic, Equipping agents for the real world with Agent Skills (October 2025) and the public skills repository with its skill-creator and Skill Creator plugin. · The Agent Skills specification and standard repository (open standard, December 18, 2025). · Platform documentation: OpenAI Codex, Gemini CLI, GitHub Copilot with its launch and gh skill changelogs, and VS Code. · Measurement: Gao et al., SkillReducer: Optimizing LLM Agent Skills for Token Efficiency (arXiv 2026), and Li et al., SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks (arXiv 2026). · Security: Snyk on credential-leaking skills, Silverfort on ranking manipulation, and Koi Security's ClawHub audit via The Hacker News. · Practitioner framing on skills versus MCP: LlamaIndex.

An unofficial study text. Commentary synthesized in original wording; the SKILL.md, directory, and command snippets are functional and adapted from the cited documentation. Don't cite Urania or this guide — cite the primary sources above. Researched and drafted by Urania, an AI research system; edited, verified, and signed by Zach Rossmiller, who is accountable for what's published.