Files Over Platforms

Chapter One

The wiki that maintains itself

Reading: Karpathy, LLM Wiki (April 2026); McVeety & Hormati, Introducing the Open Knowledge Format (Google Cloud, June 2026), the fragmentation and living-wiki sections.

1.1 The knowledge an agent needs is the knowledge it can't reach

A foundation model carries a great deal of general competence and almost none of your specifics. It can write SQL but does not know that your orders table excludes refunds, or that "active user" means something particular on your team, or which of two pipelines is the source of truth after last quarter's migration. That missing layer — the metadata, the context, the curated institutional memory — is what determines whether an agent's answer is right or confidently wrong.

Google's announcement names where that layer actually lives in most organizations: in metadata catalogs, each with its own interface, in wikis and shared drives, in code comments and docstrings, and in the heads of a few senior engineers. When an agent needs to answer a question, it has to assemble the pieces from those mutually incompatible surfaces. Every agent builder solves the same assembly problem from scratch, every catalog vendor reinvents the same data model, and the knowledge stays locked behind whichever tool created it.

1.2 Karpathy's insight: an LLM will do the bookkeeping you won't

Earlier in 2026, Andrej Karpathy described the pattern that OKF formalizes. The drag on a knowledge base, he observed, isn't the reading or the thinking — it's the upkeep: repairing cross-references, refreshing summaries, flagging where new information cuts against the old. That upkeep is why personal wikis get abandoned; it eventually outgrows what the wiki gives back. His pivot is that LLMs don't get bored, don't forget to update a cross-reference, and can revise many files in a single pass — so the upkeep that kills a human-maintained wiki costs an agent almost nothing, and the wiki stays current.

In his framing the wiki is a standing artifact that compounds as sources accrue, rather than something the model rebuilds from scratch at every question. Three layers do the work: an immutable pile of raw sources the agent only reads; the wiki itself, a directory of Markdown files the agent owns and maintains; and a schema file — a CLAUDE.md or AGENTS.md — that tells the agent how the wiki is organized and how to maintain it. The human curates sources and asks good questions; the agent does the filing. Karpathy reaches back to Vannevar Bush's 1945 Memex for the lineage — a private, curated store of documents with trails between them — and notes that the part Bush could never solve was who does the maintenance. The LLM is that missing maintainer.

Figure 1. Why the pattern arrives now. By hand, the upkeep of a knowledge base outgrows its value and people abandon it. An LLM does the same bookkeeping in a loop at near-zero cost, so the wiki stays current.

1.3 What was missing was a format, not another service

The pattern kept reappearing under different names — Obsidian vaults wired to coding agents, the AGENTS.md and CLAUDE.md convention files, repositories full of index.md and log.md artifacts that agents read before doing real work, "metadata as code" inside data teams. Each instance looked alike — Markdown, frontmatter, cross-links — and none were built to cooperate. There was no agreed answer to what fields a document should carry or what a given filename meant, so each wiki stayed siloed in the team that made it.

Google's argument is that the fix is not another knowledge service but a format: a way to represent knowledge that anyone can produce without an SDK, anyone can consume without an integration, that survives moving between systems and organizations, lives in version control beside the code it describes, and is readable by humans and parseable by agents as the same file with no translation layer. The rest of this guide is what that format turned out to be.

Check your understanding

List four places your team's institutional knowledge currently lives — be specific (a particular wiki, a catalog, a folder, a person). For each, ask: if you handed a brand-new agent only that source, could it read it without a custom integration? The count of "no" answers is the size of the assembly problem OKF is trying to remove.

Chapter Two

What the spec actually requires

Reading: The Open Knowledge Format specification, v0.1 Draft: terminology, bundle structure, concept documents, conformance.

2.1 Bundle, concept, concept ID

OKF defines three terms and most of the format follows from them. A bundle is a self-contained directory tree of Markdown files — the unit of distribution, shippable as a git repository (recommended, because it carries history and diffs), a tarball, or a subdirectory inside a larger repo. A concept is a single unit of knowledge, represented as exactly one Markdown file; it can describe something concrete like a table or an API, or something abstract like a metric or a business process. A concept's ID is just its file path with the .md removed, so tables/orders.md is the concept tables/orders. The path is the identity — there is no separate registry of names.
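As a minimal sketch of that rule, the Python below derives a concept ID from a bundle-relative path. The function name concept_id is hypothetical and chosen for illustration; the spec defines the mapping, not this code.

```python
from pathlib import PurePosixPath
from typing import Optional


def concept_id(relative_path: str) -> Optional[str]:
    """Return the concept ID for a bundle-relative path, or None if it is not a .md file."""
    path = PurePosixPath(relative_path)
    if path.suffix != ".md":
        return None
    # The ID is the path itself with the .md suffix removed.
    return str(path.with_suffix(""))


print(concept_id("tables/orders.md"))  # tables/orders
```

Because the ID is computed from the path alone, renaming or moving a file changes its identity, which is one reason to treat the directory layout as a deliberate namespace.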

Each concept file has two parts: a YAML frontmatter block delimited by ---, and a free-form Markdown body after it. The spec asks producers to favor structural Markdown in the body — headings, lists, tables, fenced code — over loose prose, because structure helps both human readers and agent retrieval. Three body headings carry conventional meaning when they apply: # Schema, # Examples, and # Citations.

Figure 2. Anatomy of a bundle. Each concept is one Markdown file — a YAML frontmatter block (the only required field is type) over a free-form body. Ordinary Markdown links between files turn the directory into a graph of relationships.

2.2 The one hard rule

For all its apparent structure, OKF requires exactly one thing of a concept file: a non-empty type field in the frontmatter. Everything else is optional. type is a short, self-describing string — BigQuery Table, Metric, Playbook, Reference — that consumers use to route, filter, and present. Crucially, type values are not registered anywhere centrally, and consumers are required to tolerate unknown types gracefully, typically by treating them as generic concepts. The spec also names five recommended fields, in priority order: title, description, resource (a canonical URI for the underlying asset, absent for purely abstract concepts), tags, and timestamp (ISO 8601). Producers may add any other keys they like; consumers should preserve unknown keys and must not reject a document for having them.
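One way a consumer might honor that tolerance is sketched below, assuming the frontmatter has already been parsed into a dict. The handler functions and the HANDLERS table are hypothetical and not part of the spec; the point is only that an unknown type falls through to a generic path and that extra keys ride along untouched.

```python
from typing import Callable, Dict


def render_table(meta: dict) -> str:
    # Hypothetical presenter for a type this consumer knows about.
    return f"[table] {meta.get('title', '(untitled)')}"


def render_generic(meta: dict) -> str:
    # Fallback for any type this consumer has never seen.
    return f"[concept] {meta.get('title', '(untitled)')}"


# Local routing table. There is no central registry of type values,
# so each consumer decides which types it treats specially.
HANDLERS: Dict[str, Callable[[dict], str]] = {"BigQuery Table": render_table}


def present(meta: dict) -> str:
    """Route on type, treating unknown types as generic concepts."""
    handler = HANDLERS.get(meta["type"], render_generic)
    return handler(meta)  # meta is passed through unmodified, unknown keys included


print(present({"type": "BigQuery Table", "title": "Orders"}))
print(present({"type": "Dashboard", "owner": "data-eng"}))
```

The second call uses a type and a key the consumer does not recognize, and neither causes a rejection.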

Here is a complete, conformant concept file — a table bound to a real resource:

---
type: BigQuery Table
title: Orders
description: One row per completed customer order, refunds excluded.
resource: bigquery://acme/sales/orders
tags: [sales, orders, revenue]
timestamp: 2026-06-15T14:30:00Z
---

# Schema

| Column      | Type    | Description                                         |
|-------------|---------|-----------------------------------------------------|
| order_id    | STRING  | Globally unique order identifier.                   |
| customer_id | STRING  | Foreign key into [customers](/tables/customers.md). |
| total_usd   | NUMERIC | Order total in US dollars.                          |

# Joins

Joined with [customers](/tables/customers.md) on customer_id.

# Citations

[1] Internal schema reference for the sales warehouse.

A concept file: tables/orders.md. The only line the format strictly requires is the type.

2.3 Reserved files, cross-links, and a deliberately permissive consumer

Two filenames are reserved at any level of the tree and must not be used for concepts. index.md is a directory listing for progressive disclosure — it lets a human or agent see what a directory holds before opening anything. log.md is a chronological, newest-first history of changes. Index files carry no frontmatter, with one narrow exception: the bundle-root index.md may hold a single frontmatter block solely to declare the OKF version it targets.

Concepts link to each other with standard Markdown links. The spec recommends the absolute, bundle-relative form, a path beginning with / such as [customers](/tables/customers.md), because the link stays valid when the file containing it moves to another directory; ordinary relative links are also allowed. A link asserts a relationship, but the kind of relationship lives in the surrounding prose, not in the link itself; a consumer building a graph view treats every link as a directed edge of an untyped relationship. And a link whose target doesn't exist is not an error: consumers must tolerate broken links, since a link may simply point at knowledge not yet written.
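The graph view described here can be sketched in a few lines. The code below is an illustration under assumptions, not a reference consumer: the regular expression covers only simple inline links, and the function name outgoing_edges is hypothetical. It resolves both absolute and relative targets to concept IDs and never checks whether the target exists, so a broken link simply becomes an edge to a node nobody has written yet.

```python
import posixpath
import re
from typing import List, Tuple

# Matches simple inline Markdown links: [text](target)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def outgoing_edges(source_path: str, body: str) -> List[Tuple[str, str]]:
    """Return (source_id, target_id) pairs for every link to a .md file in body."""
    source_id = source_path[:-3] if source_path.endswith(".md") else source_path
    edges = []
    for target in LINK.findall(body):
        target = target.split("#", 1)[0]  # drop any heading anchor
        if "://" in target or not target.endswith(".md"):
            continue  # external URL or not a concept file
        if target.startswith("/"):
            resolved = target.lstrip("/")  # absolute, bundle-relative form
        else:
            base = posixpath.dirname(source_path)
            resolved = posixpath.normpath(posixpath.join(base, target))
        edges.append((source_id, resolved[:-3]))  # untyped, directed edge
    return edges


body = "Lags on [orders](/tables/orders.md); see [customers](../tables/customers.md)."
print(outgoing_edges("playbooks/freshness-alert.md", body))
```

Every edge comes out as a plain pair with no label, which is exactly the untyped relationship the spec describes.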

That permissiveness is the design, made explicit in the conformance rules. A bundle conforms if every non-reserved .md file has parseable YAML frontmatter with a non-empty type, and the reserved files follow their structure when present. Beyond that, a consumer must not reject a bundle for missing optional fields, unknown types, unknown extra keys, broken cross-links, or a missing index.md. The format is built to stay useful as bundles grow, get refactored, and are partly generated by agents.

Figure 3. The conformance model, drawn as three zones. One field is required, five are recommended, and everything else is tolerated — a consumer must not reject a bundle for an unknown type, an extra key, a broken link, or a missing index.

Practice

Open the v0.1 spec and read only section 9, conformance. In one sentence, state the smallest possible change that would make a valid Markdown file non-conformant. If your sentence is about anything other than the frontmatter or the type field, read it again — the surface area of "broken" is far smaller than most formats, and that is the point.

Chapter Three

Build a bundle

Reading: The specification, the examples and appendix; McVeety & Hormati, Introducing the Open Knowledge Format, the reference implementations.

3.1 Lay out the directory

We'll build a small bundle for a data team — a couple of warehouse tables and an on-call runbook — which is the shape the spec itself uses. The concept ID is the path, so the layout is the namespace:

sales/index.md
sales/log.md
sales/tables/index.md
sales/tables/orders.md
sales/tables/customers.md
sales/playbooks/freshness-alert.md

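The bundle as a flat list of files. Every path except the reserved index.md and log.md files is a concept; with sales/ as the bundle root, sales/tables/orders.md is the concept tables/orders.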

sales/tables/orders.md is the concept file from Chapter 2 — paste it in. Its sibling customers.md follows the same shape. Now add a concept that isn't bound to any physical resource at all, to show that abstract knowledge is a first-class citizen: a playbook. It carries no resource field, and its body is steps rather than a schema.

---
type: Playbook
title: Incident response — data freshness alert
description: Triage steps when the orders pipeline lags its SLA.
tags: [oncall, incident]
timestamp: 2026-06-12T09:00:00Z
---

# Trigger

A freshness alert fires when the [orders](/tables/orders.md) table lags more
than 30 minutes behind its expected SLA.

# Steps

1. Check the ingestion dashboard for a stalled load.
2. Re-run the loader and confirm the lag clears.
3. If it recurs within an hour, page the data on-call.

An abstract concept: sales/playbooks/freshness-alert.md. No resource — just curated procedure.

3.2 Add the navigational files

The root index.md is the one place a listing file may declare the OKF version, in a single frontmatter block. Below it, the body groups the bundle's contents under headings, each entry a link with a short description — the progressive-disclosure surface an agent reads first.

---
okf_version: "0.1"
---

# Sales knowledge

* [Tables](/tables/) - warehouse tables for the sales domain
* [Playbooks](/playbooks/) - on-call runbooks for the pipeline

sales/index.md — the only index that carries frontmatter, and only to declare the version.
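An agent or script that maintains a bundle could regenerate a directory's listing from its concept files. The sketch below assumes the title and description have already been read from each file's frontmatter; render_index and its tuple layout are hypothetical, and the bullet style simply mirrors the example above.

```python
from typing import Iterable, Tuple


def render_index(heading: str, entries: Iterable[Tuple[str, str, str]]) -> str:
    """Render an index.md body from (bundle_relative_path, title, description) tuples."""
    lines = [f"# {heading}", ""]
    for path, title, description in sorted(entries):
        # Absolute, bundle-relative link followed by a short description.
        lines.append(f"* [{title}](/{path}) - {description}")
    return "\n".join(lines) + "\n"


print(render_index("Tables", [
    ("tables/orders.md", "Orders", "One row per completed customer order."),
    ("tables/customers.md", "Customers", "One row per customer."),
]))
```

The output carries no frontmatter, which matches the rule that only the bundle-root index may declare one.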

The log.md records what changed and when, newest first, with ISO dates. The bold leading word is a convention, not a requirement, and the whole file stays greppable with ordinary tools.

# Update log

## 2026-06-15
* **Creation**: Added the [orders](/tables/orders.md) and [customers](/tables/customers.md) tables.
* **Update**: Linked orders to customers on the join key.

## 2026-06-12
* **Initialization**: Created the bundle and the root index.

sales/log.md — an append-only history, newest first.
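Keeping the log newest-first means new entries go near the top rather than at the end of the file. A minimal sketch follows, assuming the layout shown above (one "## date" heading per day) and assuming entries are only ever added for today or a later date; add_log_entry is a hypothetical helper, not something the spec provides.

```python
def add_log_entry(log_text: str, date: str, kind: str, message: str) -> str:
    """Insert an entry under the `## date` heading, creating that heading on top if needed."""
    entry = f"* **{kind}**: {message}"
    heading = f"## {date}"
    lines = log_text.rstrip("\n").split("\n")
    if heading in lines:
        # Same day: the newest entry goes directly under the existing heading.
        lines.insert(lines.index(heading) + 1, entry)
    else:
        # New day: place the block above the first existing date heading.
        first = next((i for i, line in enumerate(lines) if line.startswith("## ")), len(lines))
        block = [heading, entry, ""]
        if first == len(lines) and lines[-1] != "":
            block.insert(0, "")  # keep a blank line after the title
        lines[first:first] = block
    return "\n".join(lines).rstrip("\n") + "\n"


log = "# Update log\n\n## 2026-06-15\n* **Creation**: Added tables.\n"
print(add_log_entry(log, "2026-06-16", "Update", "Fixed a typo."))
```

Existing entries are never rewritten, so the git diff for each change stays a small, readable insertion.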

3.3 Validate it — against the spec, not a vendor

Validation here is unusually plain, because the conformance bar is so low. There are exactly three things to check: every non-reserved .md file parses as YAML frontmatter plus a body; every one of those frontmatter blocks has a non-empty type; and the reserved files, where present, follow their shape. You can confirm the first two by eye — cat each file and look at the top — or with any YAML parser in a few lines of script. There is no schema server to call and no SDK to install; the spec is the validator, and early community parsers and validators that appeared within days of launch simply automate those same three checks.
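Those checks fit in a short script. The sketch below is an approximation under stated assumptions: it scans frontmatter lines for a top-level type key instead of running a real YAML parser, which a production check would need in order to confirm the block actually parses, and for reserved files it only checks that non-root index.md files carry no frontmatter. The names check_files and load_bundle are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional


def frontmatter_lines(text: str) -> Optional[List[str]]:
    """Return the lines between the opening and closing ---, or None if there is no closed block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i]
    return None


def check_files(files: Dict[str, str]) -> List[str]:
    """Take a map of bundle-relative path to file text; return a list of problems."""
    problems = []
    for path in sorted(files):
        name = path.rsplit("/", 1)[-1]
        front = frontmatter_lines(files[path])
        if name == "index.md":
            if front is not None and path != "index.md":
                problems.append(f"{path}: only the root index.md may carry frontmatter")
        elif name == "log.md":
            continue  # reserved file, not a concept
        else:
            # Line scan for a top-level key; an assumption, not YAML parsing.
            types = [line.split(":", 1)[1].strip() for line in (front or [])
                     if line.startswith("type:")]
            if not any(value.strip("\"'") for value in types):
                problems.append(f"{path}: missing or empty type")
    return problems


def load_bundle(root: Path) -> Dict[str, str]:
    """Read every .md file under root, keyed by bundle-relative POSIX path."""
    return {p.relative_to(root).as_posix(): p.read_text(encoding="utf-8")
            for p in root.rglob("*.md")}
```

Running check_files(load_bundle(Path("sales"))) on the bundle from this chapter should return an empty list if every concept file has its type. Note what the script does not flag: missing optional fields, unknown types, extra keys, and broken links are all tolerated by design.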

Google ships reference implementations at both ends to make the format concrete — an enrichment agent that walks a database and drafts a concept document for every table, and a single-file static HTML visualizer that renders any bundle as an interactive graph with no backend and no data leaving the page — plus three ready-to-browse sample bundles built from public datasets. They are deliberately proofs of concept: the agent shows one way to produce OKF and the visualizer one way to consume it, and nothing about the format requires either.

Lab · about thirty minutes

Build the bundle above for something real in your own work — three or four concept files for a system you know, at least one bound to a resource and one abstract (a metric, a runbook). Cross-link them with absolute bundle-relative paths. Then do the validation by hand: cat each file and confirm a non-empty type at the top. Finally, open it in any Markdown editor and follow a link. If the link resolves and the files read cleanly as plain text, you have a conformant OKF bundle — and you built it without touching anyone's platform.

Chapter Four

Own versus rent

Reading: McVeety & Hormati, Introducing the Open Knowledge Format, the "format, not a service" and three-principles sections; Karpathy, LLM Wiki, on the wiki as a git repo.

4.1 The filesystem is the API

The whole argument of OKF compresses into a property the spec states plainly: the knowledge is just files. There is no account to create, no SDK to import, no registry to query. Reading a bundle is reading text; shipping one is moving a folder. Any tool that can cat a file can read a bundle, and any tool that can git clone can deploy one. That sounds trivial until you notice it is the entire interface: the filesystem is the API, and a filesystem is something every tool, language, and agent already speaks.

From that one property the format's four design goals follow: readable by humans without tooling, parseable by agents without bespoke SDKs, diffable in version control, and portable across tools, organizations, and time. Google is blunt about the logic — a format earns its worth through how widely it is adopted, not through who controls it — which is why they publish it openly, as a format rather than a platform, by design rather than as an afterthought.

4.2 What renting costs

Set that against the default. When your curated knowledge lives inside a metadata catalog or a knowledge product, getting it out, or letting a new agent read it, means going through that product's API on that vendor's terms. The knowledge is real and yours, but its mobility is rented: every consumer needs that vendor's integration, every migration is an export project, and the day you change tools you discover how much of the value was in the surrounding platform rather than the knowledge itself. Notion's strength is a proprietary collaborative platform; that same proprietariness is the wall. Obsidian keeps local Markdown files, which is most of the way there, but offers no agreed cross-bundle standard for one team's vault to be read by another team's agent without custom glue.

OKF's move is to make the knowledge layer outlive any particular tool by refusing to be more than files. You can still edit a bundle in Obsidian, render it as a static site, or pipe it through an agent — those are consumers at one end of a format whose contract sits in the files, not in the tooling. The tool becomes swappable; the knowledge stays put.

Figure 4. Own versus rent. Rented, your knowledge lives inside a vendor's catalog and every consumer needs that vendor's integration. Owned, it lives in files any tool reads directly — the filesystem is the API, and the bundle travels as a tarball, a repo, or a subdirectory.

This is the same instinct as keeping prompts, context, and configuration in version control rather than in a vendor console: own the layer your work depends on, in files you can read and diff, so the tool stays replaceable and the knowledge compounds where you can see it.

Practice

Pick one tool where your team's knowledge currently lives. Write down what it would take to move that knowledge to a different tool tomorrow — the export path, the format on the other side, what breaks. Then ask what that same move costs if the knowledge is an OKF bundle in a git repo. The gap between those two answers is what "rent" is costing you.

Chapter Five

Where OKF sits

Reading: McVeety & Hormati, Introducing the Open Knowledge Format, the related-patterns discussion; the specification, section 10; and the launch discourse for the MCP and llms.txt comparisons.

5.1 A knowledge layer, not a connection or a search layer

OKF is easy to mistake for several neighbors it doesn't replace. The clarifying question is which layer of an agent's stack a thing governs. The Model Context Protocol governs the connection layer — how an agent reaches live tools and data sources at run time. OKF governs the knowledge layer — what an agent knows about those sources, curated and compiled ahead of time and stable between runs. They are designed to coexist; a server speaking MCP can perfectly well expose an OKF bundle as one of the sources it serves. Retrieval-augmented generation sits on yet another axis: RAG searches raw, unstructured text on the fly at query time, while an OKF bundle is a deliberately maintained, pre-compiled layer whose links already form a graph. OKF is best read as complementary to RAG rather than a replacement — one is curation, the other is search.

Figure 5. Where OKF sits. It is the curated knowledge layer — pre-compiled and stable — distinct from the connection layer (MCP) that reaches live tools and the retrieval layer (RAG) that searches raw text at query time. llms.txt and the AGENTS.md family occupy adjacent scopes. These compose; they do not compete.

5.2 The look-alikes that do a different job

Two conventions look most like OKF because they too are Markdown with a little structure, and Google's own announcement names them as instances of the same reappearing pattern. The AGENTS.md and CLAUDE.md family — the same files Karpathy uses as the schema layer of his wiki — are primarily behavioral instructions: how an agent should act inside a particular repository. OKF is about curated knowledge content — the definitions, metrics, runbooks, and join paths — packaged as a portable bundle. The surface rhymes; the jobs differ. llms.txt, similarly, is oriented to a website telling crawlers and models what to read at the site level, an analog of robots.txt, where OKF targets organizational knowledge bundles that travel between teams and tools. None of these are rivals so much as occupants of adjacent scopes, and a serious stack may well run several at once.

Check your understanding

Take one AI tool you use and name which layer of Figure 5 it actually operates on — connection, knowledge, or retrieval. If it seems to span two, say which job is which. The exercise is the whole skill of this chapter: most "is X better than Y" arguments in this space dissolve once you notice X and Y are on different layers.

Chapter Six

The honest case against

Reading: The specification, the link-semantics, non-goals, and versioning sections; and the public launch discourse on Hacker News and Karpathy's gist (June 2026), read as discourse, not citation.

6.1 "Just Markdown in a repo," and yet another standard

The most common reaction to OKF is that it is Markdown with YAML frontmatter and not much else — which is accurate, and is partly the point, but is a fair worry too. The format arrives into a crowded field of agent-knowledge conventions — MCP, CLAUDE.md, AGENTS.md, and now this — and a recurring question in the launch discourse is whether one more spec achieves adoption or simply adds to the fragmentation it set out to cure. A format's value scales with how widely it is adopted, and on day one nothing is widely adopted. That is a real risk and not one the elegance of the design can answer; only adoption can.

6.2 Untyped links and free-string types

The minimalism that makes OKF portable also makes its interoperability surface thin. Links are untyped: a link from one concept to another asserts that a relationship exists, but "contradicts," "supersedes," "depends on," and "joins with" all collapse into the same undifferentiated edge, with the distinction left to prose a machine can't reliably parse. Types are free strings with no shared vocabulary, so two teams may describe the same kind of thing under different names and a consumer cannot know they match. None of this breaks conformance — it is the design — but it means an OKF graph is a weaker object than a typed knowledge graph, and in the launch discourse some practitioners argued the format should grow toward typed, machine-computable relations in the RDF or property-graph tradition. Whether that richness arrives as an extension, or whether the format's deliberate plainness is precisely what lets it spread, is one of the genuine open questions. The same minimalism is both the strength and the limit.

6.3 The maintenance the format doesn't guarantee

The pattern's appeal rests on the claim that an LLM will keep the wiki current. But the v0.1 spec defines a file format, not a maintenance protocol — it specifies no review gates, no verification steps, no conflict resolution. Karpathy's insight explains why an LLM is well-suited to the bookkeeping; it does not guarantee that an agent maintaining a large, evolving bundle over months stays consistent, avoids hallucinated updates, or resolves contradictions sensibly. Practitioners working in the pattern have flagged the hard parts directly: a clean git merge of two agents' edits is not necessarily a correct merge, since both may have added the same fact in different words; and as the count of generated notes grows, so does the chance that at least one is quietly wrong. These are problems of operating an OKF bundle, and the format leaves them entirely to the producer.
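The last point can be made concrete under a simplifying assumption that is this guide's illustration and not a claim from the spec or the launch discourse: suppose each generated note is wrong independently with the same probability p. Then the chance that a bundle of n notes contains at least one error is:

```latex
P(\text{at least one of } n \text{ notes is wrong}) = 1 - (1 - p)^{n}
```

With p = 0.01 and n = 100, that gives 1 - 0.99^100, or roughly 0.63. Real errors are unlikely to be independent or uniform, so treat the figure as a sense of scale, not a prediction.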

6.4 What's missing, and what we can't say yet

By its own non-goals the spec is silent on the things cross-organization knowledge sharing most needs: permissions, access control, provenance, and verifiability. It deliberately does not define a typed taxonomy, a query language, or serving infrastructure, and does not try to replace domain schemas like Avro or Protobuf — it references them. Those are reasonable scope decisions for a v0.1, but they mean OKF handles the portable, curated, low-stakes cases and hands the governance-heavy ones back to you. And the largest caveat is simply time: the format is a week old, its governance past v0.1 is undefined, and there is no track record yet for whether its extreme minimalism holds up in real multi-team use or whether de-facto extensions fragment it. Everything in this guide is the best current reading of a format whose ground is still settling. The honest move is to build with it where it fits, own what you build, and keep your eyes open about what it does not yet promise.

Practice · put it together

Take the bundle you built in Chapter 3 and audit it against this chapter. Which failure mode is it most exposed to — an untyped link that should carry meaning, a free-string type no other team would guess, a body fact with no provenance, or drift if an agent maintained it unwatched? Name the one that worries you most and write the single convention you would add — in your schema file, not the spec — to guard against it. That convention is the line between a format and a discipline.

Files Over Platforms · June 2026