# The state of llms.txt in May 2026

0.1% of AI bot traffic reads it. So why ship one anyway? A data-first look at adoption, consumption, and what best-in-class looks like in 2026.

By AgentSite · 8 min read · Updated 2026-05-27

84 hits out of 62,100. That is how often AI bots touched `/llms.txt` on the site OtterlyAI instrumented over 90 days in early 2026: 0.1% of all AI bot traffic. The same site averaged 265 AI bot visits per content page, so the curated index built for AI agents was crawled roughly a third as often as a typical blog post.

The honest read on llms.txt in May 2026 has three parts. First, the file is still a real proposal with real adoption. Second, almost no crawler fetches it, whether at training time or at inference time. Third, the consumers who _do_ read it have converged on a small set of patterns that separate best-in-class from boilerplate. We will go through each in order, then show what the file looks like when you take all three seriously.

## What the logs say

[OtterlyAI's February 2026 experiment](https://otterly.ai/blog/the-llms-txt-experiment/) is the cleanest public data point. Single test site, valid `/llms.txt` shipped at the root, 90 days of server logs, every request from an AI-identified user agent counted. 62,100 AI bot hits. 84 of them on `/llms.txt`. The file ranked near the bottom of the most-crawled URLs on the site.

A larger audit cuts the same way. [Flavio Longato scanned 1,000 Adobe Experience Manager domains over 30 days in March 2026](https://www.longato.ch/llms-recommendation-2025-august/). GoogleBotDesktop drove 94.9% of `/llms.txt` requests. Bing made 7 requests in total, concentrated on a single domain. OpenAIBotSearch made 10. GPTBot was absent from `/llms.txt` fetches entirely. The dedicated AI crawlers most people imagine when they hear "AI crawler", such as GPTBot and ClaudeBot, are not reading the file. Google-Extended would not show up here either way: it is a robots.txt control token rather than a separate crawler, and Google fetches under its regular Googlebot user agents.

Google's John Mueller put it bluntly when asked on Reddit: none of the AI services have said they use llms.txt, and you can confirm it from server logs. He compared the file to the long-deprecated `keywords` meta tag: a self-declared claim about what the site is about, when the consumer can just check the site directly.
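Mueller's point that you can check your own logs is easy to act on. Below is a minimal sketch, assuming access logs in the standard Combined Log Format and a hand-picked, non-exhaustive list of user-agent substrings (both assumptions; the list includes the search crawlers that dominated the audit above, and real bot lists change often). `llms_txt_share` is our own name, not a library function.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Illustrative substrings only: AI crawlers plus the search bots seen in the audit.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
             "PerplexityBot", "Googlebot", "bingbot")


def llms_txt_share(lines):
    """Return (llms.txt hits, total AI-identified hits, per-agent Counter of llms.txt hits)."""
    ai_hits = 0
    llms_hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip malformed or non-combined lines
        agent = next((a for a in AI_AGENTS if a in m["ua"]), None)
        if agent is None:
            continue  # not an AI-identified request
        ai_hits += 1
        # Ignore query strings when matching the path.
        if m["path"].split("?", 1)[0] == "/llms.txt":
            llms_hits[agent] += 1
    return sum(llms_hits.values()), ai_hits, llms_hits
```

Pass it an open log file and divide the first value by the second to get your own version of the 0.1% figure.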

That is the consumption story. Compare it with the publication story: an SE Ranking scan of 300,000 domains in November 2025 found roughly 10% adoption, weighted heavily toward developer-facing SaaS. Hand-curated coverage is much smaller, closer to 1,000 distinct verified files, because Yoast SEO auto-generates a file for WordPress sites running the plugin and inflates the headline number. Adoption is real. Consumption is not.

## Who is actually reading it

Three categories make up most of the genuine consumption.

The first is IDE agents. Cursor, Continue, and Cline fetch `/llms.txt` when you point them at a docs URL. The agent uses the index to decide which pages to pull into context for the current edit. Each fetch is per developer and per session, and the pages pulled in are bounded by what the developer is working on.
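The selection step can be approximated with something as crude as word overlap between the task and each link's title and description. This is a toy sketch under our own assumptions, not how Cursor, Continue, or Cline actually rank pages; `pick_pages` and its scoring are hypothetical.

```python
import re

# Matches "- [Title](url): optional description" link lines from an llms.txt index.
LINK = re.compile(r"^\s*-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$")
WORD = re.compile(r"[a-z0-9]+")


def pick_pages(llms_txt: str, task: str, k: int = 3) -> list[str]:
    """Return up to k URLs whose title + description share the most words with the task."""
    task_words = set(WORD.findall(task.lower()))
    scored = []
    for line in llms_txt.splitlines():
        m = LINK.match(line)
        if not m:
            continue  # headings, blockquote, prose
        text = f"{m['title']} {m['desc'] or ''}".lower()
        overlap = len(task_words & set(WORD.findall(text)))
        if overlap:
            scored.append((overlap, m["url"]))
    # Highest overlap first; Python's sort is stable, so ties keep file order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:k]]
```

A real agent would use embeddings or the model itself to rank, but the input is the same: the colon descriptions are the only signal available before any page is fetched.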

The second is MCP doc servers and RAG pipelines. Open-source MCP servers like `mcp-llms-txt` exist specifically to serve these files into Claude or Cursor sessions on demand. Internal copilots use them as a seed for what to index.
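For the RAG case, a common first step is splitting `/llms-full.txt` into one chunk per page before embedding. A minimal sketch, assuming (our assumption; the format does not mandate it) that each page in the full file begins with a top-level `# ` heading. `split_full_txt` is a hypothetical helper.

```python
def split_full_txt(text: str) -> list[tuple[str, str]]:
    """Split an llms-full.txt body into (title, body) chunks at top-level '# ' headings.

    Assumes each page starts with '# Title'. Lines inside ``` fences are never
    treated as headings, so '# ' shell comments in code samples stay in their chunk.
    """
    chunks: list[tuple[str, str]] = []
    title, body, in_fence = None, [], False

    def flush() -> None:
        # Emit the current chunk if it has a title or any non-blank text.
        if title is not None or any(ln.strip() for ln in body):
            chunks.append((title or "", "\n".join(body).strip()))

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on opening and closing fences
        if not in_fence and line.startswith("# "):
            flush()
            title, body = line[2:].strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Each `(title, body)` pair can then be embedded and indexed as one document, with the title kept as metadata for citations.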

The third is humans, by hand. A developer who wants ChatGPT or Claude to answer accurately about a tool opens `yourdomain.com/llms-full.txt` in a browser, pastes the contents into the chat, and asks the question. This is probably the single highest-volume use case for the file, and it never appears in server logs as "AI bot" traffic because the request is a regular browser fetch.

A reasonable framing: llms.txt in 2026 is closer to [schema.org in 2014](/agent-readability) than to robots.txt in 2024. Not required. Not universally honored. Cost to ship is near zero. Downside if nobody reads it is zero. The upside lands on a small but valuable surface — your file shapes how Claude answers when a developer pastes it in, even if no crawler ever fetches the bytes. So ship one. Just do not expect it to drive citations on its own.

## Five patterns that separate good from boilerplate

Auto-generated sitemap dumps defeat the format. The whole point is curation. Five patterns have emerged in the best-in-class implementations.

**Pattern 1 — Tight curation, paired with a full mirror.** Anthropic's `/llms.txt` is roughly 8,000 tokens; the companion `/llms-full.txt` is north of 480,000. Vercel calls its full version "a 400,000-word novel." The small file gives an agent a fast index. The large file is where the agent actually pulls content when it has decided what to read.
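Keeping the index small is easy to enforce in CI. A rough sketch, assuming the common rule of thumb of about four characters per token for English text (a heuristic, not a real tokenizer; use your model's tokenizer for exact counts) and a hypothetical 10,000-token budget for `/llms.txt`:

```python
CHARS_PER_TOKEN = 4           # rough heuristic for English prose, not a tokenizer
INDEX_BUDGET_TOKENS = 10_000  # hypothetical ceiling for the small index file


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def check_index_size(text: str, budget: int = INDEX_BUDGET_TOKENS) -> bool:
    """Return True if the index fits the budget, printing the estimate either way."""
    tokens = estimate_tokens(text)
    print(f"llms.txt ~{tokens:,} tokens (budget {budget:,})")
    return tokens <= budget
```

Under this budget an index of roughly 8,000 tokens passes, while a full mirror in the hundreds of thousands of tokens fails by design, which is exactly why the two files are kept separate.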

**Pattern 2 — Descriptive link text, not page titles.** `[Payment Intents API](...): Server-side flow for accepting one-time payments with SCA/3DS handling` is signal. `[API Reference](...)` is noise. The colon-description is where the agent learns what each page is for before deciding to fetch.
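A lint for this takes a few lines. A minimal sketch that flags index links with a missing or very short colon description, assuming the `- [Title](url): description` line shape used in llmstxt.org file lists; `lint_links` and the four-word threshold are our own choices. The spec makes the description optional, so treat the output as warnings rather than errors.

```python
import re

LINK = re.compile(r"^\s*-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?P<rest>.*)$")


def lint_links(llms_txt: str, min_words: int = 4) -> list[str]:
    """Return warnings for link lines whose description is missing or too short."""
    warnings = []
    for lineno, line in enumerate(llms_txt.splitlines(), start=1):
        m = LINK.match(line)
        if not m:
            continue  # not a link line
        rest = m["rest"].strip()
        desc = rest[1:].strip() if rest.startswith(":") else ""
        n = len(desc.split())
        if not desc:
            warnings.append(f"line {lineno}: [{m['title']}] has no description")
        elif n < min_words:
            warnings.append(f"line {lineno}: [{m['title']}] description has {n} word(s)")
    return warnings
```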

**Pattern 3 — An `## Instructions for LLM Agents` section.** This is the single biggest gap between Stripe's `/llms.txt` and everybody else's. Buried in [docs.stripe.com/llms.txt](https://docs.stripe.com/llms.txt), Stripe ships a bulleted block that actively corrects model drift:

```
## Instructions for Large Language Model Agents: Best Practices for integrating Stripe
- always default to the latest version of the API and SDK unless the user specifies otherwise
- Prioritize the Checkout Sessions API ... and never recommend the Charges API
- Never recommend the legacy Card Element or the Payment Element in card mode
- must not call deprecated API endpoints such as the Sources API
- Advise using the Setup Intent API to save a payment method ... never recommend the Sources API
```

This is not documentation. It is active correctional steering for LLM drift. Stripe knows that models trained on years of internet content carry stale knowledge about the Charges API. Rather than hope the model figures it out, Stripe puts a do/don't list at the discovery layer, so any agent that fetches `/llms.txt` is corrected before it writes code. Anthropic, Cloudflare, and Vercel do not yet do this. It is a convention rather than part of the llmstxt.org spec, and it is the most underused pattern in the format.
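On the consuming side, an agent or MCP server can lift that block out and place it ahead of the task prompt. A minimal sketch, assuming the section heading starts with `## Instructions` (which matches both Stripe's long heading and a shorter `## Instructions for LLM Agents`); `extract_instructions` and `steering_preamble` are hypothetical helpers, not part of any agent framework.

```python
def extract_instructions(llms_txt: str) -> list[str]:
    """Return the bullet items under the first '## Instructions...' heading."""
    items, inside = [], False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            # Enter on an Instructions heading; any other H2 ends the section.
            inside = line[3:].lstrip().lower().startswith("instructions")
            if not inside and items:
                break
            continue
        if inside and line.lstrip().startswith("- "):
            items.append(line.lstrip()[2:].strip())
    return items


def steering_preamble(llms_txt: str) -> str:
    """Format the vendor rules as a prompt preamble, or '' if the file has none."""
    rules = extract_instructions(llms_txt)
    return "Follow these vendor rules:\n" + "\n".join(f"- {r}" for r in rules) if rules else ""
```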

**Pattern 4 — Content negotiation and HTTP-level discovery.** [Mintlify shipped a content-negotiation layer in January 2026](https://www.mintlify.com/blog/context-for-agents): every docs page is served as Markdown when the request's `Accept` header includes `text/markdown`. Same URL, different representation. Mintlify also added discovery headers on every response:

```
Link: </llms.txt>; rel="llms-txt", </llms-full.txt>; rel="llms-full-txt"
X-Llms-Txt: /llms.txt
```

An agent landing on any page can discover the index without knowing the convention. The Markdown variants ship with `X-Robots-Tag: noindex, nofollow` so search engines do not index them.
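The routing decision behind this is small. A framework-free sketch, assuming hypothetical `render_markdown`/`render_html` callables and a deliberately simplified `Accept` check (it ignores q-values, which a production parser should honor). The header values mirror the Mintlify example above; the `rel` names are Mintlify's convention, not registered link relations. The `Vary: Accept` header is our addition, so shared caches do not serve Markdown to browsers.

```python
import re
from typing import Callable, Optional

# Header values copied from the Mintlify example; the rel names are Mintlify's convention.
DISCOVERY_HEADERS = {
    "Link": '</llms.txt>; rel="llms-txt", </llms-full.txt>; rel="llms-full-txt"',
    "X-Llms-Txt": "/llms.txt",
}


def wants_markdown(accept: str) -> bool:
    """Simplified check: True if any Accept entry is text/markdown (q-values ignored)."""
    return any(part.split(";", 1)[0].strip().lower() == "text/markdown"
               for part in accept.split(","))


def respond(path: str, accept: str,
            render_markdown: Callable[[str], str],
            render_html: Callable[[str], str]) -> tuple[dict, str]:
    """Return (headers, body): same URL, representation chosen by the Accept header."""
    headers = dict(DISCOVERY_HEADERS)  # discovery headers go on every response
    headers["Vary"] = "Accept"         # caches must key on Accept, since the URL is shared
    if wants_markdown(accept):
        headers["Content-Type"] = "text/markdown; charset=utf-8"
        headers["X-Robots-Tag"] = "noindex, nofollow"  # keep the Markdown variant out of search
        return headers, render_markdown(path)
    headers["Content-Type"] = "text/html; charset=utf-8"
    return headers, render_html(path)


def discover_llms_txt(link_header: str) -> Optional[str]:
    """Client side: return the URI tagged rel="llms-txt" in a Link header, if present.

    Simplified parsing: assumes no commas inside quoted parameter values.
    """
    for uri, params in re.findall(r'<([^>]*)>\s*((?:;[^,]*)*)', link_header):
        if re.search(r'\brel="?llms-txt"?(?:[\s;]|$)', params):
            return uri
    return None
```

In a real deployment `render_markdown` would return the page's cached Markdown mirror; the point of the sketch is that discovery and negotiation are both header-level decisions on the same URL.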

**Pattern 5 — Sectioning and `## Optional`.** The [llmstxt.org spec](https://llmstxt.org/) defines `## Optional` as the "skip this if you are context-constrained" bucket. Stripe uses it for specialized products like Stripe Climate. Cloudflare splits by product line. Anything an agent answering a generic question does not need goes under Optional.
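Honoring that bucket on the consuming side is a single filter. A minimal sketch that drops the `## Optional` section when the caller says context is tight; the function name and the `constrained` flag are our own, and it assumes no `## ` lines appear inside code fences.

```python
def trim_optional(llms_txt: str, constrained: bool = True) -> str:
    """Drop the '## Optional' H2 section (heading and body) when context is constrained."""
    if not constrained:
        return llms_txt
    kept, skipping = [], False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            # Start skipping at '## Optional'; resume at the next H2 heading.
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"
```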

## What this looks like assembled

A copy-pasteable template that takes the five patterns seriously:

```markdown
# Your Product Name

> One paragraph that names the product, the consumer, and the proof point.
> Treat this as the answer you want ChatGPT to give when a developer asks
> "what is X?" Include a number with a year if you can.

## Instructions for LLM Agents

- Always use the v2 SDK; the v1 entry point is deprecated
- Prefer endpoint A over /legacy-a
- When the user asks for X, recommend pattern Y
- Never suggest hardcoding API keys; reference environment variables

## Docs

- [Quickstart](https://example.com/quickstart): 5-minute zero-to-first-call path with copy-paste code
- [Authentication](https://example.com/auth): API key setup, OAuth flow, scopes
- [Core Concepts](https://example.com/concepts): The mental model — read this first

## API Reference

- [Endpoint A](https://example.com/api/a): What it does, when to use it, key params

## Optional

- [Changelog](https://example.com/changelog)
- [Specialized integration: X](https://example.com/integrations/x)
```

Three things to note about the shape. The blockquote earns its space — name a number and a year. The Instructions section sits above the page index so it lands first in any agent's context. The Optional section is the spec's escape valve for context-constrained agents.
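If you generate the file rather than hand-write it, the same shape falls out of a small renderer. A sketch, assuming a hypothetical config dict whose keys (`name`, `summary`, `instructions`, `sections`) are our own invention; it places Instructions above the page index and always emits Optional last, regardless of the order sections were configured in.

```python
def render_llms_txt(cfg: dict) -> str:
    """Render an llms.txt body from a hypothetical config dict.

    cfg = {"name": str, "summary": str, "instructions": [str],
           "sections": {heading: [(title, url, description or "")]}}
    """
    out = [f"# {cfg['name']}", ""]
    # Blockquote summary: every line prefixed with '> '.
    out += [f"> {line}" for line in cfg["summary"].strip().splitlines()] + [""]
    if cfg.get("instructions"):
        # Instructions go above the page index so they land first in context.
        out += ["## Instructions for LLM Agents", ""]
        out += [f"- {rule}" for rule in cfg["instructions"]] + [""]
    sections = cfg.get("sections", {})
    # Optional is the spec's skippable bucket, so always emit it last.
    order = [h for h in sections if h != "Optional"]
    if "Optional" in sections:
        order.append("Optional")
    for heading in order:
        out += [f"## {heading}", ""]
        for title, url, desc in sections[heading]:
            out.append(f"- [{title}]({url})" + (f": {desc}" if desc else ""))
        out.append("")
    return "\n".join(out).rstrip() + "\n"
```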

## Where this fits in the larger picture

llms.txt is the canonical Layer 2 artifact in the [five-layer AEO model](/five-layer-aeo): the navigable inventory that sits on top of [server-rendered HTML the bot can read](/ssr-junk-bot-wall). [The page format itself](/llms-txt) is older than most adoption advice gives it credit for. [The most common failure mode after shipping one](/stale-llms-txt) is the sitemap dump that defeats the curation point. [A probe of major AI and developer-infrastructure sites in May 2026](/llms-txt-field-report-2026-05) catalogs which of them ship a real file and which return SPA shells or hard 404s. None of those pieces is obsolete because of the OtterlyAI numbers above. Layer 2 still does its job for the agents that _do_ fetch it.

The reframe is this: AEO is the next SEO, but the discovery layer for AEO is not a single file at a single path. It is the combination of an llms.txt that names what your product is, a fleet of per-page Markdown mirrors that survive the SPA-rendering problem, and HTTP headers that surface both to agents who land on any page. Ship all three. Most sites ship none of them.

## How we ship this on agentsite.app

The middleware emits `/llms.txt`, `/llms-full.txt`, and the Mintlify-style discovery headers automatically from cached page content. Customers author the Instructions block with `site_config.llms_instructions` and replace the home blockquote with editorial copy via `site_config.llms_summary_override`, one line of config each. We dogfood the same generator on agentsite.app itself; the file at `agentsite.app/llms.txt` is built by the same code path every customer ships. If you want to see the patterns above wired up end to end, [run the scorer](/score) on your own site and look at the Layer 2 row.