# AEO in Pictures
The snippet, the cache, the response — and why every visitor gets the same HTML.
By AgentSite · 8 min read · Updated 2026-05-23
Read [the long-form essay](/aeo) first if you want the _why_. This page is the _what_. Diagrams and code samples grounded in the real snippet code at `packages/express/agentsite.cjs`.
The single architectural fact that holds everything else together: **the snippet does not branch on who is asking.** Every visitor — human browser, GPTBot, ClaudeBot, Perplexity, your team's `curl` — receives byte-identical HTML. The branch is on the _kind of response_ (HTML page vs. site-level agent asset vs. static file), not on the user agent. Bot identity is something we _log_, not something we route on.
* * *
## The model in one diagram
HTML page
SPA route, e.g. /pricing
Site-level agent asset
llms.txt, sitemap, robots.txt,
.well-known/ai-agent.json, etc.
Static file
JS, CSS, images, fonts
Incoming request
any UA, any visitor
AgentSite snippet
in your stack
What's being asked for?
Fetch RenderBundle
snippet cache → backend
Snippet-owned route
asset cache → backend
Pass-through
express.static / upstream
applyBundle:
title, meta, OG, JSON-LD,
schema graph, markdown body
Enriched HTML
Agent-readable doc
Origin bytes, untouched
Three response classes. One enrichment path (HTML pages), one fast cached-asset path (site-level agent docs), one pass-through path (everything else). The branch is on what's being served — not on who's asking.
* * *
## The four lines
```ts
// server.ts (Express; Next/Nuxt/Astro/etc. have the same one-import shape)
const express = require('express')
const path = require('path')
const agentsite = require('./agentsite')
const app = express()
const DIST = path.join(__dirname, 'dist')
app.use(express.static(DIST, { index: false }))
app.get('*', agentsite({
distDir: DIST, // ← read index.html from disk
site: 'https://yourdomain.com',
token: process.env.AGENTSITE_TOKEN,
}))
app.listen(3000)
```
That's the install. No browser to operate. No render queue to manage. No cache to run. No standards to track.
* * *
## What every visitor sees
Same URL. Same fetch. Same response bytes regardless of who's calling. The difference is what each consumer reads from those bytes.
### Raw shell — what your SPA ships by default
```bash
$ curl -H "X-Agentsite: none" https://yoursite.com/pricing
```
The `X-Agentsite: none` header tells the snippet to skip enrichment — that's the only way to see the un-enriched shell. (We use this flag internally so re-rendering an already-AgentSite-running site doesn't double-inject; the landing-page "before injection" demo uses it too.)
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>YourApp</title>
<link rel="stylesheet" href="/assets/index-d8f3a.css" />
<script type="module" src="/assets/index-9e21b.js"></script>
</head>
<body>
<div id="app"></div>
</body>
</html>
```
Word count: zero. Structured data: zero. Citation likelihood: zero. The crawler leaves.
### Enriched shell — what AgentSite serves to every visitor
Same `curl`, no header flip. The snippet fetches a RenderBundle from the backend (or pulls it from its in-memory cache), then patches `<head>`, injects JSON-LD, and splices the markdown body into a preboot `<div>` just before `</body>`. A `<style>` rule in `<head>` hides that div the moment JavaScript boots, so humans never see it; bots and extractors that strip `<noscript>` read it cleanly. (Per-site dashboard override switches to legacy `<noscript>` mode for tools that honor it — Claude Code WebFetch.)
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Pricing — YourApp</title>
<meta name="description" content="YourApp pricing: Free, Pro at $49/mo, Studio at $149/mo. Includes per-page render limits, FAQ schema injection, and dashboard access." />
<meta property="og:title" content="Pricing — YourApp" />
<meta property="og:description" content="..." />
<meta property="og:image" content="https://yoursite.com/og/pricing.png" />
<link rel="canonical" href="https://yoursite.com/pricing" />
<script type="application/ld+json" data-agentsite="bundle">
{ "source": "agentsite", "engine": "agentsite-renderer", "cacheStatus": "hit", "url": "/pricing", "renderedAt": "...", "expiresAt": "..." }
</script>
<script type="application/ld+json" data-agentsite="schema" data-type="FAQPage">
{ "@context": "https://schema.org", "@type": "FAQPage", "mainEntity": [ ... ] }
</script>
<link rel="stylesheet" href="/assets/index-d8f3a.css" />
<script type="module" src="/assets/index-9e21b.js"></script>
</head>
<body>
<div id="app"></div>
<div id="agentsite-preboot" aria-hidden="true">
<pre data-content-type="text/markdown" data-source="agentsite">
# YourApp Pricing
> Free for hobby projects. Pro for active sites. Studio for content-heavy or UGC products. See /pricing for current rates.
## What's included in the Free tier?
Free includes a one-time kickoff credit, 7-day cache, and pay-as-you-go top-up.
## How does Pro pricing work?
Pro tops up your weekly plan units; see /pricing for current allotments and rates.
...
</pre>
</div>
</body>
</html>
```
Same bytes whether a human browser or GPTBot just fetched it. The next diagram shows why that works.
* * *
## Same response, two readers
The enriched HTML above
byte-identical for every visitor
Human browser
JS enabled
Bot / crawler / agent
no JS execution
Boots SPA from div#app,
preboot div hidden by inline CSS
Reads head meta,
JSON-LD blocks,
preboot markdown body
The designed UI
Title, description, OG,
schema graph, full page markdown
This is the trust-load-bearing diagram. There is no cloaking. There is no per-bot tailoring. There is one response. The parts of HTML the browser hides — the preboot `<div>` hidden by an inline `<style>` rule once `data-agentsite-js="1"` is set on `<html>`, and the `<script type="application/ld+json">` blocks the renderer doesn't evaluate as page markup — are exactly the parts agents are wired to read. The HTML standard works in our favor.
* * *
## The markdown mirror
Append `.md` to any page path and the snippet returns the body in raw `text/markdown`:
```bash
$ curl https://yoursite.com/pricing.md
```
```markdown
# YourApp Pricing
> Free for hobby projects up to 50 pages. Pro at $49/month for active sites. Studio at $149/month for content-heavy or UGC products.
## What's included in the Free tier?
Free includes 50 pages, 1,000 renders per month, and 7-day cache.
## How does Pro pricing work?
Pro is $49 per month. You get 500 pages per site, 25,000 renders per month, daily cache freshness, and full dashboard access.
…
```
Same content as the preboot body, denser format, no HTML noise. Language models prefer reading markdown — lower token cost, higher comprehension. `Accept: text/markdown` also works on every route.
* * *
## Site-level agent assets
The snippet also owns a small set of canonical routes. They aren't rendered per request — they're served from the snippet's own asset cache, refreshed against the backend every 1–24 hours depending on volatility. Customer-shipped files in `dist/` always win (express.static runs first), so a hand-rolled `dist/llms.txt` overrides ours automatically.
Layer 4 — Agentic frontier
/.well-known/ai-agent.json
/.well-known/agents.json
/.well-known/agent-skills/index.json
/.well-known/api-catalog
/.well-known/mcp/server-card.json
/ai.txt
Layer 3 — Structured data
Article JSON-LD
FAQPage JSON-LD
BreadcrumbList
Organization sameAs
Layer 2 — Content map
/llms.txt
/llms-full.txt
/path.md mirrors
Layer 1 — Crawler access
robots.txt — granular AI policy
sitemap.xml
Layer 3 (JSON-LD) lands inside the enriched HTML response — it's per-page schema generated from each render. Layers 1, 2, and 4 are stand-alone files served from snippet-owned routes. One install lights up all four.
* * *
## The cache, in the middle
Why most requests never reach AgentSite at all:
hit — the common case
miss
hit
miss
Request
Snippet
In-memory
bundle cache
5 min TTL, per route
applyBundle → respond
zero backend cost
POST /render to backend
Backend
RenderBundle row?
Return cached bundle
Playwright render
\+ artifact generation
Persist bundle row
\+ return
Three tiers. The hot path is layer 1: the snippet keeps a 5-minute in-memory bundle per route inside your own process. Once a route is warm, applying the enrichment is single-digit milliseconds and the backend sees nothing. Layer 2 is the backend's persistent RenderBundle row — durable across snippet restarts, shared across all your instances. Only a layer-2 miss triggers a fresh headless render.
**Implication:** for a typical SPA serving heavy bot traffic against a handful of routes, the backend sees a small fraction of the fetches. That's the unit-pricing model — you pay for fresh renders, not for serves. Cached delivery is free.
* * *
## What actually runs per request
The real call path for an HTML route, paraphrased from `packages/express/agentsite.cjs`:
```ts
// agentsiteHandler — the function express calls for every GET *
// 1. Snippet-owned routes branch first. These never reach the bundle path:
// /env.js, /robots.txt, /sitemap.xml, /llms.txt, /llms-full.txt,
// /.well-known/{api-catalog, openid-configuration, ai-agent.json,
// agents.json, agent-skills/index.json, mcp/server-card.json}, /ai.txt
// Each served from its own asset cache (1h–24h TTL).
if (req.path === '/llms.txt') return serveLlmsIndex(res)
// ... (other site-level asset branches)
// 2. Markdown mirror — '.md' suffix OR Accept: text/markdown.
// Strips the suffix, fetches the bundle, returns bundle.markdown verbatim.
if (wantsMarkdown(req)) return serveMarkdown(req, res)
// 3. Static file with an extension (.css, .js, .png, ...) — 404 here
// because express.static would have served it before we ran.
if (req.path.indexOf('.') !== -1) return res.status(404).send('Not found')
// 4. HTML route — fetch the bundle (snippet cache → backend), then enrich.
const bundle = await fetchBundle(req.path) // hits 5-min in-memory cache
if (!bundle) return res.send(indexHtml) // backend down: ship raw shell
if (bundle.cacheStatus === 'miss-generating') { // first-ever fetch for this route
res.set('Retry-After', '5')
return res.send(injectGeneratingSignifier(indexHtml))
}
const finalBody = applyBundle(indexHtml, bundle) // patch meta + JSON-LD + preboot div
res.send(finalBody)
// 5. Telemetry — fire-and-forget, batched, 60s flush.
// Classifies the UA into a coarse bucket (gptbot / claude-user /
// perplexity / human / unknown-bot ...) and pushes to a ring buffer.
// Humans are dropped. The response above does not vary by classification.
recordTelemetry(req, { cacheHit, renderMs, responseBytes })
```
Five steps for a cache hit. Six for a cache miss. The renderer and bundle generators run in our cloud — your stack pays no CPU for them.
* * *
## Telemetry — observed, not switching
The snippet classifies incoming user agents. It uses the classification in exactly one place: counting them.
classifyUserAgent
human dropped
60s flush
Request
UA = GPTBot, Claude-User,
Chrome, etc.
Snippet
Identical enriched response
regardless of UA
Coarse bucket:
gptbot, claude-user,
perplexity, …, human
Ring buffer
500-record cap
POST /telemetry/ingest
Aggregated counts
by crawler class + route
Dashboard: 'Perplexity fetched
12 pages this week, ChatGPT 3'
Three rules:
1. **The response does not vary by bot.** Every visitor receives the same enriched HTML. Telemetry observes; it doesn't switch.
2. **No request body, no headers beyond a UA bucket and route path.** We don't capture query strings on URLs (PII risk), don't log auth headers, never see cookies. Human-classified requests are dropped before they hit the ring buffer.
3. **Aggregated, not surveilled.** The dashboard shows _Perplexity fetched 12 pages this week, ChatGPT fetched 3_ — counts and patterns, not individual request traces.
The result is a useful per-site read on which AI engines are actually visiting, which pages they prefer, and how often they return. No competitor has this data — we're the only party in the request path. And we have it without ever tailoring a response.
* * *
## Privacy — what AgentSite's renderer can and cannot see
A common technical-buyer question: _your renderer is hitting my site — what's the exposure?_
AgentSite renderer
cannot reach
Private surface — invisible to AgentSite
Authenticated dashboards
User profile data
Customer records / DB
Admin panel
API keys / secrets
Cookies / sessions
Public surface — what AgentSite reads
Marketing pages
Pricing
Blog posts
Docs
Pre-rendered SPA routes
Headless browser
No credentials
No session
No cookies
Concretely:
- AgentSite's renderer fetches over the **public, unauthenticated front door** — the same surface GPTBot, Googlebot, or a fresh incognito tab can reach.
- The renderer does **not** hold credentials, API keys, OAuth tokens, or session cookies. It can't log in, can't impersonate a user, can't reach `/admin`, `/dashboard`, or `/api/private/*` if those routes require authentication.
- If your routes are properly authenticated, AgentSite cannot see them. **The boundary is your auth check, not our promise.** The same boundary that protects you from any random crawler protects you from us.
In other words: AgentSite can see exactly what GPTBot can see. And nothing more.
* * *
## Side-by-side — homegrown vs. AgentSite
| | Homegrown AEO | AgentSite |
| --- | --- | --- |
| **Initial setup time** | 3–5 days, technical engineer | Four lines, fifteen minutes |
| **Quarterly maintenance** | 1 day to track standards changes | Zero — snippet tracks the platform |
| **Headless browser to operate** | Yes, in your stack | No, runs on AgentSite Cloud |
| **Files to maintain** | 6+ in parallel (HTML, .md, llms.txt, JSON-LD, OG, agent-card) | Zero — all derived from your live site |
| **Schema-vs-content drift** | Inevitable, manual to detect | Structurally impossible — same render pass |
| **Per-bot tailoring / cloaking risk** | Tempting; flagged by Google | None — same bytes for every visitor |
| **Anti-pattern detection** | Manual, ad-hoc | Validated at publish, blocked if bad |
| **Per-page summaries** | Hand-written | Auto-generated from page content |
| **Agent-fetch telemetry** | None | First-party, every request (observation only) |
| **Lock-in to undo** | Custom code in your repo | Remove four lines, done |
| **Cost of doing it wrong** | Silent — your AEO score is zero | Caught at publish, surfaced in dashboard |
* * *
## The one-screen pitch
```
┌─────────────────────────────────┐
│ │
Any visitor ──────► │ AgentSite install │ ───► Enriched HTML
(browser, GPTBot, │ (Express / Sidecar / Nginx │ (same bytes for everyone;
ClaudeBot, │ today; Edge + SSR SDK in │ browser CSS-hides the
Perplexity, curl, │ pre-release — pick one) │ preboot div, agents
…all the same) │ • Patches head + JSON-LD │ read it directly)
│ • Splices preboot markdown │
│ • Serves llms.txt, sitemap, │ ───► /pricing.md, /llms.txt,
│ .well-known/* from cache │ /.well-known/ai-agent.json,
│ • Logs who showed up │ /sitemap.xml, /robots.txt, ai.txt
│ (telemetry only — never │
│ switches the response) │
│ │
└─────────────────────────────────┘
```
One install. Every visitor served the same truthful version. Every artifact generated from one source of truth. Every standard tracked. No cloaking — same bytes for everyone, the HTML standard works in our favor.
That's it. That's the product.
* * *
## Read next
- **[The complete picture](/aeo)** — the long-form essay this page summarizes visually.
- **[Agent readability](/agent-readability)** — the thesis: why the diagrams above describe a foundation, not a feature set.
- **[The Five Layers of AEO](/five-layer-aeo)** — the same five-layer model in textual form.
- **[Direct answer](/direct-answer)** — the 40-60 word paragraph that anchors every Layer-4 win.
- **Score your site** — paste a URL, get the five-layer breakdown live.
- **Dashboard** — once installed, see which AI engines are visiting, which pages they prefer, and how often they return.