# Answer Engine Optimization

AEO is the new SEO. Here's the complete picture — and why most sites are invisible to it.

By AgentSite · 22 min read · Updated 2026-05-23

Web Claude can't read your website. Neither can ChatGPT. Neither can Perplexity. Neither can the AI Overview that Google now puts above its own search results.

Vercel measured it. **569 million GPTBot requests** across their network in 2025. **Zero of them executed JavaScript.** If your site is a single-page app — React, Vue, Svelte, anything that boots from `<div id="app"></div>` — every one of those crawlers saw an empty shell where your content was supposed to be. (Vercel, _"The Rise of the AI Crawler,"_ 2025.)

That's the floor. There are three more layers above it. Together they decide whether AI engines can read you, whether they can map you, whether they can understand you, and whether they're ready to act on your behalf when the agent web arrives. Most sites are failing on all four.

This essay is the complete picture. Eight chapters mapped to the operational stack of the field — what AI engines actually look at, what the peer-reviewed research has measured, and what most sites get wrong. Every claim sourced. Every number from a study you can look up.

The optimization target has moved. AEO is the new SEO. Easy, good, cheap — compared to a Next.js rewrite that takes four months.

* * *

## Foundations: there is no position 3

SEO was a ranking problem. Ten blue links, ordered. You optimized to be #1 because the click went to the top.

AEO is a citation problem. The user asks. ChatGPT writes back a paragraph. Inside that paragraph are one or two source links. Either yours is one of them, or it isn't.

Ranking is competitive. **Citation is binary.**

_[Diagram: two flows side by side. SEO world: user search, search results page, Rank 1 / Rank 2 / Rank 3 / Rank 10, user clicks. AEO world: user question, answer engine, generated answer, Citation 1 / Citation 2, user reads, or doesn't.]_

The implications cascade.
A page that won SEO with keyword density and a thousand backlinks may be entirely unquotable in the answer-engine game. A page with no backlinks at all may be quoted constantly because its content is structured as direct, citable answers.

**The fragmentation is real.** Across 230,000+ prompts, only **2.37% of cited URLs appear in all three** of ChatGPT, Perplexity, and Google AI Overviews simultaneously. 91% appear in just one engine. (Kevin Indig, _"The Consensus Gap,"_ Growth Memo.) There is no single "AI visibility" number. What wins on ChatGPT — Wikipedia, encyclopedic authority — loses on Perplexity, where Reddit is 46.7% of the citation pool.

**The off-page lever has inverted.** Across 75,000 brands, Ahrefs measured the correlation of unlinked brand mentions with AI Overview presence at **0.664**. Backlinks: **0.218**. Brand mentions outperform backlinks by a factor of three. The decade of link-building you just paid for is dead leverage.

AI Overviews now appear in **89% of brand search results**, dropping clicks to the top organic result by 34.5%. Google still owns ~89% of web traffic (83.8 billion monthly visits vs. ChatGPT's 5.8 billion), so SEO doesn't go away. But AEO is sitting on top of it. Gartner's read: 25% of search will shift to AI-mediated answers by end of 2026.

The field is so new it can't name itself: a 2025 survey of 200+ senior SEOs split between "AI search optimization" (36%), "SEO" (27%), and "GEO" (18%). We call it AEO. Substantively, the labels point at the same operational work.

**Common questions:** [What is AEO and how is it different from SEO?](/faq#what-is-aeo) · [What is the Gartner 25% forecast?](/faq#gartner-forecast) · [Do backlinks help with AEO?](/faq#backlinks-help)

* * *

## Technical: if the bot can't reach you, nothing else matters

Vercel's number is the only one that matters for SPAs. 569 million GPTBot requests. 0% executed JavaScript. GPTBot fetched JS files in 11.5% of its requests but never ran them.
ClaudeBot: 23.84% fetched, also never executed. Definitive.

If your page renders client-side, the bots see an empty `<div>`. They don't crash, they don't error, they don't tell you. They just don't cite you.

### The five-layer AEO model

When an AI engine sets out to use your site as a source, it operates across five layers — four on your site (you control them) and one off your site (you influence it). Each one fails differently. Most sites have a problem in at least three of them.

- **Layer 1 — Crawler access:** pre-rendered HTML · robots.txt allows GPTBot · sitemap.xml · real 404s, no soft-200s
- **Layer 2 — AI-readable content map:** /llms.txt · /llms-full.txt · per-page .md · /.well-known/agent-card.json · mcp.json · ai-agent.json
- **Layer 3 — Structured data:** Article JSON-LD · FAQPage JSON-LD · Organization sameAs · BreadcrumbList
- **Layer 4 — Content quality (on-page):** answer-first opening · definition density · named statistics + citations · freshness signals
- **Layer 5 — External measurement (off-site):** mention rate · share-of-voice · citation rate across engines · Wikipedia · Reddit · YouTube · press

The layers stack. Layers 1-3 are pure technical lift; Layer 4 is page-level content work; Layer 5 is reputation. Layer 1 is table stakes: without it the crawler never reaches the upper layers, so they don't matter. Layer 5 is what off-site monitoring tools (Profound, Scrunch, Otterly, Adobe LLM Optimizer) measure, and it's downstream of everything in Layers 1-4. You don't ship Layer 5; you earn it.

**Layer 1 — Crawler access.** The diagnostic is brutally simple: `curl -A "GPTBot" https://yoursite.com/ | grep -c "<p>"`. If that number is zero, every other layer is unreachable. The crawler hit your URL, received `<div id="app"></div>`, and left.
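The curl diagnostic above can also be scripted so it runs against several routes. What follows is a minimal Python sketch, assuming that counting opening `<p>` tags in the raw response is a fair proxy for server-rendered content. The helper names `count_paragraph_tags` and `fetch_as_bot` are hypothetical, and `https://yoursite.com/` is the placeholder from the text.

```python
import re
import urllib.request

# Matches an opening <p> tag, with or without attributes, case-insensitively.
# It does not match closing tags or other tags that start with "p" (e.g. <pre>).
P_TAG = re.compile(r"<p[\s>]", re.IGNORECASE)


def count_paragraph_tags(html: str) -> int:
    """Count opening <p> tags in raw, un-rendered HTML."""
    return len(P_TAG.findall(html))


def fetch_as_bot(url: str, user_agent: str = "GPTBot", timeout: float = 10.0) -> str:
    """Fetch a URL with a crawler-style User-Agent and return the body as text.

    No JavaScript runs here, which mirrors the crawler behavior described above.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `count_paragraph_tags(fetch_as_bot("https://yoursite.com/"))` gives the same zero-or-not signal as the one-liner. One difference: `grep -c` counts matching lines, while this counts tags, so the two numbers can differ on minified HTML even though both read zero for an empty shell.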
Layer 1 requires pre-rendered HTML on every route, a `robots.txt` that allows AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Applebot-Extended), a `sitemap.xml` with real URLs (no hash routes), real 404s for missing pages (SPAs that return 200 for everything pollute domain quality), and no `noindex` ghosts left in the shell.

**Layer 2 — AI-readable content map.** `/llms.txt` is a markdown pointer-index of the site following Jeremy Howard's spec — short, with links into detail at `/install.md` and `/docs.md` for agents arriving with no prior context. `/llms-full.txt` is the entire site flattened to one markdown document. Per-page `.md` mirrors make every route accessible at both `/path` and `/path.md`. Layer 2 also covers the agentic-discovery surfaces — `/.well-known/agent-card.json` (Google A2A), `/.well-known/ai-agent.json` (Aiia, ratified March 2026), `/.well-known/mcp.json` (pointer to your MCP server). Standards aren't converging — ten or more active IETF drafts in this space as of Q1 2026 — so the right play is to support the major ones.

**Layer 3 — Structured data.** JSON-LD answers what _kind_ of thing each page is. Pages with FAQPage markup appear in Google AI Overviews at **3.2× the rate** of unstructured equivalents ([Frase.io](http://Frase.io)). Article / BlogPosting is the single most-cited type across ChatGPT, Perplexity, and Gemini — must include `author`, `datePublished`, `dateModified`, `headline`, `articleBody`. Stale `dateModified` is the most common silent failure. Organization with a `sameAs` array (Wikipedia, Crunchbase, LinkedIn, X) is the entity-graph anchor — how AIs resolve "who is this site really." More on what makes individual pages citable in the _Content_ chapter below.

**Layer 4 — Content quality.** Layers 1-3 tell the engine _whether it can read you_. Layer 4 is whether each individual page is worth quoting.
The eight dimensions of the AEO score (FAQ schema, answer-first structure, definition density, named statistics, content recency, schema type fit, outbound citation quality, heading structure) all live here. Detail in the _Content_ chapter below.

**Layer 5 — External measurement.** Mention rate, citation rate, share-of-voice, third-party surfaces (Wikipedia, Reddit, YouTube). What off-site tools measure — Profound, Scrunch, Otterly, Adobe LLM Optimizer. You don't ship Layer 5; you influence it through Layers 1-4 and through earned media. Detail in the _Authority_ and _Measurement_ chapters below.

### Three traps that kill citation silently

**Trap 1: Cloudflare default-block.** As of July 1, 2025, Cloudflare blocks AI crawlers by default for every site on its managed protection plan. Retrieval bots — the ones that produce citations — must be explicitly allowlisted in your dashboard. No bot traffic, no warning, no metric. The site simply disappears from citation consideration.

**Trap 2: Bot-identity confusion.** Anthropic runs three separate crawlers. `ClaudeBot` trains the model. `Claude-SearchBot` builds the search index. `Claude-User` is invoked when a human pastes a URL. Same vendor, three user-agents, three different jobs. Most sites use a single robots.txt rule and block all three — opting out of training also opts out of citation. The pattern repeats: OpenAI has `GPTBot` and `OAI-SearchBot`. Perplexity has `PerplexityBot` and `Perplexity-User`. Treat training and retrieval as different bots, because they are.

**Trap 3: Cloaking.** Google's August 2025 Spam Update penalized UA-based content switching with a **65% average drop in organic visibility within 30 days** (Google, LinkGraph). The right pattern is one DOM for every visitor — same HTML, same meta, same body. No bot detection.
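Trap 2 is easy to test before a `robots.txt` ships. The sketch below uses Python's standard `urllib.robotparser` to check which of the user-agents named above a given file admits. The policy in `ROBOTS_TXT` (block the training crawlers, admit the retrieval ones) is only an illustrative assumption, not a recommendation from any vendor, and `crawl_permissions` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy only (an assumption for this sketch):
# opt out of training crawlers, stay open to search and user-triggered fetches.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt: str, user_agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user-agent, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}
```

Running `crawl_permissions(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot"], "https://example.com/pricing")` shows the two OpenAI agents getting different answers from the same file, which is the point of treating them as separate bots. This checks only what the file says; it cannot tell you whether a CDN rule blocks the request before `robots.txt` is ever consulted.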
**Common questions:** [Do AI engines actually run JavaScript when they crawl?](/faq#do-ai-engines-run-javascript) · [What are the five layers of AEO?](/faq#five-layers) · [Does Cloudflare bot-fight block AI crawlers?](/faq#cloudflare-bot-fight) · [What is the Vercel half-billion stat?](/faq#vercel-half-billion)

* * *

## Content: the answer capsule rules

Once your pages are reachable and mapped, the question is page-level: does this specific page get quoted?

The **Princeton GEO paper** (Aggarwal et al., KDD 2024, arXiv:2311.09735, N=10,000) is the only peer-reviewed controlled experiment of what moves AI citation. The numbers:

| Tactic | Measured visibility lift |
| --- | --- |
| Statistics Addition | **+40%** |
| Cite Sources (inline named citation) | **+30–40%** |
| Quotation Addition | **+28–40%** |
| Keyword Stuffing | **−10%** _(only tactic with measured negative effect)_ |

Stack these and you get most of the playbook in two sentences: cite named sources with named statistics. Don't stuff keywords.

Then where you put it. Kevin Indig analyzed 18,012 ChatGPT citations across 1.2 million responses. **44.2% originated from the first 30% of the page.** A great answer in paragraph eight isn't a great answer. It's invisible.

The unit that works: an **answer capsule**. 40–60 words, directly after the H2, zero links, one named statistic. AI engines extract that block almost verbatim. Indig measured a 38% citation rate for question-as-H2 + answer-capsule structure vs. baseline.

**Entity density** matters too. High-performing cited content averages **20.6% proper nouns**. Standard text runs 5–8%. The pages that win are about _named things_ — specific people, specific products, specific years, specific studies.

**Freshness compounds.** Content under 30 days old receives **3.2× more citations** than content over 90 days. Pages with visible "last updated" schema receive **1.8× more** (Seer Interactive). AI engines actively prefer recent.
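The answer-capsule rules above (40–60 words, zero links, one named statistic) are mechanical enough to lint. Here is a rough Python sketch under simplifying assumptions: words are whitespace-separated tokens, a "link" is a markdown link or a bare URL, and a "statistic" is approximated as any digit in the text. It cannot tell whether the statistic is attributed to a named source. `check_answer_capsule` is a hypothetical helper name.

```python
import re

WORD = re.compile(r"\S+")
# A markdown link like [text](url), or a bare http(s) URL.
LINK = re.compile(r"\[[^\]]*\]\([^)]*\)|https?://\S+")
DIGIT = re.compile(r"\d")


def check_answer_capsule(text: str, min_words: int = 40, max_words: int = 60) -> dict[str, bool]:
    """Check a candidate capsule against the three mechanical rules."""
    word_count = len(WORD.findall(text))
    return {
        "length_ok": min_words <= word_count <= max_words,
        "no_links": LINK.search(text) is None,
        # Crude proxy: a digit is present. Attribution is not verified.
        "has_statistic": DIGIT.search(text) is not None,
    }
```

A capsule passes only when all three values are true. Whether it also sits directly after the H2 depends on page structure, which this function does not see.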
### The eight signals inside a page

Layers tell an AI _whether it can read your site_. Signals tell an AI _whether each individual page is worth quoting_. Eight of them carry most of the weight.

| Signal | Weight | What it measures |
| --- | --- | --- |
| **FAQ schema presence** | 20% | Are your `<h2>` questions wrapped in `FAQPage` JSON-LD? |
| **Answer-first content structure** | 18% | Do the first one or two sentences answer the page's implied question? |
| **Authoritative definition density** | 15% | "X is …" or "X refers to …" sentences near the top? |
| **Statistics + named-source citations** | 12% | Quotable numbers with attribution? |
| **Content recency signals** | 10% | Accurate `dateModified`; freshness in body? |
| **[Schema.org](http://Schema.org) Article / NewsArticle / HowTo** | 10% | Is the right type applied to the right content? |
| **Outbound citation quality** | 8% | Links to Wikipedia, .gov, .edu, major publications? |
| **Heading structure for answer extraction** | 7% | `<h2>`/`<h3>` phrased as questions or clear topics? |

Total: 100 points. A site scoring 78 has done well on most and has gaps on a few. A site scoring 32 has either skipped structured data entirely or is structured for human readers in ways AI engines can't extract.

These signals aren't ranking factors. They're **citation factors**. They predict whether an AI engine, given the choice between your page and another page on the same topic, will pull a quote from yours.

The anti-pattern most copy fails on: vague attribution. _"Studies show"_ or _"experts say"_ doesn't earn the +40%. The GEO paper's lift comes from the pattern _"According to \[Named Source, Year\], X%."_ Name your sources by name. Name your numbers as numbers.
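The weights in the table sum to 100, so a page score can be read as a weighted sum. The text does not say how each signal is graded, so the sketch below assumes each one is given a level between 0.0 (absent) and 1.0 (fully present) and combines them linearly. The key names and the `aeo_score` function are hypothetical labels for the eight rows.

```python
# Weights copied from the table above; the key names are hypothetical labels.
SIGNAL_WEIGHTS = {
    "faq_schema": 20,
    "answer_first": 18,
    "definition_density": 15,
    "statistics_citations": 12,
    "recency": 10,
    "schema_type_fit": 10,
    "outbound_citations": 8,
    "heading_structure": 7,
}


def aeo_score(signal_levels: dict[str, float]) -> float:
    """Weighted sum of signal levels, each clamped to the range 0.0-1.0.

    Signals missing from signal_levels count as 0. Linear grading is an
    assumption of this sketch, not something the table specifies.
    """
    total = 0.0
    for name, weight in SIGNAL_WEIGHTS.items():
        level = min(1.0, max(0.0, signal_levels.get(name, 0.0)))
        total += weight * level
    return total
```

Under this reading, a page with everything in place except its schema type and heading structure, and with half credit on outbound citations, lands at 79, close to the "done well on most, gaps on a few" case described above.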
**Common questions:** [What is the Princeton GEO research?](/faq#princeton-geo) · [What counts as "answer-first content"?](/faq#answer-first-content) · [What is the 8-dimension AEO score?](/faq#eight-dimension-score) · [Does keyword density matter?](/faq#keyword-density)

* * *

## Authority: 94% of citations come from someone else's page

Muck Rack analyzed 1 million+ AI citations in December 2025: **94% came from non-paid, non-brand-owned sources.** Your blog doesn't count. Third-party coverage does.

And the third-party surface is engine-specific. Profound's 230K-prompt study (Aug 2024 – June 2025):

- **Wikipedia: 47.9% of ChatGPT's top-10 citation share**
- **Reddit: 46.7% of Perplexity's top-10 citation share**
- **YouTube: 11.3% on ChatGPT, 11.1% on Perplexity** — top-2 cited domain on both

These are three different games. The Wikipedia strategy that wins ChatGPT does nothing for Perplexity. The Reddit strategy that wins Perplexity gets you nowhere on Google AI Overviews. You don't pick one strategy. You pick one _per engine_.

YouTube has a hidden mechanic: videos with published transcripts earn **4–7× more AI citations** than those without (Am I Cited). The transcript is what gets indexed. The video is just the host.

The brand-mention insight from Foundations is the off-page anchor: 0.664 vs. 0.218. Earned media — third-party news distribution — produces a **239–325% median lift** in AI search visibility (Stacker/Scrunch controlled study). Press releases that get republished are now infrastructure work, not vanity.

**Anti-pattern: astroturfing on Reddit.** Reddit accounts for 40%+ of LLM citations on some engines, and the moderator community is aggressive about exposing fake-comment campaigns. The "Trap Plan" incident (2025): a company's astroturf operation got leaked, spread widely, and the exposure threads became the cited surface. Negative Reddit context is a citation liability, not a recoverable PR problem.
**Anti-pattern: self-editing Wikipedia.** Conflict-of-interest detection is thorough. Discovered violations result in article deletion plus page protection that blocks future legitimate updates.

**Anti-pattern: prompt injection.** OWASP LLM01:2025 — embedded instructions to manipulate AI responses get content blacklisted by major LLM providers. The opposite of citation.

* * *

## Measurement: Inclusion Rate or nothing

Almost everything the market calls "AI visibility" is statistically invalid.

LLM responses are non-deterministic. Research cited by PlatelunchCollective found **fewer than a 1-in-100 chance** that ChatGPT produces the same brand recommendation list twice across 100 identical runs of the same prompt. A single run is a coin flip, not a metric.

The serious measurement tools know this. **Evertune runs each prompt 100 times.** Budget tools run each prompt once. They are not comparable products.

The north-star KPI is **Inclusion Rate** — the percentage of prompts (out of a meaningful set, 30+ minimum) on which your brand is cited at least once on a given engine. Measured _per engine._ The Consensus Gap is why per-engine matters: any blended score hides more than it reveals. A tool that doesn't break out per engine is selling theater.

The attribution layer is even worse. An estimated **70.6% of ChatGPT-sourced visits arrive without referrer headers** in GA4 — mobile apps strip them. Your AI referral traffic in analytics is the floor. The ceiling is unknown. And citation position #1 carries a **33% click-through rate** vs. 13% at position #10 (Discovered Labs) — so position-within-citation matters, not just inclusion.

Three anti-patterns to refuse:

- Accepting **single-run snapshots** as metrics. If a vendor runs each prompt once weekly, they're reporting coin-flip data. Require 10+ runs per prompt per period.
- Blending per-engine scores into **one number**. The Consensus Gap (2.37% URL overlap) makes any blended score structurally misleading.
- Crediting AEO for **branded-search lift** that ran simultaneously with PR and paid campaigns.

If a vendor can't show you per-engine breakdowns with multi-run sampling, ask them how their score is constructed. Then ask why position 3 doesn't exist.

* * *

## Visibility plus citability — two halves of one job

- **Visibility — Can the AI read you?** Pre-rendered HTML · robots.txt · sitemap.xml · Real 404s
- **Citability — Will the AI quote you?** FAQ schema · Answer-first structure · Definitions · Statistics + sources · Recency · Article / HowTo schema
- **Outcome of both:** Cited in AI answers

A site can be perfectly visible and entirely unquotable. The crawler reads every word, and the AI engine still chooses to quote a competitor because the competitor's page is structured for citation and yours is structured for marketing.

A site can also be perfectly citable in theory and entirely invisible in practice. The content is gold. The crawler never reads it because the React shell renders empty.

Either failure is fatal. You need both halves.

**Common questions:** [What's the difference between visibility and citability?](/faq#visibility-vs-citability)

* * *

## Landscape: the gap nobody fills

The market splits in two.

**The rendering services** output crawler-friendly HTML and stop there. **[Prerender.io](http://Prerender.io)** ($49/mo Starter, 25K renders). **LovableHTML** ($9–$83/mo, claims 1,000+ agencies, 20M pages/month). **Rendertron** (Google OSS, unmaintained). They solved the JS-execution problem in 2018 for SEO. They don't generate llms.txt. They don't emit [schema.org](http://schema.org). They don't extract answer capsules. They don't measure citation. Step one of a five-step stack.

**The AEO monitoring platforms** track citations. They assume your content is already crawlable. **Profound** ($499/mo Lite; enterprise $30K–$100K+/year; $35M Series B from Sequoia). **AthenaHQ** ($295/mo, built by ex-Google Search and DeepMind engineers).
**Scrunch, Otterly, Goodie.** If your SPA is invisible — and 569 million Vercel data points say it probably is — their monitoring score stays at zero regardless of your subscription tier. They don't render.

And the agencies: **iPullRank, Animalz, daydream** (Series A $15M, April 2026). AEO retainers at **$8,000–$30,000+/month.** None of them say _"we'll render your SPA so it can be cited."_

The third option is _"just migrate to Next.js."_ Real cost: 3–4 months of rewrite work and a frozen feature roadmap during the migration. Lovable / v0 / Bolt users can't self-migrate at all.

So the hole: **nobody at indie pricing connects render → AI-native output → citation verification in one product.** The half-products are everywhere. The complete loop isn't anywhere.

That's the gap. That's what we built.

**Common questions:** [How is AgentSite different from monitoring tools?](/faq#how-is-this-different-from-profound) · [Why can't I just do this myself?](/faq#why-cant-i-just-do-this-myself)

* * *

## AgentSite: render to citation, in one install

One install — Express middleware, Express-Sidecar for non-streaming SSR, or pure nginx config shipping today; an Edge port and a streaming-SSR SDK in pre-release — and the same bytes serve every visitor: human browser, Googlebot, GPTBot, ClaudeBot. No UA switching. No cloaking risk.
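The "same bytes" property can be checked from the outside on any site, whoever serves it. The following Python sketch hashes the response body for each user-agent and compares the digests. The `fetch` argument is a caller-supplied function (an assumption of this sketch, so the check can be run against a stub or a real HTTP client), and both helper names are hypothetical.

```python
import hashlib
from typing import Callable

# fetch(url, user_agent) -> raw response body; supplied by the caller.
Fetch = Callable[[str, str], bytes]


def response_fingerprints(fetch: Fetch, url: str, user_agents: list[str]) -> dict[str, str]:
    """Map each user-agent to the SHA-256 hex digest of the body it received."""
    return {ua: hashlib.sha256(fetch(url, ua)).hexdigest() for ua in user_agents}


def serves_same_bytes(fetch: Fetch, url: str, user_agents: list[str]) -> bool:
    """True when every user-agent received a byte-identical body."""
    digests = set(response_fingerprints(fetch, url, user_agents).values())
    return len(digests) <= 1
```

A mismatch is a prompt to diff the two bodies, not proof of cloaking: per-request nonces, timestamps, or A/B buckets also change the hash.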
_[Sequence diagram. Participants: Visitor / Agent, AgentSite Middleware, Your SPA, AgentSite Cloud. Messages, in order: GET /pricing (humans + agents alike). On a cache miss: render request (browser-as-a-service), headless render of /pricing, fully-rendered HTML, extract content, generate llms.txt entry, inject validated JSON-LD, generate /pricing.md, bundle (HTML + schema + markdown + tldr). On a cache hit: read cached bundle from edge. Remaining messages: splice bundle into index.html (preboot div, CSS-hidden the moment JS boots), same response for everyone, telemetry: who fetched what (async, batched).]_

**Same bytes for everyone.** Humans and agents receive byte-identical HTML — the difference is what they look at. A browser hydrates the React shell and renders the page normally; the spliced preboot `<div>` carrying the markdown body is hidden by an inline `<style>` rule the moment JavaScript boots, so the human never sees it. An agent reads the head (refreshed meta + injected schema) and the body (the preboot markdown) without executing any JS. No bot detection, no per-bot artifact tailoring, no two-tier serving — every fetch sees the same truthful version of your site. Per-site dashboard override switches to legacy `<noscript>` mode for tools that honor it (Claude Code WebFetch).

### What the middleware actually does

1. **Renders JavaScript pages in a real browser, on our infrastructure, on demand.** You don't run a headless Chrome in your stack. We do.
2. **Generates a markdown version of every page.** Language models read markdown more easily than HTML. `/pricing` becomes available at `/pricing.md`, automatically — noise stripped, structure preserved.
3. **Generates `llms.txt` and `llms-full.txt`.** A coherent, editorial map of your site — not a link dump, a curated index with descriptions of each section's purpose, generated from real page content.
4. **Injects the right JSON-LD on every page.** Article on blog posts. FAQPage on Q&A pages. HowTo on procedures.
Organization on the homepage with `sameAs` anchored to Wikipedia, Crunchbase, LinkedIn. Validated, anti-pattern-checked before publish — schemas that read as keyword-stuffing or stale-date-modified or HowTo-on-listicle are rejected at the gate.
5. **Generates intelligent per-page summaries.** Each page gets a 40–60 word answer capsule — the Princeton GEO pattern, automated. The AI doesn't have to read the whole page to know whether it's relevant.
6. **Publishes the agentic-frontier files.** `agent-card.json`, `mcp.json`, `ai-agent.json` (Aiia), `agents.json` (Wildcard), `ai.txt`. As the standards stabilize, your site stays current — without a deploy.
7. **Splices all of it into your existing HTML.** No bot detection. No two-tier serving. Same response for humans and agents.
8. **Serves all of it from the edge** with first-party telemetry on every fetch. The customer sees exactly what crawled them, when, which pages — including cache-hit fetches that never reached AgentSite Cloud.

### Why this is different

**One render pass, every artifact.** Most teams that try to do AEO themselves end up maintaining six files in parallel: HTML, markdown, `llms.txt`, JSON-LD, OG tags, agent card. Six sources of truth. They drift constantly. AgentSite generates all of them from one render of your page. There's one source of truth — your live site — and every artifact derives from it.

**Markdown-first, not HTML-first.** Most "AI-friendly" outputs are stripped HTML. AgentSite emits markdown. Language models prefer it — it's the format their training corpus was densest in, the format their tool calls return, the format their context windows consume most efficiently.

**First-party bot telemetry.** Every monitoring tool tries to _infer_ what AI engines cite by sampling answers from the outside. AgentSite knows what they fetched. Directly. From the request logs. A live dashboard shows: _Perplexity fetched 12 of your pages this week. ChatGPT fetched 3.
AI Overviews has not visited in 30 days. ClaudeBot is hitting `/pricing` four times a day._ No monitoring vendor can produce that data. We're the only party in the request path.

**Zero data exposure.** AgentSite's renderer fetches your site through the public, unauthenticated front door — the same surface a human browser or any AI crawler reaches. We do not log into your application. We do not hold credentials, API keys, or session cookies. We can see what GPTBot can see, and nothing more.

**Reversible.** Remove the snippet. Everything we generated stops being served. Your site goes back to vanilla. There's no proprietary data structure to migrate off, no DNS change to undo, no contract holding your CDN hostage. The decision is reversible in five minutes, which makes it a low-stakes purchase. We earn the renewal monthly, by being good.

### Build it yourself — and the cost of doing it wrong

This is the question every technical buyer asks. _Why can't I just do this myself?_

You can. The cost is the question.

A founder we met at the SF Founders Gathering in May 2026 had spent several days hand-rolling AEO for his own site. He'd added FAQ schema, written an `llms.txt`, generated per-page markdown, configured `robots.txt` for the AI crawlers. He was the most enthusiastic prospect at the event. He'd built half the stack and didn't know it.

The math, plainly:

- **Doing it yourself, well.** Three to five days the first time. One day per quarter to keep current with the standards. Custom code in your repo for every standard you support. The team owns it forever.
- **Doing it yourself, badly.** The same three to five days, except the result is silently broken and your AEO score is zero on the dimensions that matter. Hard to detect; harder to fix. A bad `llms.txt` — the most common failure mode — actively confuses AI engines and _reduces_ citation likelihood. The site is _worse_ for having it.
- **AgentSite.** Pick the install pattern that matches your deploy — Express, Express-Sidecar, or Nginx shipping today; Edge and a streaming-SSR SDK in pre-release. Updates as the platform updates. We track the standards so your team doesn't.

The question isn't whether you _can_ do this yourself. You can. The question is whether the time is better spent on the product you're actually building.

### What we won't pretend to do

Earn your Reddit mentions. Write your Wikipedia page. Run your YouTube channel. Off-page authority is human work. We render and we emit. We're honest about the line — and we'll point you at the Princeton paper so you know what works on the other side.

**Common questions:** [What is AgentSite, exactly?](/faq#what-is-agentsite) · [How does the snippet work in 4 lines?](/faq#how-does-the-snippet-work) · [What happens if I cancel?](/faq#cancel-reversibility) · [What about my private data?](/faq#private-data)

* * *

## Multi-engine measurement: parametric or retrieval, labeled honestly

There are two different things you might be measuring, and almost no commercial tool tells you which.

**Parametric recall** — what the model already memorized from training. The answer comes from the weights. Same prompt, same model — different answer 99% of the time, but no fresh data, no citations to current URLs.

**Retrieval-grounded inclusion** — the model runs a live search and writes the answer from documents it just retrieved. _This_ is what produces citations. This is what users actually experience in ChatGPT Search, Perplexity, and Google AI Overviews.

These are not the same measurement. A `gpt-4o-mini` probe with no search tool tells you what the base model memorized. It does not predict what ChatGPT Search retrieves when a real user asks the same question. The honest answer is to run two probes and label which is which.
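Labeling each probe and computing Inclusion Rate per engine, as defined in the Measurement chapter, fit in one small data structure. This is a sketch rather than AgentSite's implementation. It assumes each probe run is recorded with its engine, its mode label, the prompt, and whether the brand was cited, and it counts a prompt as included when at least one of its runs cited the brand. `ProbeRun` and `inclusion_rates` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeRun:
    engine: str        # free-form engine label chosen by the caller
    mode: str          # "RETRIEVAL" or "PARAMETRIC"
    prompt: str
    brand_cited: bool  # did this single run cite the brand?


def inclusion_rates(runs: list[ProbeRun]) -> dict[tuple[str, str], float]:
    """Percentage of prompts with at least one citing run, per (engine, mode).

    Engines and modes are never blended: each pair gets its own rate.
    """
    cited_by_prompt: dict[tuple[str, str], dict[str, bool]] = defaultdict(dict)
    for run in runs:
        key = (run.engine, run.mode)
        already_cited = cited_by_prompt[key].get(run.prompt, False)
        cited_by_prompt[key][run.prompt] = already_cited or run.brand_cited
    return {
        key: 100.0 * sum(prompts.values()) / len(prompts)
        for key, prompts in cited_by_prompt.items()
    }
```

Keying the result on the `(engine, mode)` pair means a parametric number can never be reported as a retrieval number by accident. The 30-prompt minimum and the 10+ runs per prompt recommended earlier are sample-size rules the caller still has to enforce.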
AgentSite's multi-engine probe ships in two phases:

**Phase 1** (5 engines, ~$0.30/day for 100 reports):

- OpenAI `gpt-4o-mini` (parametric)
- Groq Llama 3.3 70B (parametric, free tier)
- Gemini 2.5 Flash with `google_search` grounding (retrieval — **free** up to ~1,500 grounded queries/day via AI Studio)
- Gemini 2.5 Flash parametric (fallback)
- Per-engine cards, labeled `RETRIEVAL` or `PARAMETRIC`

**Phase 2** (adds 2 retrieval-grounded engines):

- Perplexity Sonar (retrieval, citation array, ~$0.04/report)
- Anthropic Haiku 4.5 + `web_search` tool (retrieval, ~$0.08/report)

Total ongoing cost at launch: pennies per day. Total cost at 5,000 reports/day full retrieval: ~$500–$800/month. Self-fundable at that volume.

Retrieval or parametric. We label it. Every other tool hides it. That's the bet.

* * *

## What this adds up to

AEO is the discoverability layer for the AI era. Most sites are failing on at least three of the four on-site layers, and almost all of them have at least one Layer 1 problem they don't know about.

The fix is not a checklist. It's the readability layer that handles Layers 1-3 automatically, helps you ship Layer 4 content quality, validates everything before publish, and stays current with the standards as they evolve. Layer 5 — external reputation — is yours to earn, but you can't earn it until 1-4 are right.

That's what AgentSite does. Visibility — pre-rendered HTML for every crawler. Citability — JSON-LD, llms.txt, per-page markdown, intelligent summaries. Frontier — agent cards for the agent web. Trust — every artifact carries our attestation, validated before it ships. Privacy — public surface only, no credentials, ever.

You pick the install pattern that matches your stack. We do the rest.

> _Web Claude can't read your website. AEO is the new SEO. We'd sure like it just taken care of for us._

That's the whole thesis. The five layers explain why. The install is how.

* * *

## What to do next

Three actions, in order:

1. **Get your AEO Report.** 90 seconds. Eight-dimension scorecard, multi-engine citation probe, top three fixes ranked by leverage. Free. [Run the report →](/aeo-report)
2. **See it as diagrams.** Same story told as code, sequence diagrams, and before/after comparisons. [AEO in Pictures →](/aeo-in-pictures)
3. **Drop in the snippet.** Four lines of Express middleware. Same DOM for humans and bots. [Pricing →](/pricing)

You can also see [what runs under the hood →](/platform).

* * *

## Further reading

**Internal companions:**

- **[AEO in Pictures](/aeo-in-pictures)** — the same story told in diagrams and before/after code samples
- **[AEO FAQ](/faq)** — 44 questions across product, theory, and the canonical pitfall catalog
- **[Platform Deep Dive](/platform)** — what runs under the hood
- **[/demos](/demos)** — the same site rendered six ways (Vue, React, Next × before/after), curl any of them side by side

**Canonical references** (the structured treatment of the concepts above):

- **[Agent readability](/agent-readability)** — the thesis page: what the property is, why it's the foundation, citation-is-binary, the two-bot framing
- **[The Five Layers of AEO](/five-layer-aeo)** — render → navigate → structure → content → measure, with the chain-dependency explained
- **[SSR-junk and bot walls](/ssr-junk-bot-wall)** — the two most common Layer-1 failure modes plus the curl recipe that finds either one in under a minute
- **[Direct answer](/direct-answer)** — the 40-60 word paragraph the agent quotes, with the schema pairing

**External sources cited above:**

- Vercel, _"The Rise of the AI Crawler"_ (2025) — the 569M / 0% JS measurement
- Aggarwal et al., _"GEO: Generative Engine Optimization"_ (KDD 2024, arXiv:2311.09735) — the +30–40% controlled experiment, N=10,000
- Kevin Indig, _Growth Memo_ — the 18,012-citation study, the Consensus Gap, the first-30% finding
- Profound, _"AI Engine Citation Landscape"_ (2025) — Wikipedia 47.9% / Reddit 46.7% / YouTube ~11%
- Ahrefs Brand Mentions Study — 0.664 vs 0.218 correlation across 75K brands
- Muck Rack, _AI Citations Analysis_ (Dec 2025) — the 94% off-page finding
- Stacker/Scrunch — earned-media 239–325% AI visibility lift
- Seer Interactive — freshness 3.2× and dateModified 1.8× citation lifts
- [Frase.io](http://Frase.io) / SE Ranking — FAQPage 3.2× AI Overviews appearance rate
- OWASP LLM01:2025 — prompt injection watchlist
- Discovered Labs — position #1 33% CTR vs. position #10 13% CTR