46 questions · 3 categories

# Frequently asked questions.

This is the AgentSite FAQ: every AEO and AgentSite question we've heard, in one place. Three categories — **product** questions about how AgentSite works, **AEO theory** covering the five-layer model and `/llms.txt` and schema, and the **pitfall catalog** of silent failure modes most sites ship today. Search, jump to a category, or read end-to-end; each question is anchor-linkable for sharing.

## Product

About AgentSite — what it does, how it installs, what it costs, what it sees.

**What is AgentSite, exactly?**

AgentSite is the readability layer that sits between your site and AI crawlers. You install it via one of five patterns shipping today: **Nginx** (pure config, no code in your deploy; our favorite), **Express** middleware, **Express-Sidecar** (for non-streaming SSR), **Edge** (Cloudflare Pages / Vercel / Netlify via Fetch-API edge functions), or the **streaming-SSR SDK** (Next App Router, Nuxt 3 streaming, RSC, SvelteKit streaming). It renders your JavaScript pages in our cloud, generates llms.txt, per-page markdown, JSON-LD schema, and agent-card files, and embeds them in every response. Everyone gets the same bytes, humans and AI agents alike. The agent-readable markdown body lives inside a preboot hidden `<div>`: JS-enabled browsers never render it (a `<style>` rule hides it once the page boots), while extractors that strip `<noscript>` (Claude.ai's web tool, most readability libs) can still read it. A per-site override switches to legacy `<noscript>` mode for tools that honor it (Claude Code WebFetch). No content switching, no cloaking risk.

[Read the chapter →](/aeo#agentsite-render-to-citation-in-one-snippet)

Related: [How does the snippet work?](#how-does-the-snippet-work) · [Do you support \[Vue / React / Svelte / Astro / Lovable / v0 / Bolt / plain HTML\]?](#what-frameworks-supported)

**How is AgentSite different from AEO grader and visibility-monitoring tools?**

Those tools diagnose. AgentSite diagnoses AND fixes. Graders give you a score; visibility monitors tell you when you do or don't get cited. None of them put structured content in front of crawlers for you. AgentSite measures the same dimensions they do AND installs the middleware that makes your content citable. The whole loop in one product.

[Read the chapter →](/aeo#landscape-the-gap-nobody-fills)

Related: [Why can't I just do this myself?](#why-cant-i-just-do-this-myself) · [Will this break my site?](#will-this-break-my-site)

**Why can't I just do this myself?**

You can. v1 is a weekend of engineering — JSON-LD shapes, a basic llms.txt, a render trigger. The ongoing tail is what burns: bot UAs shift quarterly (Google AI Overviews changed citation behavior three times in 2024), the llms.txt schema is still evolving with the working group, attack-probe patterns mutate, freshness scans need cluster-aware scheduling, render costs need pooling across pods. We treat that tail as the product — Markdown that LLMs read directly (not pre-rendered HTML cloaking), agents watched quarterly, improvements rolled across the customer fleet. The cost of doing it wrong is invisible: bad llms.txt or stale schema actively reduces citation rates and you can't tell without scanning from outside. If your team wants to own this layer, build it; if you'd rather treat it as infrastructure, that's what we sell.

[Read the chapter →](/aeo#build-it-yourself-and-the-cost-of-doing-it-wrong)

Related: [How is AgentSite different from AEO grader and visibility-monitoring tools?](#how-is-this-different-from-profound)

**Will this break my site?**

No. The snippet is failsafe by design: if our cloud is unreachable, it falls back to your unmodified index.html. There's no exception that bubbles up to your visitors. Watch for `[agentsite]` lines in stdout to confirm it's firing; if you see "fallback" mode, your humans still get the SPA they came for. Removing the snippet is one line of code; everything goes back to before.

Related: [What happens if I cancel?](#cancel-reversibility) · [How does the snippet work?](#how-does-the-snippet-work)

**What about my private data?**

AgentSite renders your site through the public, unauthenticated front door — the same surface GPTBot or a fresh incognito browser tab can reach. Our renderer holds no credentials, no API keys, no session cookies. It can't log in, can't impersonate a user, can't reach /admin or authenticated routes. The boundary is your auth check, not our promise. We see what GPTBot sees and nothing more.

Related: [Where does my data live?](#data-residency) · [What telemetry do you collect?](#telemetry)

**Do you support [Vue / React / Svelte / Astro / Lovable / v0 / Bolt / plain HTML]?**

Yes, the snippet is framework-agnostic. Five install patterns ship today. **Nginx** for thin-nginx-in-front-of-a-static-SPA when "no code in our deploy" is the hard constraint (our favorite). **Express** for a static SPA in a Node container (Vue, React, Svelte, Solid, plain HTML, Lovable / v0 / Bolt output: anything that ships a `dist/index.html`). **Express-Sidecar** for non-streaming SSR running its own server (Next.js pages router, Nuxt 2, classic Express/Koa, Rails, Django, WordPress, PHP-FPM). **Edge** for Cloudflare Pages / Vercel / Netlify via Fetch-API edge functions. The **streaming-SSR SDK** for React Server Components, Next App Router, Nuxt 3 streaming, SvelteKit streaming. Further out: native njs port, CloudFront Lambda@Edge, Python (Django/Flask/FastAPI), PHP (PHP-FPM / WordPress).

Related: [How does the snippet work?](#how-does-the-snippet-work)

**How does the snippet work?**

On every incoming request the snippet fetches a JSON RenderBundle (title, meta, schema graph, markdown body) from our cloud renderer and splices it into the page: meta and JSON-LD go in the `<head>`, and the agent-readable markdown body goes inside a preboot hidden `<div>` just before `</body>`. The same enriched bytes go to every requester, browser or bot. Humans never see the preboot block because an inline `<style>` rule hides it the moment their JavaScript boots; extractors that strip `<noscript>` still read it cleanly. Bundles are cached at the edge per route. The Express install is four lines: `require('./agentsite')`, instantiate it with your dist directory, site URL, and token, and mount it as your Express SPA fallback. The other patterns (Nginx, Sidecar, Edge, streaming-SSR SDK) follow the same shape: render bundle in, enriched HTML out. Same bundle, different injection point.
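As a rough illustration, here is a minimal sketch of that splice in plain JavaScript. It assumes a bundle shaped like `{ title, meta, schema, markdown }` (our guess, not AgentSite's actual RenderBundle format) and a hypothetical hiding scheme in which a one-line inline script tags `<html>` as JS-capable so a `<style>` rule can hide the preboot block; the element id `agentsite-preboot` is also made up. It shows the shape of the transform, not the shipped snippet.

```javascript
// Escape text for safe use in HTML text nodes and double-quoted attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Insert `text` before the first (or last) occurrence of `marker`.
// A missing marker leaves the HTML untouched.
function insertBefore(html, marker, text, last = false) {
  const i = last ? html.lastIndexOf(marker) : html.indexOf(marker);
  return i === -1 ? html : html.slice(0, i) + text + html.slice(i);
}

// Hypothetical bundle shape: { title, meta: [{ name, content }], schema, markdown }.
function spliceBundle(html, bundle) {
  // Failsafe: if the shell lacks the expected markers, serve it unmodified.
  if (!html.includes("</head>") || !html.includes("</body>")) return html;

  const meta = (bundle.meta || [])
    .map((m) => `<meta name="${escapeHtml(m.name)}" content="${escapeHtml(m.content)}">`)
    .join("");
  // "</" inside a script element would close it early, so escape it in the JSON.
  const jsonLd = JSON.stringify(bundle.schema ?? {}).replace(/<\//g, "<\\/");
  const head =
    `<title>${escapeHtml(bundle.title)}</title>` +
    meta +
    `<script type="application/ld+json">${jsonLd}</script>` +
    // Assumed hiding scheme: mark JS-capable browsers before <body> parses.
    `<script>document.documentElement.classList.add("js")</script>` +
    `<style>html.js #agentsite-preboot{display:none}</style>`;
  const body = `<div id="agentsite-preboot">${escapeHtml(bundle.markdown)}</div>`;

  // Drop the shell's own (often empty) title so the page ends up with exactly one.
  const withoutTitle = html.replace(/<title>[\s\S]*?<\/title>/i, "");
  const withHead = insertBefore(withoutTitle, "</head>", head);
  return insertBefore(withHead, "</body>", body, true);
}
```

Returning the shell unmodified when its markers are missing follows the same failsafe spirit as the fallback described under "Will this break my site?".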

[Read the chapter →](/aeo#what-the-middleware-actually-does)

Related: [Do you support \[Vue / React / Svelte / Astro / Lovable / v0 / Bolt / plain HTML\]?](#what-frameworks-supported)

**What are the pricing tiers?**

Five tiers: Free, Solo, Pro, Studio, Enterprise. Plans are sized in units. A cache scan draws half a unit and a fresh page generation draws another half, so an unchanged page costs 0.5u and a changed page costs 1u. Plan units top up every week on the paid tiers; when they run out, your credit pool (kickoff plus any top-ups) covers the rest. Pay-as-you-go top-up is available on every plan, including Free, so you can stay on Free indefinitely by topping up. Site and page limits are operational and live in your account settings; see [/pricing](/pricing) for the current tier shape and rates.
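The unit arithmetic above, as a small worked sketch. The per-page rates come from this answer; the weekly allowance and credit balance in the example are made-up numbers, and reporting any remainder as a "shortfall" is our assumption about what happens when both run dry.

```javascript
// Per-page rates from the pricing answer: every page gets a cache scan (0.5u);
// a page whose content changed also needs a fresh generation (another 0.5u).
const SCAN_UNITS = 0.5;
const GENERATION_UNITS = 0.5;

function unitsForScan(pages) {
  return pages.reduce(
    (sum, page) => sum + SCAN_UNITS + (page.changed ? GENERATION_UNITS : 0),
    0,
  );
}

// Spend the weekly plan allowance first, then the credit pool.
function drawUnits(needed, planUnitsLeft, creditPool) {
  const fromPlan = Math.min(needed, planUnitsLeft);
  const fromCredits = Math.min(needed - fromPlan, creditPool);
  return {
    planUnitsLeft: planUnitsLeft - fromPlan,
    creditPool: creditPool - fromCredits,
    shortfall: needed - fromPlan - fromCredits, // units neither bucket covered
  };
}

// Example: a 40-page site where 6 pages changed since the last scan.
const pages = Array.from({ length: 40 }, (_, i) => ({ changed: i < 6 }));
const needed = unitsForScan(pages); // 34 unchanged × 0.5 + 6 changed × 1 = 23
const balance = drawUnits(needed, 20, 10); // 20 from the plan, 3 from credits
```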

Related: [What happens if I cancel?](#cancel-reversibility)

**What happens if I cancel?**

You remove four lines of code from your server. Your site goes back to vanilla. There's no proprietary data structure to migrate off, no DNS change to undo, no contract holding your CDN hostage. The render bundles in our cache age out; nothing in your repo changes. The decision to install is reversible in five minutes — that's deliberate.

[Read the chapter →](/aeo#why-this-is-different)

Related: [Will this break my site?](#will-this-break-my-site)

**How does authentication work for the snippet?**

The snippet authenticates to our render API via a Bearer token (`AGENTSITE_TOKEN` env var). Each token belongs to one customer and is scoped to the registered site domains they own; renders for unregistered hosts return 401 unless the host is on a public-domain allow-list (used for community demos). Your team gets a token at sign-up; rotate from the dashboard if needed.
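A minimal sketch of both sides of that exchange. The header format is standard HTTP Bearer authentication; everything else (the in-memory token store, the `tok_demo_123` token, the demo host, and the rule that allow-listed hosts still need a valid token) is a hypothetical stand-in for whatever the render API actually does.

```javascript
// Client side: send the token from the environment as a Bearer credential.
function authHeaders(env) {
  if (!env.AGENTSITE_TOKEN) throw new Error("AGENTSITE_TOKEN is not set");
  return { Authorization: `Bearer ${env.AGENTSITE_TOKEN}` };
}

// Server side (hypothetical): each token maps to the site domains its customer registered.
const TOKEN_DOMAINS = new Map([["tok_demo_123", ["example.com", "www.example.com"]]]);
// Hypothetical public-domain allow-list for community demos.
const PUBLIC_DEMO_HOSTS = new Set(["demo.example.org"]);

// Returns the HTTP status a render request would get.
function authorizeRender(authorization, host) {
  const match = /^Bearer\s+(\S+)$/.exec(authorization || "");
  const domains = match ? TOKEN_DOMAINS.get(match[1]) : undefined;
  if (!domains) return 401; // missing or unknown token
  const h = host.toLowerCase();
  return domains.includes(h) || PUBLIC_DEMO_HOSTS.has(h) ? 200 : 401;
}
```

In a Node install, `authHeaders(process.env)` would supply the header for each render fetch.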

Related: [What about my private data?](#private-data)

**Where does my data live?**

Render bundles (the structured output we extract from your public pages) live in our PostgreSQL cluster on DigitalOcean US infrastructure. We do not store request bodies, query strings (only pathnames), cookies, or any headers other than the User-Agent class. Aggregated daily rollups for the dashboard live alongside them. For EU residency or specific compliance requirements, contact us: Enterprise can be deployed on customer infrastructure.

Related: [What about my private data?](#private-data) · [What telemetry do you collect?](#telemetry)

**What telemetry do you collect?**

Per bot request, we log: the URL path, a normalized crawler class (gptbot, claudebot, perplexitybot, etc.), a coarse IP class (verified-bot range or "other"), and timing/size metadata. No request bodies, no full User-Agent strings, no cookies. Telemetry is observation only — bot fetches get the same canonical artifact regardless of who they are. The dashboard surfaces aggregates ("Perplexity fetched 12 pages this week"), never individual traces.
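A sketch of that normalization step, assuming a simple substring table (the table below is illustrative, not AgentSite's list) and that the verified-bot IP check happens elsewhere and arrives as a boolean. Only the derived class is kept; the raw User-Agent string never enters the record.

```javascript
// Illustrative User-Agent substrings mapped to normalized crawler classes.
const CRAWLER_CLASSES = [
  ["gptbot", "gptbot"],
  ["claudebot", "claudebot"],
  ["perplexitybot", "perplexitybot"],
  ["ccbot", "ccbot"],
  ["bingbot", "bingbot"],
];

function crawlerClass(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  const hit = CRAWLER_CLASSES.find(([needle]) => ua.includes(needle));
  return hit ? hit[1] : "other";
}

// Build the per-request record: path only (no query string), crawler class,
// coarse IP class, timing and size. The raw User-Agent is deliberately dropped.
function telemetryRecord({ url, userAgent, ipIsVerifiedBot, ms, bytes }) {
  return {
    path: new URL(url, "https://placeholder.invalid").pathname,
    crawler: crawlerClass(userAgent),
    ipClass: ipIsVerifiedBot ? "verified-bot" : "other",
    ms,
    bytes,
  };
}
```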

Related: [What about my private data?](#private-data) · [Where does my data live?](#data-residency)

## AEO

The field. What answer engine optimization is — also called GEO or LLMO — and how it differs from SEO.

**What is AEO and how is it different from SEO?**

AEO is Answer Engine Optimization: making your content citable by AI engines like ChatGPT, Claude, Perplexity, and Google AI Overviews. (You'll also see this field called **GEO**, Generative Engine Optimization, or **LLMO**, depending on who you read; same idea, different acronym. See the next question.) SEO was about ranking in a list of ten blue links so a human would click. AEO is about being quoted inside an answer paragraph the AI generates. Ranking is competitive (rank 1 wins clicks). Citation is binary (you're in the answer, or you're not). The optimization target is fundamentally different.

[Read the chapter →](/aeo#foundations-there-is-no-position-3)

Related: [Is this AEO, GEO, or LLMO?](#aeo-vs-geo-vs-llmo) · [What are the five layers of AEO?](#five-layers) · [What's the difference between visibility and citability?](#visibility-vs-citability)

**Is this AEO, GEO, or LLMO?**

Same field, different acronyms. **AEO** (Answer Engine Optimization) is the practitioner-speak term, used most by agencies, tooling vendors, and the SEO crowd repurposing their muscle for AI. **GEO** (Generative Engine Optimization) comes from the Princeton GEO paper (Aggarwal et al., KDD 2024) and is the term you'll see in academic work and content-marketing thought leadership. **LLMO** (Large Language Model Optimization) is the newest, narrowest variant, treating the LLM itself as the optimization target rather than the answer engine wrapped around it. The vocabulary hasn't settled and probably won't for another year or two. AgentSite is the technical foundation under all three names: render-time HTML, per-page markdown, JSON-LD schema, llms.txt, agent-card files, and external measurement, so any AI engine, under any acronym, can read you and cite you.

[Read the chapter →](/aeo#foundations-there-is-no-position-3)

Related: [What is AEO and how is it different from SEO?](#what-is-aeo) · [What is the Princeton GEO research?](#princeton-geo) · [What are the five layers of AEO?](#five-layers)

**What are the five layers of AEO?**

AI engines operate across five layers: four on your site (you control them) and one off your site (you influence it).

- **Layer 1: crawler access.** Can a non-JS HTTP client read your page? (Pre-rendered HTML, robots.txt, sitemap.xml, real 404s.)
- **Layer 2: AI-readable content map.** `/llms.txt`, per-page `.md` mirrors, `/.well-known/agent-card.json` and friends.
- **Layer 3: structured data.** JSON-LD (Article, FAQPage, Organization).
- **Layer 4: content quality.** Does each page lead with a direct answer, define its terms, cite named statistics? (The 8-dimension AEO score lives here.)
- **Layer 5: external measurement.** Mention rate, citation rate, and share of voice across ChatGPT / Claude / Perplexity / Gemini, plus the third-party surfaces (Wikipedia, Reddit, YouTube) those models cite from.

Layers 1-3 are pure technical lift; Layer 4 is content work; Layer 5 is reputation downstream of everything below. Layers stack: get Layer 1 wrong and none of the others matter.

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [What is AEO and how is it different from SEO?](#what-is-aeo) · [Do AI engines actually run JavaScript when they crawl?](#do-ai-engines-run-javascript) · [What is llms.txt?](#what-is-llms-txt) · [What JSON-LD schema types actually drive citations?](#schema-types) · [What is agent-card.json and the .well-known set?](#agent-card-json)

**Do AI engines actually run JavaScript when they crawl?**

Overwhelmingly no. Vercel's analysis of roughly half a billion GPTBot fetches across its network found zero JavaScript execution. The crawlers behind ChatGPT, Claude, and Perplexity fetch the raw HTML response and read what's in it; they do not boot a headless Chromium and wait for hydration. If your site is a single-page app, those crawlers see `<div id="app"></div>` where your content was supposed to be. The notable exception is Google: AI Overviews draw on Google's search index, and Googlebot does render JavaScript, although rendering can lag the initial crawl.
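To approximate what a non-JS crawler reads, you can strip scripts, styles, and tags from the raw HTML and see what text is left. This is a rough heuristic sketch, not how any particular crawler parses pages, and the 200-character threshold is an arbitrary assumption.

```javascript
// Rough approximation of what a non-JS client can read: the raw HTML minus
// scripts, styles, and markup. Nothing the app would render at runtime appears.
function rawVisibleText(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// The threshold is an assumption; tune it for your pages.
function looksLikeEmptyShell(html, minChars = 200) {
  return rawVisibleText(html).length < minChars;
}
```

On Node 18+ you could feed it `await (await fetch(url, { headers: { "User-Agent": "GPTBot" } })).text()`; an SPA shell usually comes back with little more than its `<title>`.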

[Read the chapter →](/aeo#technical-if-the-bot-cant-reach-you-nothing-else-matters)

Related: [What is AEO and how is it different from SEO?](#what-is-aeo) · [Mistake: Shipping a JavaScript-only SPA with no SSR](#mistake-js-only-spa)

**What is llms.txt?**

llms.txt is a markdown index of your site, following Jeremy Howard's spec. It has an H1 (project name), a blockquote summary, and H2 sections of links grouped by user intent (Get Started, Reference, Examples). The good ones disambiguate explicitly (FastHTML's llms.txt says outright that it is *not* FastAPI-compatible), which is exactly what makes citations accurate. The companion `/llms-full.txt` is your entire site flattened to one markdown document for paste-into-context use.
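Because a good llms.txt is editorial, it's easiest to keep as a hand-written outline and render mechanically. A minimal sketch, assuming the outline shape below (our invention) and a hypothetical "Acme Harbor API" site; the output follows the structure described above: an H1, a blockquote summary, optional free-form notes, then H2 sections of `- [label](url): description` links.

```javascript
// Render llms.txt from a curated outline.
function renderLlmsTxt({ name, summary, notes = [], sections }) {
  const lines = [`# ${name}`, "", `> ${summary}`, ""];
  for (const note of notes) lines.push(note, ""); // plain paragraphs, no headings
  for (const { title, links } of sections) {
    lines.push(`## ${title}`, "");
    for (const { label, url, description } of links) {
      lines.push(description ? `- [${label}](${url}): ${description}` : `- [${label}](${url})`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

const llmsTxt = renderLlmsTxt({
  name: "Acme Harbor API",
  summary: "REST API for vessel arrivals and berth schedules at container ports.",
  notes: ["Acme Harbor API is not a GraphQL API; do not generate GraphQL queries for it."],
  sections: [
    {
      title: "Get Started",
      links: [{ label: "Quickstart", url: "https://acme.example/start.md", description: "API key and first request" }],
    },
    {
      title: "Reference",
      links: [{ label: "Endpoints", url: "https://acme.example/reference.md" }],
    },
  ],
});
```

The spec also gives an H2 section named "Optional" special meaning: links there can be skipped when a shorter context is needed.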

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [Why do per-page .md mirrors matter?](#per-page-md) · [What are the five layers of AEO?](#five-layers) · [Mistake: Auto-generated llms.txt that's a link dump](#mistake-autogen-llms-txt)

**Why do per-page .md mirrors matter?**

Every route is accessible at both `/path` and `/path.md`: the nbdev / Answer.AI pattern. Markdown is what AI engines prefer to read. It's the format their training corpus was densest in, the format their tool calls return, the format their context windows consume most efficiently. Same content as your HTML page, in a denser format: lower token cost on the AI side, higher comprehension.
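The URL mapping is simple enough to sketch. This assumes the convention we understand the llms.txt proposal to describe: append `.md` to the page URL, and for URLs ending in a slash append `index.html.md` instead. The function names are ours.

```javascript
// HTML route -> markdown mirror path.
function markdownMirrorPath(pathname) {
  if (pathname.endsWith("/")) return pathname + "index.html.md"; // no file name
  return pathname + ".md";
}

// Markdown mirror path -> HTML route, for a server resolving a mirror request.
// Returns null when the request is not for a mirror.
function htmlRouteForMirror(pathname) {
  if (!pathname.endsWith(".md")) return null;
  if (pathname.endsWith("/index.html.md")) {
    return pathname.slice(0, -"index.html.md".length);
  }
  return pathname.slice(0, -".md".length);
}
```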

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [What is llms.txt?](#what-is-llms-txt) · [What are the five layers of AEO?](#five-layers)

**What JSON-LD schema types actually drive citations?**

Per Princeton GEO research and current platform behavior: Article/BlogPosting (most-cited type across ChatGPT, Perplexity, Gemini), Organization (with sameAs anchoring to Wikipedia/Crunchbase/LinkedIn), Person (author entities with credentials), Product/Service (commercial pages), FAQPage (still extracted heavily even though Google restricted FAQ rich results to well-known government and health sites in August 2023), HowTo (drives the largest measurable gains for procedural content), and BreadcrumbList (site hierarchy).

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [When should I use FAQPage vs Article vs HowTo?](#faqpage-vs-article-vs-howto) · [Why does Organization sameAs matter?](#organization-sameas) · [What are the five layers of AEO?](#five-layers)

**When should I use FAQPage vs Article vs HowTo?**

Article (or BlogPosting) for editorial content with an author and a publication date — the default. FAQPage for genuine question-and-answer pages where each Q is a real question users ask. HowTo for actual procedures with steps and a defined outcome — never for listicles or "best of" pages, which is one of the most-penalized misuses. When in doubt: Article.
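The rule of thumb above, written as a decision function. The page descriptor fields (`isListicle`, `hasOrderedSteps`, `hasDefinedOutcome`, `isQuestionAndAnswer`) are hypothetical; in practice they're editorial judgments, not something you'd reliably detect automatically. ItemList for listicles comes from the HowTo-on-a-listicle mistake later on this page.

```javascript
// Pick a JSON-LD @type for a page, following the FAQ's rule of thumb.
function pickSchemaType(page) {
  if (page.isListicle) return "ItemList"; // "10 Best X" is never a HowTo
  if (page.hasOrderedSteps && page.hasDefinedOutcome) return "HowTo";
  if (page.isQuestionAndAnswer) return "FAQPage"; // real questions users ask
  return "Article"; // the default, and the answer when in doubt
}
```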

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [What JSON-LD schema types actually drive citations?](#schema-types) · [Mistake: FAQPage schema as a keyword dumping ground](#mistake-faq-keyword-dump) · [Mistake: HowTo schema on a listicle](#mistake-howto-on-listicle)

**Why does Organization sameAs matter?**

sameAs is the entity-graph anchor. Your Organization JSON-LD's `sameAs` array should link to your Wikipedia entry, Crunchbase profile, LinkedIn page, X/Twitter account: the canonical public-web endpoints for your entity. AI engines use this to resolve "who is this site really" against the rest of the web. Without sameAs, you're a string; with it, you're an entity in their knowledge graph.
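A sketch of an Organization block with its `sameAs` anchors, serialized into the script tag you'd place in `<head>`. The company name and every profile URL are placeholders; swap in your real, canonical profile URLs.

```javascript
// Build Organization JSON-LD anchored to the entity's public profiles.
function organizationJsonLd({ name, url, logo, sameAs }) {
  return {
    "@context": "https://schema.org",
    "@type": "Organization",
    name,
    url,
    logo,
    sameAs, // canonical public-web endpoints for the same entity
  };
}

const org = organizationJsonLd({
  name: "Acme Harbor Data",
  url: "https://acme.example",
  logo: "https://acme.example/logo.png",
  sameAs: [
    "https://en.wikipedia.org/wiki/Acme_Harbor_Data",
    "https://www.crunchbase.com/organization/acme-harbor-data",
    "https://www.linkedin.com/company/acme-harbor-data",
    "https://x.com/acmeharbor",
  ],
});

const orgScriptTag = `<script type="application/ld+json">${JSON.stringify(org)}</script>`;
```

The facts in this block (name, URL, founding details you add) should match what those profiles say; see the inconsistent-entity-facts mistake below.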

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [What JSON-LD schema types actually drive citations?](#schema-types) · [Mistake: Inconsistent entity facts across the web](#mistake-inconsistent-entity-facts)

**What is agent-card.json and the .well-known set?**

The agentic frontier files at /.well-known/ — agent-card.json (Google A2A standard), ai-agent.json (Aiia spec, March 2026), agents.json (Wildcard OpenAPI extension), mcp.json (your MCP server pointer), and ai.txt (emerging interaction-rules DSL). These are how an autonomous agent — not a chatbot answering a question, but an actual agent — discovers what your site can do and how to interact with it.

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [When will autonomous agents actually use this?](#agents-arrive) · [What are the five layers of AEO?](#five-layers)

**When will autonomous agents actually use this?**

The standards aren't converging: there are 10+ active IETF drafts in this space as of Q1 2026. Gartner forecasts that traditional search engine volume will drop 25% by 2026, with search losing share to AI chatbots and other virtual agents. The agent web is small today. By end-2027, the sites that have shipped these agent files will be the sites agents have been talking to for two years. Whoever ships first gets cited.

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [What is agent-card.json and the .well-known set?](#agent-card-json) · [What are the five layers of AEO?](#five-layers)

**How do ChatGPT, Claude, Perplexity, and AI Overviews actually crawl?**

Each operates a named bot user-agent: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Bingbot, CCBot. Google-Extended and Applebot-Extended are robots.txt control tokens rather than separate crawlers: Google-Extended governs whether your content is used for Gemini training and grounding, not whether it appears in AI Overviews, which follow ordinary Googlebot rules. The crawlers issue HTTP GETs as you'd expect, read the response body, and (overwhelmingly) do not run JavaScript. Some respect robots.txt strictly; some respect it loosely; some user-initiated bots (fetching a URL a user pasted in) ignore robots.txt entirely under the "user agent" doctrine.
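A small sketch that generates robots.txt groups per crawler token. The paths and sitemap URL are placeholders, and allowing every named crawler is an example policy, not a recommendation. One design point worth encoding: under the Robots Exclusion Protocol (RFC 9309) a crawler follows only the groups that name it and falls back to `*` only when none do, so rules in the `*` group don't carry over to a named group and have to be repeated.

```javascript
// Render robots.txt from groups of { userAgent, disallow: [paths] }.
// An empty disallow list emits "Disallow:" (nothing disallowed).
function robotsTxt(groups, sitemapUrl) {
  const lines = [];
  for (const { userAgent, disallow = [] } of groups) {
    lines.push(`User-agent: ${userAgent}`);
    if (disallow.length === 0) lines.push("Disallow:");
    for (const path of disallow) lines.push(`Disallow: ${path}`);
    lines.push("");
  }
  if (sitemapUrl) lines.push(`Sitemap: ${sitemapUrl}`, "");
  return lines.join("\n");
}

// Named groups don't inherit from "*", so /admin/ is repeated in each one.
const PRIVATE = ["/admin/"];
const robots = robotsTxt(
  [
    { userAgent: "*", disallow: PRIVATE },
    { userAgent: "GPTBot", disallow: PRIVATE },
    { userAgent: "ClaudeBot", disallow: PRIVATE },
    { userAgent: "PerplexityBot", disallow: PRIVATE },
  ],
  "https://acme.example/sitemap.xml",
);
```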

[Read the chapter →](/aeo#technical-if-the-bot-cant-reach-you-nothing-else-matters)

Related: [Do AI engines actually run JavaScript when they crawl?](#do-ai-engines-run-javascript) · [What does GPTBot actually fetch on a typical visit?](#what-gptbot-fetches)

**What does GPTBot actually fetch on a typical visit?**

A handful of pages per visit: usually your homepage, llms.txt if present, and the most-internally-linked routes from your sitemap. (URLs a user pastes into ChatGPT are fetched on demand by a separate agent, ChatGPT-User.) It does not deep-crawl. It does not run JavaScript. It does respect robots.txt rules addressed to the "GPTBot" user-agent token. The fetch is short and uncached on its side; if your origin responds slowly, GPTBot moves on.

[Read the chapter →](/aeo#technical-if-the-bot-cant-reach-you-nothing-else-matters)

Related: [How do ChatGPT, Claude, Perplexity, and AI Overviews actually crawl?](#how-ai-engines-crawl) · [Does Cloudflare bot-fight block AI crawlers?](#cloudflare-bot-fight)

**Does Cloudflare bot-fight block AI crawlers?**

By default, yes. Cloudflare's "Bot Fight Mode" challenges or blocks GPTBot, ClaudeBot, and most named AI crawlers as "bots." If you're behind Cloudflare and you want AI engines to read your site, you need to explicitly allow-list the verified-bot user agents. This is one of the most common silent AEO failures — site owners discover it months after deploy when nothing they ship gets cited.

[Read the chapter →](/aeo#three-traps-that-kill-citation-silently)

Related: [How do ChatGPT, Claude, Perplexity, and AI Overviews actually crawl?](#how-ai-engines-crawl) · [Mistake: Cloudflare Bot Fight Mode blocking AI crawlers](#mistake-bot-fight-blocking)

**What's the difference between visibility and citability?**

Two halves of one job. Visibility: can the AI read your page at all? (Pre-rendered HTML, robots.txt, sitemap, no soft 404s.) Citability: will the AI choose to quote you? (FAQ schema, answer-first content, definitions, statistics with sources.) A site can be perfectly visible and entirely unquotable. A site can also be perfectly citable in theory and entirely invisible because the React shell renders empty. Either failure is fatal.

[Read the chapter →](/aeo#visibility-plus-citability-two-halves-of-one-job)

Related: [What are the five layers of AEO?](#five-layers) · [What is the 8-dimension AEO score?](#eight-dimension-score)

**What is the 8-dimension AEO score?**

Eight signals carry most of the weight in whether AI engines cite a page. FAQ schema presence (20%), answer-first content structure (18%), authoritative definition density (15%), statistics + named-source citations (12%), content recency (10%), Article/HowTo schema (10%), outbound citation quality (8%), heading structure for answer extraction (7%). Total 100 points. The score predicts citation likelihood, not search ranking.
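Read as a linear weighted sum, which is our assumption about how the points combine, the score looks like this. How each dimension's 0-to-1 sub-score is measured isn't specified here, so the inputs in the example are made up.

```javascript
// Weights from the 8-dimension model; they sum to 100.
const WEIGHTS = {
  faqSchema: 20,
  answerFirst: 18,
  definitionDensity: 15,
  statisticsWithSources: 12,
  recency: 10,
  articleOrHowToSchema: 10,
  outboundCitationQuality: 8,
  headingStructure: 7,
};

// Each sub-score is clamped to 0..1 (missing counts as 0); the result is 0..100.
function aeoScore(subscores) {
  let total = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    const s = Math.min(1, Math.max(0, subscores[dimension] ?? 0));
    total += weight * s;
  }
  return Math.round(total * 10) / 10;
}

// Example: strong content, but no FAQ schema and only half answer-first.
const example = aeoScore({
  faqSchema: 0,
  answerFirst: 0.5,
  definitionDensity: 1,
  statisticsWithSources: 1,
  recency: 1,
  articleOrHowToSchema: 1,
  outboundCitationQuality: 1,
  headingStructure: 1,
}); // 0 + 9 + 15 + 12 + 10 + 10 + 8 + 7 = 71
```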

[Read the chapter →](/aeo#the-eight-signals-inside-a-page)

Related: [What counts as "answer-first content"?](#answer-first-content) · [What's the difference between visibility and citability?](#visibility-vs-citability)

**What counts as "answer-first content"?**

The page leads with the direct answer to the question implicit in its title. First one or two sentences answer it; context follows; details after that. The inverted-pyramid pattern from journalism. Contrast: the "narrative hook" pattern (scene → tension → answer in paragraph nine) which AI engines tend not to quote from. TL;DR at the top is a strong signal.

[Read the chapter →](/aeo#content-the-answer-capsule-rules)

Related: [What is the 8-dimension AEO score?](#eight-dimension-score)

**Does keyword density matter for AEO?**

Almost not at all. Keyword density was an SEO signal because Google's 2008-era ranking model was bag-of-words-shaped. AI engines reading your content for citation are doing semantic comprehension; they don't care how many times you said "harbor data API." They care whether your page actually answers the question they're trying to answer. Stuff your page with keywords and you'll dilute the signal, not boost it.

[Read the chapter →](/aeo#content-the-answer-capsule-rules)

Related: [What counts as "answer-first content"?](#answer-first-content) · [Mistake: FAQPage schema as a keyword dumping ground](#mistake-faq-keyword-dump)

**Do backlinks help with AEO?**

Indirectly. Backlinks remain a reputation signal — sites that other sites link to are more likely to be in the training corpus AI engines drew from, and more likely to be in the entity-graph anchoring step. But backlinks are not a citation factor in the way they're a ranking factor; you can have zero backlinks and still get cited if your content is structured for citation. The AEO signal is content quality, not link quantity.

[Read the chapter →](/aeo#foundations-there-is-no-position-3)

Related: [What is the 8-dimension AEO score?](#eight-dimension-score)

**What is the Princeton GEO research?**

A paper from Princeton researchers on "Generative Engine Optimization" (Aggarwal et al., first posted in 2023 and published at KDD 2024) that empirically measured which content patterns increase citation likelihood across ChatGPT, Perplexity, and similar engines. Key findings: citing sources, adding statistics, and including authoritative quotations all measurably increase how visible a source is in the generated answer. The research underlies most of the 8-dimension score weights.

[Read the chapter →](/aeo#content-the-answer-capsule-rules)

Related: [What is the 8-dimension AEO score?](#eight-dimension-score) · [What counts as "answer-first content"?](#answer-first-content)

**What is the Vercel half-billion GPTBot stat?**

Vercel published an analysis in late 2024 of AI crawler behavior across its network: roughly half a billion GPTBot requests in a month, and none of them executed JavaScript. This is the empirical evidence underlying the "AI engines do not execute JS" claim: not an extrapolation, not a sample, but the full population of GPTBot requests Vercel observed.

[Read the chapter →](/aeo#technical-if-the-bot-cant-reach-you-nothing-else-matters)

Related: [Do AI engines actually run JavaScript when they crawl?](#do-ai-engines-run-javascript)

**What is the Gartner 25% forecast?**

Gartner's prediction: by 2026, traditional search engine volume will drop 25%, with search losing share to AI chatbots and other virtual agents (ChatGPT, Claude, Perplexity, and answer surfaces like Google AI Overviews). The forecast is what makes AEO a now-problem, not a 2030-problem. The optimization target is moving with the market.

[Read the chapter →](/aeo#foundations-there-is-no-position-3)

Related: [What is AEO and how is it different from SEO?](#what-is-aeo)

## Common mistakes

The canonical pitfall catalog — the failure modes most sites are silently shipping today.

**Mistake: Shipping a JavaScript-only SPA with no SSR**

The most common AEO failure mode by far. Your React/Vue/Svelte app boots from `<div id="app"></div>` and renders in the browser; AI crawlers fetch the HTML response and see exactly that empty div. Title, meta, schema, body: all blank to them. The diagnostic: `curl -s -A "GPTBot" https://yoursite.com/ | grep -c '<p[ >]'` should print a non-zero count. If it's zero, every other AEO surface is unreachable.

[Read the chapter →](/aeo#technical-if-the-bot-cant-reach-you-nothing-else-matters)

Related: [Do AI engines actually run JavaScript when they crawl?](#do-ai-engines-run-javascript)

**Mistake: SPAs returning 200 OK on missing routes (soft 404s)**

Your Express server has `app.get('*', sendIndex)` so the SPA can route on the client. The side effect: every URL returns 200 OK, including junk like `/+c),new URL(c,f)...` that a scanner extracted from your JS source. Crawlers learn to distrust the whole site. Worse, on agentsite-served sites those junk URLs end up indexed in the generated llms.txt. Fix: maintain a real route set on the server and return real 404s for unknown paths.
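A minimal sketch of "a real route set on the server", kept framework-free so it runs anywhere. The route list and the blog pattern are hypothetical placeholders for your app's actual routes.

```javascript
// Hypothetical route table: the static routes the SPA really serves...
const KNOWN_ROUTES = new Set(["/", "/pricing", "/docs", "/about"]);
// ...plus patterns for dynamic ones (also hypothetical).
const DYNAMIC_ROUTES = [/^\/blog\/[a-z0-9-]+$/];

// 200 for a route the app can render, 404 for everything else.
function statusForPath(pathname) {
  // Treat "/pricing/" like "/pricing", but keep the root "/" as-is.
  const p = pathname.length > 1 ? pathname.replace(/\/+$/, "") : pathname;
  return KNOWN_ROUTES.has(p) || DYNAMIC_ROUTES.some((re) => re.test(p)) ? 200 : 404;
}
```

In an Express catch-all you might call it as `res.status(statusForPath(req.path)).sendFile(indexFile)`, with `indexFile` an absolute path to your built `index.html`: the SPA shell is still served, so the client can render its not-found view, but with an honest status code.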

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [Mistake: Hash routing (#/about)](#mistake-hash-routing)

**Mistake: Hash routing (#/about)**

AI crawlers never see URL fragments: everything after the `#` stays in the client and is not sent in the HTTP request. If your routes are `yoursite.com/#/about` instead of `yoursite.com/about`, the crawler sees one URL: the homepage. Every other route is invisible. Use the History API for routing; never use hash routing for production-shipped pages.
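You can see the fragment rule with the WHATWG `URL` API built into Node and browsers: only the path and query make up the request target, so every hash route collapses to the same request.

```javascript
// The request target an HTTP client actually sends: path plus query.
// The fragment (u.hash) never leaves the client.
function requestTarget(href) {
  const u = new URL(href);
  return u.pathname + u.search;
}

const hashRoutes = ["https://yoursite.com/#/about", "https://yoursite.com/#/pricing"];
const historyRoutes = ["https://yoursite.com/about", "https://yoursite.com/pricing"];

console.log(hashRoutes.map(requestTarget)); // [ '/', '/' ]
console.log(historyRoutes.map(requestTarget)); // [ '/about', '/pricing' ]
```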

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [Mistake: SPAs returning 200 OK on missing routes (soft 404s)](#mistake-soft-404)

**Mistake: Empty `<title>` and meta in the SPA shell**

Your `index.html` ships with `<title></title>` (or worse, `<title>App</title>`) on the assumption that JS will set it later. AI crawlers see the blank title. They use the title to index the page and to decide whether to keep reading. Either ship per-route titles in your initial HTML, or use a content layer (like agentsite) to inject them server-side.

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [Mistake: Shipping a JavaScript-only SPA with no SSR](#mistake-js-only-spa)

**Mistake: noindex meta left in the template**

A `<meta name="robots" content="noindex">` left in the SPA shell from a staging environment. Or an `X-Robots-Tag: noindex` header at the CDN that nobody remembers configuring. Both kill all AEO silently: the page renders fine to humans, but every AI engine and search engine drops it on sight. Audit periodically: `curl -sI https://yoursite.com | grep -i robots` for the header, and view-source the index.html for the meta tag.
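The same audit as a small function you could run in CI. It's a regex-based sketch, so it assumes reasonably ordinary markup and header names that are already lower-cased; it checks the `X-Robots-Tag` header and `robots` / `googlebot` meta tags for `noindex` (or `none`, which implies it).

```javascript
// Return every noindex directive found in the response headers or HTML.
// `headers` is a plain object with lower-cased names (assumption of this sketch).
function findNoindex(headers, html) {
  const hits = [];
  const directive = /\b(noindex|none)\b/i;
  const xRobots = headers["x-robots-tag"] || "";
  if (directive.test(xRobots)) hits.push(`X-Robots-Tag: ${xRobots}`);

  // Robots meta tags, with name and content attributes in either order.
  const metaTags = html.match(/<meta\b[^>]*\bname\s*=\s*["']?(?:robots|googlebot)\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (content && directive.test(content[1])) hits.push(tag);
  }
  return hits;
}
```

With Node 18+ `fetch`, `findNoindex(Object.fromEntries(res.headers), await res.text())` feeds it a live response; iterating a Fetch `Headers` object already yields lower-cased names.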

[Read the chapter →](/aeo#the-five-layer-aeo-model)

**Mistake: Cloudflare Bot Fight Mode blocking AI crawlers**

Cloudflare's default Bot Fight Mode treats GPTBot, ClaudeBot, and PerplexityBot as bots to block. You think you've done everything right, but AI engines never reach your site because they're challenged at the edge before your origin sees them. Allow-list the verified-bot user agents in Cloudflare, or disable Bot Fight Mode entirely for those user-agent classes.

[Read the chapter →](/aeo#three-traps-that-kill-citation-silently)

Related: [Does Cloudflare bot-fight block AI crawlers?](#cloudflare-bot-fight)

**Mistake: JSON-LD schema mismatched to visible content**

Your FAQPage JSON-LD lists Q&A pairs that don't actually appear on the rendered page. Or your Article JSON-LD claims an author who isn't bylined. Google penalizes mismatch (since 2021). AI engines learn to distrust your schema if mismatch is detected once. Fix: generate schema FROM the rendered content, not in parallel to it. agentsite's render pass guarantees this — schema is extracted from the same DOM the human sees.
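A sketch of "schema from the rendered content": build FAQPage JSON-LD only from Q&A pairs you've already extracted from the rendered page, plus an audit for hand-written markup. How the pairs get pulled out of the DOM is left out, and both function names are ours, not agentsite's.

```javascript
// Build FAQPage JSON-LD from Q&A pairs taken from the rendered page,
// so the markup can only describe what a human can already see.
function faqPageJsonLd(visiblePairs) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: visiblePairs.map(({ question, answer }) => ({
      "@type": "Question",
      name: question,
      acceptedAnswer: { "@type": "Answer", text: answer },
    })),
  };
}

// Audit existing markup: list schema questions missing from the visible text.
function questionsMissingFromPage(faqJsonLd, visibleText) {
  return faqJsonLd.mainEntity
    .map((q) => q.name)
    .filter((name) => !visibleText.includes(name));
}
```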

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [What JSON-LD schema types actually drive citations?](#schema-types) · [Mistake: Stale `dateModified` on Article schema](#mistake-stale-datemodified)

**Mistake: FAQPage schema as a keyword dumping ground**

Stuffing 30 questions into a FAQPage that no real user would ask, optimized to land on long-tail keywords. AI engines detect this — they were trained on real Q&A patterns and the synthetic ones are obvious. The page gets de-prioritized as a citation source. Genuine FAQ pages with 5–10 real questions perform far better than 30-question keyword farms.

[Read the chapter →](/aeo#content-the-answer-capsule-rules)

Related: [When should I use FAQPage vs Article vs HowTo?](#faqpage-vs-article-vs-howto) · [Does keyword density matter for AEO?](#keyword-density)

**Mistake: HowTo schema on a listicle**

HowTo is for genuine procedures: defined steps, defined outcomes, in order. "10 Best Email Tools" is not a HowTo — it's a listicle, and applying HowTo schema to it is an explicit anti-pattern that AI engines penalize. Use ItemList for listicles, HowTo only for actual procedures (e.g., "How to install Python on macOS").
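For contrast with HowTo, a sketch of ItemList markup for a listicle: ordered entries with a position, a name, and a URL, and no steps or outcome. The tool names and URLs are placeholders.

```javascript
// ItemList for a "best of" page: each entry is a ListItem with its rank.
function itemListJsonLd(name, items) {
  return {
    "@context": "https://schema.org",
    "@type": "ItemList",
    name,
    itemListElement: items.map((item, i) => ({
      "@type": "ListItem",
      position: i + 1, // follows the visible ranking, starting at 1
      name: item.name,
      url: item.url,
    })),
  };
}

const bestEmailTools = itemListJsonLd("10 Best Email Tools", [
  { name: "Mailer One", url: "https://example.com/reviews/mailer-one" },
  { name: "Inbox Two", url: "https://example.com/reviews/inbox-two" },
]);
```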

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [When should I use FAQPage vs Article vs HowTo?](#faqpage-vs-article-vs-howto)

**Mistake: Stale `dateModified` on Article schema**

Your Article JSON-LD has `dateModified: "2022-03-14"` and the article hasn't been touched since. AI engines weight freshness; never-updated content gets de-prioritized regardless of quality. Worse, sites that update content but forget to update dateModified look stale to engines even though they're fresh to humans. Derive dateModified on render from the actual content: bump it when the content hash changes, rather than setting it by hand.
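A sketch of deriving dateModified from a content hash: store the hash and date from the last render, and only move the date when the hash changes. It uses a 32-bit FNV-1a hash over UTF-16 code units, which is fine for noticing edits but not a security primitive; where the previous render state is stored is left out. Hash only the main content, not navigation or build stamps that change on every deploy, or every render will look like an edit.

```javascript
// 32-bit FNV-1a over UTF-16 code units: cheap change detection, not cryptographic.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// `previous` is what the last render stored: { hash, dateModified }, or null.
// The date only moves forward when the content itself changed.
function nextDateModified(content, previous, now = new Date()) {
  const hash = fnv1a(content);
  if (previous && previous.hash === hash) {
    return { hash, dateModified: previous.dateModified };
  }
  return { hash, dateModified: now.toISOString() };
}
```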

[Read the chapter →](/aeo#content-the-answer-capsule-rules)

Related: [Mistake: JSON-LD schema mismatched to visible content](#mistake-schema-content-mismatch)

**Mistake: Inconsistent entity facts across the web**

Your site says "Founded 2019" but your LinkedIn says 2020 and your Crunchbase says 2018. AI engines try to resolve you as an entity by triangulating across the web. When the facts disagree, you fail to resolve — the engine can't confidently say "this is the same company." Your sameAs anchors are useless if the underlying facts contradict.
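You can run the triangulation yourself before an engine does. A minimal sketch, assuming you've collected by hand the facts each source states about you (source names and fact keys are made up); it reports every fact whose values disagree, with who says what.

```javascript
// Each source lists the facts it states. Returns facts whose values disagree.
function conflictingFacts(sources) {
  const byFact = new Map(); // fact -> Map(source -> value)
  for (const { source, facts } of sources) {
    for (const [fact, value] of Object.entries(facts)) {
      if (!byFact.has(fact)) byFact.set(fact, new Map());
      byFact.get(fact).set(source, String(value));
    }
  }
  const conflicts = {};
  for (const [fact, claims] of byFact) {
    if (new Set(claims.values()).size > 1) conflicts[fact] = Object.fromEntries(claims);
  }
  return conflicts;
}

const conflicts = conflictingFacts([
  { source: "site", facts: { foundingYear: 2019, headquarters: "Rotterdam" } },
  { source: "linkedin", facts: { foundingYear: 2020, headquarters: "Rotterdam" } },
  { source: "crunchbase", facts: { foundingYear: 2018 } },
]);
// { foundingYear: { site: "2019", linkedin: "2020", crunchbase: "2018" } }
```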

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [Why does Organization sameAs matter?](#organization-sameas)

**Mistake: Auto-generated llms.txt that's a link dump**

A naively generated llms.txt that lists every URL on your site without context, descriptions, or grouping. The good ones are *editorial*: they tell the AI what your site IS and what it ISN'T (FastHTML's explicit "this is not FastAPI-compatible"). A link dump is worse than no llms.txt: the AI now has a pile of URLs with no map. Curate.

[Read the chapter →](/aeo#the-five-layer-aeo-model)

Related: [What is llms.txt?](#what-is-llms-txt)

Free · No signup · Emailed to your inbox

## Now run it on your site.

Sixty seconds. Paste your URL, see the side-by-side of what humans see vs. what agents see, with the dimension-by-dimension score for why answer engines won't quote you. Or sign up and install the snippet — four lines, five minutes.

[Take the 60-second assessment](/score) [Or sign up →](/auth/sign-up)
