Question 1

Layer 1 — Render: can the bot read the bytes?

Accepted Answer

A live reader hits your URL: GPTBot, ClaudeBot, PerplexityBot, the ChatGPT user-initiated fetch. None of them execute JavaScript. Vercel measured 569 million GPTBot requests across their network and 370 million from Claude in a single month, and reported that "none of the major AI crawlers currently render JavaScript." (Vercel, "The Rise of the AI Crawler," Dec 2024.) If your site is a Vue / React / Svelte / Angular single-page app, or anything built with Lovable, v0, or Bolt, every one of those agents sees an empty HTML shell with no rendered content and leaves. No error. No retry. No log entry on your side that says "GPTBot tried and gave up." You just stop existing for that question. This is the layer that gates everything else. Cite the cleanest framework in the world; if your page renders client-side, the citation engine never sees the framework. Server-rendered HTML on every route is table stakes.
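One way to sanity-check this layer is to look at the HTML your server returns before any JavaScript runs and see whether it carries readable text. The sketch below is illustrative only: the helper names, the 20-word threshold, the choice to ignore `<noscript>` fallbacks, and both sample documents are assumptions made for the example, not anything the crawlers or Vercel document.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring <script>, <style>, and <noscript> bodies."""

    SKIPPED = {"script", "style", "noscript"}  # assumption: fallbacks are not content

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a non-JavaScript reader could extract from raw HTML."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_empty_shell(html: str, min_words: int = 20) -> bool:
    """Heuristic: fewer than min_words of text suggests a client-rendered shell."""
    return len(visible_text(html).split()) < min_words


# Hypothetical responses: a client-rendered shell and a server-rendered page.
SPA_SHELL = (
    "<!doctype html><html><head><title>My App</title></head>"
    '<body><div id="root"></div><script src="/assets/app.js"></script></body></html>'
)
SERVER_RENDERED = (
    "<html><body><h1>Install guide</h1><p>Run the installer, then open the "
    "dashboard to confirm that every route returns complete HTML before any "
    "script on the page has executed.</p></body></html>"
)

print(looks_like_empty_shell(SPA_SHELL))
print(looks_like_empty_shell(SERVER_RENDERED))
```

To run it against a real route, feed it the body of a plain HTTP GET made without a headless browser. A word count is a crude proxy, so treat a failing result as a prompt to look at the raw response rather than as proof.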

Question 2

Layer 2 — Navigate: can the bot find what's where?

Accepted Answer

Once a bot can read individual pages, the next problem is inventory. Which pages exist? Which one answers the question being asked? sitemap.xml has done this job for search crawlers for two decades. It doesn't quite fit for agents: sitemaps are link dumps, while an agent ingesting your site wants a curated overview that fits in a context window. Jeremy Howard proposed /llms.txt in September 2024 as the standard for that overview: "A proposal to standardise on using an /llms.txt file to provide information to help LLMs use a website at inference time." (llmstxt.org.) Short markdown, links into detail at /install.md and /docs.md, no JavaScript in sight. Per-page .md mirrors are the same idea at the route level. Every page exists at both /path and /path.md. Markdown reads better to language models than HTML does: fewer tokens, less chrome, same content. The two files compose: llms.txt is the index, /path.md is the chapter. If layer 1 is "can the bot read?", layer 2 is "can the bot navigate?". A site with neither file is a maze; with both, it's a manual.
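To show how the index and the mirrors fit together, here is a minimal sketch that builds an llms.txt body from a page list and maps each route to its .md mirror. The `Page` type, the function names, the "Docs" section heading, and the choice of /index.md for the root route are assumptions for illustration; the overall shape (an H1 title, a blockquote summary, then a list of links with short notes) follows the layout described at llmstxt.org.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    path: str      # route as served to browsers, e.g. "/install"
    summary: str   # one-line note shown next to the link


def markdown_mirror(path: str) -> str:
    """Map a route to its markdown mirror: /docs -> /docs.md."""
    trimmed = path.rstrip("/")
    # Assumption: the root route has no natural name, so use /index.md.
    return f"{trimmed}.md" if trimmed else "/index.md"


def build_llms_txt(site_name: str, summary: str, pages: list[Page]) -> str:
    """Render a short markdown index that links to each page's .md mirror."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Docs", ""]
    for page in pages:
        link = markdown_mirror(page.path)
        lines.append(f"- [{page.title}]({link}): {page.summary}")
    return "\n".join(lines) + "\n"


# Hypothetical site inventory.
pages = [
    Page("Install", "/install", "How to install."),
    Page("Docs", "/docs/", "Reference documentation."),
]
print(build_llms_txt("Example Site", "A sample site for agents.", pages))
```

Keeping the mirror mapping in one function means the index and the per-page files cannot drift apart: every link in llms.txt is derived from the same route the HTML page is served at.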

Question 3

Layer 3 — Structure: does the page say what it is?

Accepted Answer

A page can be rendered, indexed, and still be ambiguous to an agent. Is it an article? A FAQ? A how-to? A product page? An organization homepage? The page itself doesn't say. Human visitors infer from layout; agents need it stated. That's what JSON-LD does. Schema.org defines a vocabulary of types (Article, FAQPage, HowTo, Organization, BreadcrumbList, Product), each with its own set of expected properties. A <script type="application/ld+json"> block in the page's HTML declares the type, the author, the publication date, the headline. Agents lift those fields directly when deciding whether to quote the page and how to attribute it. FAQPage is the type that pays off most for AEO because the citation surface is literal: question-as-heading, answer-as-paragraph, both wrapped in schema, both extractable by the agent verbatim. The agent doesn't need to guess what your H2 means. You told it. Layers 1 and 2 say whether the bot can read you. Layer 3 says what kind of thing each page is. A page without it isn't broken; it's just less quotable than the same page with it.
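As a concrete illustration of the FAQPage case, the sketch below turns question and answer pairs into a JSON-LD script block using the Schema.org `FAQPage`, `Question`, and `Answer` types. The function name and the sample pair are assumptions for the example; a real page would fill them from its own headings and paragraphs.

```python
import json


def faq_jsonld(questions: list[tuple[str, str]]) -> str:
    """Build a <script type="application/ld+json"> block for a FAQ page."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in questions
        ],
    }
    # Escape "</" so answer text can never close the script element early.
    # "\/" is a valid JSON escape, so parsers still read the original text.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Sample pair reused from this page's own questions.
block = faq_jsonld(
    [("What is layer 1 of AEO?",
      "Layer 1 is whether a bot can read the bytes you serve.")]
)
print(block)
```

Because the block is generated from the same strings that appear in the visible heading and paragraph, the structured data and the on-page text stay identical, which is what makes the answer quotable verbatim.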

Question 4

Layer 4 — Content: is the page worth quoting?

Accepted Answer

Now the bot can read, navigate, and type your pages. Whether it actually quotes you comes down to content quality. This is the only layer no tool can ship for you. The peer-reviewed work here is the GEO paper out of Princeton. Aggarwal et al., KDD 2024, tested content-optimization tactics in a controlled experiment against generative engines and found "GEO can boost visibility by up to 40% in generative engine responses." (Aggarwal et al., 2024, arXiv:2311.09735.) The strongest individual tactics were adding statistics, citing named sources inline, and including direct quotes. Keyword stuffing was the only tactic with a measured negative effect. Translated to a writing rule: cite named sources with named statistics. Don't pad with keywords. Lead each page with the answer in the first 40-60 words so an agent can lift it verbatim. The page you're reading does the same thing. This layer is human work. AgentSite renders, generates, and validates the technical layers below it. Layer 4 is whatever your writers and engineers actually put on the page.

Question 5

Layer 5 — Measure: who quotes you when?

Accepted Answer

The last layer is external. You don't ship it; you observe it. Are you being mentioned in AI answers? Which engines? For which prompts? Which competitors come up instead? The traffic itself is real. Cloudflare reported in July 2024 that AI bots had accessed roughly 39% of the top one million Internet properties in a single month, with GPTBot alone reaching 35.46% of them. (Cloudflare, "Declaring Your AIndependence," July 2024.) Only 2.98% of the top million were actively blocking AI bots when that report ran — the other 97% were available for citation, in principle. Whether each one was actually getting cited is layer 5. This is what mention-tracking tools measure: send a panel of category prompts to the major engines on a schedule, count how often you come up, who else does. The number that matters is inclusion rate, per engine, sampled often enough to average out the non-determinism of model output. Layer 5 isn't an end state. It's the feedback signal for everything below it. If your inclusion rate is zero, the question is which earlier layer is broken — not which marketing line to A/B test.
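The inclusion-rate arithmetic is simple enough to sketch. The example below assumes you have already collected answers from each engine for a panel of prompts; the `Sample` record, the engine labels, the competitor name, and the case-insensitive substring match are all illustrative assumptions, not how any particular mention-tracking tool works.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    engine: str   # hypothetical engine label
    prompt: str   # the category prompt that was sent
    answer: str   # raw answer text returned for that prompt


def inclusion_rates(samples: list[Sample], brand: str) -> dict[str, float]:
    """Per-engine share of sampled answers that mention the brand."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    needle = brand.casefold()
    for sample in samples:
        totals[sample.engine] += 1
        # Assumption: a case-insensitive substring match counts as a mention.
        if needle in sample.answer.casefold():
            hits[sample.engine] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}


# Made-up panel results, repeated per prompt to average out run-to-run variation.
samples = [
    Sample("engine-a", "best AEO tools", "Try OtherTool."),
    Sample("engine-a", "best AEO tools", "AgentSite and OtherTool both fit."),
    Sample("engine-a", "how to add llms.txt", "Write it by hand."),
    Sample("engine-a", "how to add llms.txt", "Use a generator."),
    Sample("engine-b", "best AEO tools", "agentsite is one option."),
    Sample("engine-b", "how to add llms.txt", "AgentSite generates it."),
]
print(inclusion_rates(samples, "AgentSite"))
```

Running the same function with a competitor's name over the same samples gives the "who else comes up" side of the picture without collecting anything new.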

Question 6

What is layer 1 of AEO?

Accepted Answer

Layer 1 is whether a bot can read the bytes you serve.

Question 7

What is the purpose of `sitemap.xml`?

Accepted Answer

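`sitemap.xml` is an inventory: it tells crawlers which pages exist on a site. It has done that job for search crawlers for two decades, but it is a link dump rather than the curated overview an agent wants.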

Question 8

What does JSON-LD do in relation to page structure?

Accepted Answer

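JSON-LD is the format a page uses to state what it is. Schema.org defines the vocabulary of types (Article, FAQPage, HowTo, and so on); a JSON-LD `<script type="application/ld+json">` block declares which type the page is, along with fields such as author, publication date, and headline, so agents know what kind of thing each page is.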