# Statistics and citations

A statistic without a named source is dropped. A statistic with a named source gets quoted.

By AgentSite · 3 min read · Updated 2026-05-23

Statistics and citations is the AEO dimension that measures whether numerical claims are co-located with the source they came from. A statistic on its own — "76% of teams report X" — is a floating claim. Attached to a named source, it becomes a citable claim. Answer engines quote the second pattern; they drop the first.

## The pattern

The shape that works has three pieces in one sentence: the number, the source by name, and a year or date. Two examples that satisfy the pattern:

-   "According to [Vercel's December 2024 analysis](https://vercel.com/blog/the-rise-of-the-ai-crawler), GPTBot made 569 million requests across Vercel's network in a month without executing JavaScript on any of them."
-   "[Cloudflare reported in July 2024](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/) that AI bots accessed roughly 39% of the top one million Internet properties in a single month."

Each one names the source, names the number, and names the year. An agent reading either can quote it and attribute correctly without chasing a downstream URL.
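The three-piece check can be approximated in code. The following is a minimal sketch, not a validated scorer: the regular expressions, the `check_statistic` name, and the rule that a Markdown link counts as a named source are all assumptions made for illustration.

```python
import re

# Illustrative heuristics only; none of these patterns is a standard.
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")
NUMBER = re.compile(
    r"\d[\d,.]*\s*(?:%|percent|thousand|million|billion)", re.IGNORECASE
)
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
NAMED_SOURCE = re.compile(
    r"\b[Aa]ccording to [A-Z]\w+|\b[A-Z]\w+ (?:reported|found|measured)\b"
)


def check_statistic(sentence: str) -> dict[str, bool]:
    """Report which of the three pieces a sentence appears to contain."""
    has_link = bool(MD_LINK.search(sentence))
    # Keep the link text but drop the URL, so digits inside a URL
    # are not mistaken for the statistic or the year.
    text = MD_LINK.sub(r"\1", sentence)
    return {
        "number": bool(NUMBER.search(text)),
        "source": has_link or bool(NAMED_SOURCE.search(text)),
        "date": bool(YEAR.search(text)),
    }


floating = "76% of teams report X"
cited = (
    "[Cloudflare reported in July 2024]"
    "(https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-"
    "scrapers-and-crawlers-with-a-single-click/) that AI bots accessed roughly "
    "39% of the top one million Internet properties in a single month."
)

floating_result = check_statistic(floating)
cited_result = check_statistic(cited)
```

Here `floating_result` flags only the number as present, while `cited_result` finds all three pieces. A regex pass like this will miss sources named in other ways, so it works best as a first filter before a human read.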

## Why it works

The peer-reviewed evidence is the Princeton GEO paper, which tested content-optimization tactics in a controlled experiment against commercial generative engines and reported "up to 40% visibility gains" ([Aggarwal et al., KDD 2024](https://arxiv.org/abs/2311.09735); [project page](https://generative-engines.com/GEO/)). Within that overall lift, the strongest individual tactics were _statistics addition_ and _cite sources_, each delivering a measured +30-40% on its own. Combining them — "named statistic with named source" — is the highest-value content tactic the AEO literature has identified.

The mechanism is straightforward. When an answer engine writes a response, it grounds claims it surfaces against the source text it retrieved. A naked number has no verifier behind it; the engine either drops the claim or rewords it without attribution. A number with a named source has its own verifier inline; the engine can quote the sentence verbatim and attribute the claim to the page that contained it.

## The anti-pattern

"Studies show," "experts say," "research has found," "data suggests." These phrases satisfy a reader scanning for confidence without paying the cost of citation. They do not satisfy an engine looking for a quotable, attributable claim. Replacing every instance of the anti-pattern with the named-source pattern is usually the highest-impact single editing pass available on a page.

The replacement is mechanical. For each statistic on the page, ask: who measured this, when, and where can a reader verify it? If the answer to any of the three is "I don't know," either replace the statistic with one you can attribute or take it out.
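Finding the anti-pattern phrases is the part of that pass that is easiest to automate. The sketch below scans text line by line for the four phrases listed above; the `find_vague_attributions` helper and the sample text are hypothetical, and a real page would likely need a longer phrase list.

```python
import re

# The four phrases named above; extend the list for your own content.
VAGUE_PHRASES = [
    "studies show",
    "experts say",
    "research has found",
    "data suggests",
]
VAGUE = re.compile(
    "|".join(re.escape(phrase) for phrase in VAGUE_PHRASES), re.IGNORECASE
)


def find_vague_attributions(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched phrase) for every vague attribution."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in VAGUE.finditer(line):
            hits.append((lineno, match.group(0).lower()))
    return hits


sample = (
    "Studies show that teams ship faster.\n"
    "Cloudflare reported in July 2024 that AI bots accessed roughly 39% of sites.\n"
    "Experts say it matters, and data suggests more."
)

hits = find_vague_attributions(sample)
for lineno, phrase in hits:
    print(f"line {lineno}: replace '{phrase}' with a named source")
```

The scan only locates candidates. Deciding who measured the number, when, and where it can be verified still has to be done by the editor.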

## Where this fits

Statistics and citations is a Layer 4 dimension — content quality on the individual page. The layers below decide whether an agent can read the page at all; this one decides whether the page is worth quoting once read. The longer treatment is in [the five layers of AEO](/five-layer-aeo); the lede pattern sits in [direct answer](/direct-answer); the thesis is [agent readability](/agent-readability).