# What the GEO paper measured

The first peer-reviewed controlled experiment on which content tactics move AI citations: what worked, what didn't, and what the per-tactic numbers actually say.

By AgentSite · 6 min read · Updated 2026-05-23

The Generative Engine Optimization paper out of Princeton (Aggarwal et al., KDD 2024) is the only peer-reviewed controlled experiment on which content tactics move AI citations. Its headline result is a visibility lift of up to 40%. The per-tactic breakdown is the more useful part: the top three carry most of the lift, one actively hurts, and several barely move the number.

## The experiment

The authors built **GEO-bench**, a benchmark of 10,000 queries drawn from MS MARCO, ORCAS-I, Natural Questions, AllSouls, LIMA, [Perplexity.ai](http://Perplexity.ai) Discover, ELI5, GPT-4-generated queries, and Davinci-Debate, spread across 25 domains and 9 query types ([Aggarwal et al., 2024, arXiv:2311.09735](https://arxiv.org/abs/2311.09735); [project page](https://generative-engines.com/GEO/)). Eighty percent of the queries are informational; ten percent each are transactional and navigational.

The generative engine was gpt-3.5-turbo, prompted with the top-5 results from Google search as retrieval sources. Each tactic was tested by applying it to one randomly selected source for a query, then comparing that source's visibility against the unmodified baseline. The paper also validated the strongest tactics on [Perplexity.ai](http://Perplexity.ai), a deployed retrieval-grounded engine, with consistent results. The setup is a reasonable proxy for production AI crawlers: by late 2024, Vercel measured 569 million GPTBot fetches and 370 million Claude fetches in a single month, none of them executing JavaScript ([Vercel, "The Rise of the AI Crawler," Dec 2024](https://vercel.com/blog/the-rise-of-the-ai-crawler)).
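The per-query procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `lift_for_query`, `word_share`, and `add_two_words` are hypothetical names, and the toy visibility function stands in for the real pipeline of a generative engine plus a citation metric.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical type aliases for this sketch; none of these names come from the paper.
Tactic = Callable[[str], str]                    # rewrites one source text
Visibility = Callable[[List[str], int], float]   # scores source `idx` among `sources`


def lift_for_query(
    sources: List[str], tactic: Tactic, visibility: Visibility, rng: random.Random
) -> Tuple[int, float]:
    """Apply `tactic` to one randomly chosen source; return (index, relative lift in %)."""
    idx = rng.randrange(len(sources))
    baseline = visibility(sources, idx)
    if baseline == 0:
        raise ValueError("baseline visibility is zero; relative lift is undefined")
    modified = list(sources)          # leave the caller's list untouched
    modified[idx] = tactic(sources[idx])
    optimized = visibility(modified, idx)
    return idx, (optimized - baseline) / baseline * 100.0


def word_share(sources: List[str], idx: int) -> float:
    """Toy visibility: the chosen source's share of all words (NOT the paper's metric)."""
    total = sum(len(s.split()) for s in sources)
    return len(sources[idx].split()) / total


def add_two_words(text: str) -> str:
    """Toy tactic: append two words to the source text."""
    return text + " extra words"


toy_sources = ["a b", "c d", "e f", "g h", "i j"]
chosen, lift = lift_for_query(toy_sources, add_two_words, word_share, random.Random(0))
print(f"source {chosen}: {lift:+.1f}% relative lift")
```

In a real replication, `visibility` would call the engine with the five sources and score the response, and the lift would be averaged over many queries rather than read from a single one.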

Two metrics were used. **Position-Adjusted Word Count (PAWC)** weights each cited sentence's share of the response's words by a decaying exponential of its position, so citations that appear earlier count for more; it is the metric behind the headline numbers. **Subjective Impression** scores seven dimensions (relevance, influence, uniqueness, diversity, follow-up, position, count) with GPT-3.5 as the judge.
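To make the PAWC idea concrete, here is a small sketch under stated assumptions. It assumes each sentence cites at most one source and uses a decay of `exp(-position / number_of_sentences)`; the exact decay constant and the handling of sentences with several citations are assumptions of this sketch, so check the paper before relying on them. `Sentence` and `position_adjusted_word_count` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sentence:
    text: str
    cited_source: Optional[int]  # index of the cited source, or None if uncited


def position_adjusted_word_count(response: List[Sentence], source: int) -> float:
    """Word share of `source` in the response, discounted by sentence position."""
    total_words = sum(len(s.text.split()) for s in response)
    if total_words == 0:
        return 0.0
    n = len(response)
    score = 0.0
    for pos, sentence in enumerate(response):
        if sentence.cited_source == source:
            # Assumed decay: earlier sentences (small pos) keep more of their weight.
            score += len(sentence.text.split()) * math.exp(-pos / n)
    return score / total_words


response = [
    Sentence("Quoted experts agree on this point", cited_source=0),
    Sentence("A later sentence cites another page", cited_source=1),
]
early = position_adjusted_word_count(response, 0)
late = position_adjusted_word_count(response, 1)
print(f"source 0: {early:.3f}, source 1: {late:.3f}")
```

Both toy sentences have six words, so the only difference between the two scores is the position discount on the second sentence.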

## The 9 tactics

Nine optimization methods were evaluated, defined as algorithmic source-text transformations:

1.  **Authoritative** — rewrite to sound more persuasive and authoritative.
2.  **Statistics Addition** — add quantitative statistics in place of qualitative discussion.
3.  **Keyword Stuffing** — add more query-relevant keywords (the classical-SEO move).
4.  **Cite Sources** — add named external citations to support claims.
5.  **Quotation Addition** — add quoted material from credible sources.
6.  **Easy-to-Understand** — simplify the language.
7.  **Fluency Optimization** — improve readability of the source text.
8.  **Unique Words** — add domain-rare vocabulary.
9.  **Technical Terms** — add jargon and specialized terminology.

## What worked

The paper reports verbatim that "our top-performing methods, Cite Sources, Quotation Addition, and Statistics Addition, achieved a relative improvement of 30-40% on the Position-Adjusted Word Count metric and 15-30% on the Subjective Impression metric."

Reading off Table 1 of the paper (PAWC Overall, baseline = 19.3):

| Tactic | PAWC Overall | Δ vs baseline |
| --- | --- | --- |
| Quotation Addition | 27.2 | +41% |
| Statistics Addition | 25.4 | +32% |
| Fluency Optimization | 24.7 | +28% |
| Cite Sources | 24.6 | +27% |
| Technical Terms | 22.7 | +18% |
| Easy-to-Understand | 22.0 | +14% |
| Authoritative | 21.3 | +10% |
| Unique Words | 20.5 | +6% |
| Keyword Stuffing | 17.7 | **−8%** (only tactic below baseline) |
| Baseline (no optimization) | 19.3 | — |

The paper's stated conclusion on Keyword Stuffing: "we find such methods offer little to no improvement on generative engine's responses." Translated more sharply: it was the only tactic in the experiment with a measurably negative effect on visibility. The classical-SEO move actively hurts in the generative-engine paradigm.
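The Δ column can be recomputed from the PAWC scores alone. The snippet below is a quick arithmetic check using the values from the table above; the names `BASELINE`, `PAWC_OVERALL`, and `relative_change_pct` are just labels for this sketch.

```python
BASELINE = 19.3  # PAWC Overall with no optimization, from the table above

PAWC_OVERALL = {
    "Quotation Addition": 27.2,
    "Statistics Addition": 25.4,
    "Fluency Optimization": 24.7,
    "Cite Sources": 24.6,
    "Technical Terms": 22.7,
    "Easy-to-Understand": 22.0,
    "Authoritative": 21.3,
    "Unique Words": 20.5,
    "Keyword Stuffing": 17.7,
}


def relative_change_pct(score: float, baseline: float = BASELINE) -> float:
    """Relative change versus the baseline, in percent."""
    return (score - baseline) / baseline * 100.0


# Round to whole percentages, matching the precision of the table.
deltas = {tactic: round(relative_change_pct(score)) for tactic, score in PAWC_OVERALL.items()}
below_baseline = [tactic for tactic, delta in deltas.items() if delta < 0]

for tactic, delta in deltas.items():
    print(f"{tactic:22s} {delta:+d}%")
print("Below baseline:", below_baseline)
```

Rounded to whole percentages, the computed values match the table, and Keyword Stuffing is the only entry that lands below zero.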

## The lower-ranked-site finding

The most strategically interesting result is in Table 2 of the paper — per-tactic lift broken down by the source's original Google-search rank. Lower-ranked sites benefit dramatically more:

-   **Cite Sources, Rank-5 source: +115% relative improvement.** A page that was the fifth-ranked Google result for the query saw more than double its baseline citation visibility after the GEO tactic was applied.
-   **Quotation Addition, Rank-5: +99.7%.**
-   **Statistics Addition, Rank-5: +97.7%.**

For Rank-1 sources, the same tactics show much smaller lifts (and Cite Sources actually showed −30% at Rank-1, suggesting top-ranked pages already saturate their citation potential). The pattern: GEO is _democratizing_. It moves lower-authority pages up disproportionately, while the top-ranked pages have less room to grow.

## Domain-specific patterns

Different tactics work in different content categories (Table 3 of the paper):

-   **Cite Sources** wins in Statement, Facts, and Law & Government queries.
-   **Statistics Addition** wins in Law & Government, Debate, and Opinion.
-   **Quotation Addition** wins in People & Society, Explanation, and History.
-   **Authoritative** wins in Debate, History, and Science.

The high-level read is that fact-heavy domains reward citation density (Statistics, Cite Sources), while discussion-heavy domains reward voice (Quotation, Authoritative).
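For editorial planning, the list above can be turned into a simple lookup from category to the tactics that win there. This is only a convenience sketch over the four bullets in this section; `WINNING_CATEGORIES` and `tactics_for` are hypothetical names, and categories not listed here simply return an empty list.

```python
from typing import Dict, List

# Tactic -> categories where it wins, transcribed from the bullets above.
WINNING_CATEGORIES: Dict[str, List[str]] = {
    "Cite Sources": ["Statement", "Facts", "Law & Government"],
    "Statistics Addition": ["Law & Government", "Debate", "Opinion"],
    "Quotation Addition": ["People & Society", "Explanation", "History"],
    "Authoritative": ["Debate", "History", "Science"],
}


def tactics_for(category: str) -> List[str]:
    """Return the tactics listed as winners for a category, in the order given above."""
    return [
        tactic
        for tactic, categories in WINNING_CATEGORIES.items()
        if category in categories
    ]


print(tactics_for("Debate"))
print(tactics_for("Law & Government"))
```

An empty result means only that the category is absent from this summary, not that no tactic helps there.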

## Combinations

The paper's §5.3 reports that combining tactics outperforms any single tactic by more than 5.5%. The strongest pair is **Fluency Optimization + Statistics Addition** at roughly 35.8% improvement. Several pairs reach the 30-35% range; few exceed it. Returns diminish past two-tactic stacks in their data.

## Where this fits

For the AEO corpus, the paper is the load-bearing source behind three Layer-4 glossary entries:

-   [Statistics and citations](/statistics-citations) — the _Statistics Addition_ + _Cite Sources_ tactics combined into one editorial pattern.
-   [Direct answer](/direct-answer) — the lede-paragraph extraction unit that benefits from all three top tactics applied together.
-   [Definition density](/definition-density) — the term-level pattern that compounds with the source-citation pattern Cite Sources measures.

The longer thesis on why these tactics matter at all — citation is binary, agents don't run JavaScript, the chain dependency of the five layers — is in [agent readability](/agent-readability). The structural map sits in [the five layers of AEO](/five-layer-aeo).

The paper's last word — "Generative Engines value not only content but also information presentation" — is the editorial line every Layer 4 piece in this corpus tries to honor.