# What the GEO paper measured

The first peer-reviewed controlled experiment on which content tactics move AI citations: what worked, what didn't, and what the per-tactic numbers actually say.

By AgentSite · 6 min read · Updated 2026-05-23

The Generative Engine Optimization paper out of Princeton (Aggarwal et al., KDD 2024) is the only peer-reviewed controlled experiment on which content tactics move AI citations. Combined optimization can lift visibility by up to 40%. The per-tactic breakdown is the more useful part: the top three carry most of the lift, one actively hurts, and several add little.

## The experiment

The authors built **GEO-bench**, a benchmark of 10,000 queries drawn from MS MARCO, ORCAS-1, Natural Questions, AllSouls, LIMA, [Perplexity.ai](http://Perplexity.ai) Discover, ELI5, GPT-4-generated queries, and Davinci-Debate, spread across 25 domains and 9 query types ([Aggarwal et al., 2024, arXiv:2311.09735](https://arxiv.org/abs/2311.09735); [project page](https://generative-engines.com/GEO/)). Eighty percent of the queries are informational; ten percent each are transactional and navigational.

The generative engine was gpt-3.5-turbo, prompted with the top-5 results from Google search as retrieval sources. Each tactic was tested by applying it to one randomly selected source for a query, then comparing that source's visibility against the unmodified baseline. The paper also validated the strongest tactics on [Perplexity.ai](http://Perplexity.ai), a deployed retrieval-grounded engine, with consistent results.

The setup is a reasonable proxy for production AI crawlers: by late 2024, Vercel measured 569 million GPTBot fetches and 370 million Claude fetches in a single month, none of them executing JavaScript ([Vercel, "The Rise of the AI Crawler," Dec 2024](https://vercel.com/blog/the-rise-of-the-ai-crawler)).

Two metrics were used.
**Position-Adjusted Word Count (PAWC)** weighted each cited sentence's word share by an exponential decay in citation position; it is the source of the headline result number. **Subjective Impression** scored seven dimensions (relevance, influence, uniqueness, diversity, follow-up, position, count) with GPT-3.5 as the judge.

## The 9 tactics

Nine optimization methods were evaluated, each defined as an algorithmic transformation of the source text:

1. **Authoritative**: rewrite to sound more persuasive and authoritative.
2. **Statistics Addition**: add quantitative statistics in place of qualitative discussion.
3. **Keyword Stuffing**: add more query-relevant keywords (the classical-SEO move).
4. **Cite Sources**: add named external citations to support claims.
5. **Quotation Addition**: add quoted material from credible sources.
6. **Easy-to-Understand**: simplify the language.
7. **Fluency Optimization**: improve readability of the source text.
8. **Unique Words**: add domain-rare vocabulary.
9. **Technical Terms**: add jargon and specialized terminology.

## What worked

In the paper's words, "our top-performing methods, Cite Sources, Quotation Addition, and Statistics Addition, achieved a relative improvement of 30-40% on the Position-Adjusted Word Count metric and 15-30% on the Subjective Impression metric."

Reading off Table 1 of the paper (PAWC Overall, baseline = 19.3):

| Tactic | PAWC Overall | Δ vs baseline |
| --- | --- | --- |
| Quotation Addition | 27.2 | +41% |
| Statistics Addition | 25.4 | +32% |
| Fluency Optimization | 24.7 | +28% |
| Cite Sources | 24.6 | +27% |
| Technical Terms | 22.7 | +18% |
| Easy-to-Understand | 22.0 | +14% |
| Authoritative | 21.3 | +10% |
| Unique Words | 20.5 | +6% |
| Keyword Stuffing | 17.7 | **−8%** (only tactic below baseline) |
| Baseline (no optimization) | 19.3 | n/a |

The paper's stated conclusion on Keyword Stuffing: "we find such methods offer little to no improvement on generative engine's responses."
Translated more sharply: it was the only tactic in the experiment that scored below the baseline on visibility. The classical-SEO move actively hurts in the generative-engine paradigm.

## The lower-ranked-site finding

The most strategically interesting result is in Table 2 of the paper, which breaks per-tactic lift down by the source's original Google-search rank. Lower-ranked sites benefit dramatically more:

- **Cite Sources, Rank-5 source: +115% relative improvement.** A page that was the fifth-ranked Google result for the query saw more than double its baseline citation visibility after the GEO tactic was applied.
- **Quotation Addition, Rank-5: +99.7%.**
- **Statistics Addition, Rank-5: +97.7%.**

For Rank-1 sources, the same tactics showed much smaller lifts. Cite Sources actually showed −30% at Rank-1, suggesting that top-ranked pages already saturate their citation potential.

The pattern: GEO is _democratizing_. It moves lower-authority pages up disproportionately, while the top-ranked pages have less room to grow.

## Domain-specific patterns

Different tactics work in different content categories (Table 3 of the paper):

- **Cite Sources** wins in Statement, Facts, and Law & Government queries.
- **Statistics Addition** wins in Law & Government, Debate, and Opinion.
- **Quotation Addition** wins in People & Society, Explanation, and History.
- **Authoritative** wins in Debate, History, and Science.

The high-level read is that fact-heavy domains reward citation density (Statistics Addition, Cite Sources), while discussion-heavy domains reward voice (Quotation Addition, Authoritative).

## Combinations

The paper's §5.3 reports that combining tactics outperforms any single tactic by more than 5.5%. The strongest pair is **Fluency Optimization + Statistics Addition** at roughly 35.8% improvement. Several pairs reach the 30-35% range; few exceed it. In their data, returns diminish past two-tactic stacks.
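
The Δ column quoted from Table 1 under "What worked" is plain relative improvement over the unmodified baseline. As a minimal sketch of that arithmetic, the Python below recomputes the column from the PAWC Overall scores cited in this piece. The names `PAWC_OVERALL` and `relative_lift` are illustrative assumptions, not anything defined in the paper.

```python
# PAWC Overall scores as quoted in this article from Table 1 of the paper.
BASELINE_PAWC = 19.3

PAWC_OVERALL = {
    "Quotation Addition": 27.2,
    "Statistics Addition": 25.4,
    "Fluency Optimization": 24.7,
    "Cite Sources": 24.6,
    "Technical Terms": 22.7,
    "Easy-to-Understand": 22.0,
    "Authoritative": 21.3,
    "Unique Words": 20.5,
    "Keyword Stuffing": 17.7,
}


def relative_lift(score: float, baseline: float = BASELINE_PAWC) -> float:
    """Return the percent change of `score` relative to `baseline`."""
    return (score - baseline) / baseline * 100.0


# Hypothetical helper structures for illustration only.
lifts = {tactic: relative_lift(score) for tactic, score in PAWC_OVERALL.items()}
below_baseline = [tactic for tactic, lift in lifts.items() if lift < 0]

# Print tactics from largest to smallest lift.
for tactic, lift in sorted(lifts.items(), key=lambda item: item[1], reverse=True):
    print(f"{tactic:<22} {lift:+.1f}%")
```

Rounding each value to a whole percent gives the figures in the table, and `below_baseline` contains only Keyword Stuffing. The same function applies to any other score and baseline pair, but the rank-level and pair-level percentages above are quoted from the paper rather than recomputed here.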

## Where this fits

For the AEO corpus, the paper is the load-bearing source behind three Layer-4 glossary entries:

- [Statistics and citations](/statistics-citations): the _Statistics Addition_ + _Cite Sources_ tactics combined into one editorial pattern.
- [Direct answer](/direct-answer): the lede-paragraph extraction unit that benefits from all three top tactics applied together.
- [Definition density](/definition-density): the term-level pattern that compounds with the source-citation pattern Cite Sources measures.

The longer thesis on why these tactics matter at all (citation is binary, agents don't run JavaScript, the chain dependency of the five layers) is in [agent readability](/agent-readability). The structural map sits in [the five layers of AEO](/five-layer-aeo).

The paper's last word, "Generative Engines value not only content but also information presentation," is the editorial line every Layer 4 piece in this corpus tries to honor.