# Chunk size

How long each paragraph should be so an agent can extract it as a self-contained unit — and why both walls of text and one-line fragments fail.

By AgentSite · 2 min read · Updated 2026-05-23

Chunk size is the AEO dimension that measures whether a page is broken into agent-readable paragraphs of useful length. Engines extract content paragraph-by-paragraph; an undivided wall of text gets compressed into nothing quotable, while many fragmented one-sentence paragraphs lose the connective tissue an agent needs to attribute a claim. The working range is roughly 80 to 150 words per paragraph.

## Why the range

Because extraction happens at the paragraph level, the shape of each paragraph decides whether the quote stands on its own in the engine's response. The Princeton GEO paper reported visibility gains of up to 40% from content-level optimizations such as adding citations, quotations, and statistics, which supports shaping content into extractable, attribution-ready units ([Aggarwal et al., KDD 2024](https://arxiv.org/abs/2311.09735)).

Below roughly 80 words, a paragraph rarely carries enough context to be quoted without its surrounding sentences. Above roughly 150 words, it usually spans multiple ideas; the engine summarizes it down to fit its prompt budget, and the quotable specificity is gone.
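The two thresholds above can be turned into a rough audit. The sketch below assumes paragraphs are separated by blank lines and counts whitespace-delimited words; the names `MIN_WORDS`, `MAX_WORDS`, `classify`, and `audit` are illustrative, and the 80/150 bounds are the approximate range from the text rather than hard limits.

```python
import re

MIN_WORDS = 80   # approximate lower bound from the text (assumption: not a hard rule)
MAX_WORDS = 150  # approximate upper bound from the text (assumption: not a hard rule)


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def classify(paragraph: str) -> str:
    """Label a paragraph by its whitespace-delimited word count."""
    n = len(paragraph.split())
    if n < MIN_WORDS:
        return "short"
    if n > MAX_WORDS:
        return "long"
    return "in_range"


def audit(text: str) -> list[tuple[int, str]]:
    """Return (word_count, label) for each paragraph, in document order."""
    return [(len(p.split()), classify(p)) for p in split_paragraphs(text)]
```

Word count is only a proxy: a 100-word paragraph that mixes two ideas still fails, so the labels are a prompt for review, not a verdict.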

[Schema.org](https://schema.org/)'s [`Article`](https://schema.org/Article) type formalizes the page-level container (`articleBody` holds the chunks; `wordCount` is the page-level aggregate), but it does not prescribe per-chunk length. That decision is the writer's. The 80-150 word range is practitioner consensus, not part of the spec.
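To make the page-level container concrete, here is a minimal sketch that builds `Article` JSON-LD from a list of paragraphs. The function name `article_jsonld` and the choice to join chunks with blank lines are assumptions for illustration; `headline`, `articleBody`, and `wordCount` are Schema.org properties, and nothing in the output records per-chunk length.

```python
import json


def article_jsonld(headline: str, paragraphs: list[str]) -> str:
    """Serialize a page as Schema.org Article JSON-LD (illustrative sketch)."""
    # articleBody carries the full text; the chunk boundaries survive only as blank lines.
    body = "\n\n".join(paragraphs)
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "articleBody": body,
        # wordCount is the page-level aggregate, not a per-paragraph figure.
        "wordCount": len(body.split()),
    }
    return json.dumps(data, indent=2)
```

The resulting string would typically go inside a `<script type="application/ld+json">` element on the page.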

## Three properties of a well-shaped chunk

1. **Single idea per chunk.** One claim, one argument, one example. An agent that lifts the paragraph should be lifting one thing.
2. **Self-contained.** Removable from the surrounding paragraphs without losing meaning. The "as discussed above" pattern fails this test.
3. **Anchored to a section.** Each chunk sits under a heading that names the question it answers. Floating paragraphs without a heading anchor get assigned to the wrong topic on extraction.
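Two of the three properties above can be checked mechanically, at least roughly. The sketch below assumes Markdown input in which headings and paragraphs are separated by blank lines; the phrase list in `BACK_REFERENCES` and the function name `lint_chunks` are hypothetical. The single-idea property is a judgment call and is not checked here.

```python
import re

# Hypothetical phrase list: back-references suggesting a paragraph leans on its neighbors.
BACK_REFERENCES = re.compile(
    r"\b(as (discussed|mentioned|noted|shown) (above|earlier|below)"
    r"|see above|the former|the latter)\b",
    re.IGNORECASE,
)
# Matches a single-line ATX heading such as "## Why the range".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def lint_chunks(markdown: str) -> list[dict]:
    """Report the governing heading and back-reference use for each paragraph."""
    results = []
    current_heading = None  # None until the first heading is seen
    for block in re.split(r"\n\s*\n", markdown):
        block = block.strip()
        if not block:
            continue
        match = HEADING.match(block)
        if match:
            current_heading = match.group(1).strip()
            continue
        results.append({
            "heading": current_heading,
            "floating": current_heading is None,  # no heading anchors this chunk
            "back_reference": bool(BACK_REFERENCES.search(block)),
            "text": block,
        })
    return results
```

A flagged back-reference is not always a defect, but it is a good place to ask whether the paragraph would still make sense if quoted alone.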

## The two anti-patterns

The wall of text — a 600-word paragraph that covers four ideas in unbroken prose — is the more common failure on technical pages. Each idea is buried; none of them is extractable on its own. The fix is mechanical: break on idea transitions, even when the prose would naturally flow.

The fragmented one-liner — a sequence of 15-word paragraphs each making a separate point — is the more common failure on marketing pages. The agent sees a list of disconnected claims without enough context to lift any of them as a standalone quote. The fix is consolidation: group related sentences into 80-150 word arguments.

The same content with the same words can fail both ways depending on paragraph breaks. The break is the variable.
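The consolidation fix for fragmented one-liners can be sketched as a greedy merge: the words stay the same and only the breaks move. The function name `consolidate` and its defaults are illustrative. The sketch merges adjacent paragraphs purely by word count, so it assumes neighbors are related, which a writer would still need to confirm. It does not split walls of text, since finding idea transitions is an editorial judgment.

```python
def consolidate(paragraphs: list[str], min_words: int = 80, max_words: int = 150) -> list[str]:
    """Greedily merge adjacent short paragraphs toward the target range."""
    merged: list[str] = []
    buffer: list[str] = []
    count = 0
    for p in paragraphs:
        n = len(p.split())
        # Flush first if adding this paragraph would overshoot the ceiling.
        if buffer and count + n > max_words:
            merged.append(" ".join(buffer))
            buffer, count = [], 0
        buffer.append(p)
        count += n
        # Flush as soon as the buffer reaches the floor.
        if count >= min_words:
            merged.append(" ".join(buffer))
            buffer, count = [], 0
    # Any leftover fragments become a final (possibly short) paragraph.
    if buffer:
        merged.append(" ".join(buffer))
    return merged
```

Paragraphs that are already over the ceiling pass through unchanged, so the output is a starting point for editing rather than a finished page.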

## Connection to Layer 2

The [llms.txt](/llms-txt) convention is the chunk-sized index at the site level — every entry is a one-line description fitting the same "self-contained unit" logic, scaled up to whole pages ([llmstxt.org](https://llmstxt.org/)). Each page is a chunk in the site index; each paragraph is a chunk in the page.
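As an illustration of the site-level index, the sketch below renders a small llms.txt file in the layout described at llmstxt.org: an H1 title, a blockquote summary, then sections of one-line link entries. The function name `render_llms_txt`, the `example.com` URLs, and the entry descriptions are placeholders, not the contents of any real site's file.

```python
def render_llms_txt(
    site: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt-style index from (title, url, description) entries."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for section, entries in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for title, url, description in entries:
            # Each entry is one self-contained line: the page-level chunk.
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder data for illustration only.
index = render_llms_txt(
    "Example Site",
    "Guides on writing pages that agents can extract and quote.",
    {
        "Docs": [
            ("Chunk size", "https://example.com/chunk-size",
             "How long each paragraph should be to work as a self-contained unit."),
            ("Direct answer", "https://example.com/direct-answer",
             "The lede chunk that serves as the page-level extraction unit."),
        ]
    },
)
print(index)
```

Each description follows the same test as a paragraph: it should make sense when read with nothing around it.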

## Where this fits

Chunk size is a Layer 4 dimension — content quality on the individual page. It pairs with the [direct answer](/direct-answer) (the lede chunk that's the page-level extraction unit) and [definition density](/definition-density) (the per-paragraph definitional pattern). The longer thesis is in [agent readability](/agent-readability).