# Schema-content mismatch

Your JSON-LD says one thing about the page and the visible HTML body says another. Engines that parse both surfaces detect the disagreement and distrust both.

By AgentSite · 4 min read · Updated 2026-05-24

A schema-content mismatch occurs when the structured data and the visible page disagree: the schema lists an author the body doesn't show, claims a date the body contradicts, or gives a product price different from the one on the page.

## What it looks like

Three common signatures:

- **Phantom author.** Article schema includes `author: "Jane Doe"` but the page has no visible byline. An agent parsing the JSON-LD picks up the author name; an agent reading the HTML finds no byline at all. The conflict is resolved against the page.
- **Stale `dateModified`.** Schema says `dateModified: 2024-12-01` but the visible footer reads "Last updated: March 2023." Engines that compare both flag this as date inflation. See also [content recency](/content-recency).
- **Price disagreement.** Product schema's `price` field doesn't match the price the user sees on the page. This is the most policy-risky variant because Google explicitly forbids it.

The usual causes behind all three: build-time schema generators that don't track content changes, CMSs whose separate schema editors are not synced with the body, and hand-rolled JSON-LD blocks copied across pages and then forgotten.

## How to detect it

Three checks per page:

1. **Curl the page; extract the JSON-LD block.** Locate the `<script type="application/ld+json">` element and pass its contents through `jq` to confirm that it parses and to read individual fields.
2. **Compare schema fields against the visible body.** Specifically: `author`, `datePublished`, `dateModified`, `headline`, `price` (if Product), `Question.name` (if FAQPage).
3. **Use Google's Rich Results Test** for an authoritative cross-check on individual URLs.
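The first two checks can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a full validator: the names `PageSplitter` and `find_mismatches` are hypothetical, "visible" is approximated as any text outside `<script>`, `<style>`, and `<title>`, and only literal string matches are tested.

```python
import json
from html.parser import HTMLParser

HIDDEN_TAGS = {"script", "style", "title"}  # text here is not shown in the body


class PageSplitter(HTMLParser):
    """Separates JSON-LD script blocks from the (approximate) visible text."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []  # raw JSON-LD strings, one per script element
        self.visible_text = []   # text nodes outside hidden tags
        self._in_jsonld = False
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self._in_jsonld = True
                self.jsonld_blocks.append("")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth:
            self._hidden_depth -= 1
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_blocks[-1] += data
        elif self._hidden_depth == 0:
            self.visible_text.append(data)


def _names(value):
    """Flatten a schema value (string, object with `name`, or list) to strings."""
    if isinstance(value, list):
        return [n for v in value for n in _names(v)]
    if isinstance(value, dict):
        return _names(value.get("name", []))
    return [str(value)]


def find_mismatches(html, fields=("author", "headline")):
    """Return (field, value) pairs whose schema value is absent from the body."""
    parser = PageSplitter()
    parser.feed(html)
    parser.close()
    body = " ".join(" ".join(parser.visible_text).split()).lower()
    mismatches = []
    for raw in parser.jsonld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            mismatches.append(("@json", "unparseable JSON-LD block"))
            continue
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            for field in fields:
                if field not in item:
                    continue
                for text in _names(item[field]):
                    if " ".join(text.split()).lower() not in body:
                        mismatches.append((field, text))
    return mismatches
```

Dates and prices are deliberately left out of the default `fields`: a schema value such as `2024-12-01` rarely appears verbatim in a footer that reads "Last updated: March 2023", so those fields need normalization before comparison. The sketch also ignores `@graph` wrappers and content rendered by JavaScript.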

Most teams discover the mismatch when Search Console reports a "structured data manual action" — which means the discovery is late and the rich result has already been pulled.

## Why it costs you citations

Google's quality guidelines name this rule directly. The content requirement: "Don't mark up content that is not visible to readers of the page. For example, if the JSON-LD markup describes a performer, the HTML body must describe that same performer" ([Google, structured-data policies](https://developers.google.com/search/docs/appearance/structured-data/sd-policies)). The consequence: "A structured data manual action means that a page loses eligibility for appearance as a rich result."

For AEO the harm is broader than rich results. AI crawlers fetch pages at scale (Vercel measured 569 million GPTBot fetches across its network in a single month: [Vercel, Dec 2024](https://vercel.com/blog/the-rise-of-the-ai-crawler)), and most major engines parse JSON-LD when extracting content. The consequences:

- A mismatch makes both surfaces less trustworthy. The engine that lifted "Jane Doe" from schema and "no byline" from the body has to pick one or drop both.
- Some engines drop the entire page from citation eligibility when the markup reads as deceptive. Generic schema with mismatched fields reads as low-quality SEO chrome.
- The mismatch hurts even pages that pass every other gate. Layer 4 content quality can be perfect; if Layer 3 says something the body doesn't, the page loses anyway.

The [schema.org](http://schema.org) [`Article`](https://schema.org/Article) type supports `author`, `datePublished`, `dateModified`, and `headline`, and each one is a potential mismatch surface. The same goes for a product's price (`Product.offers.price`), an FAQ's question text (`FAQPage.mainEntity[].name`), and so on. The more properties you populate, the more mismatch surfaces you create if the schema isn't derived from the body.

## How to fix it

The architectural fix is single source of truth: derive schema from page content, not from a separate config or hand-maintained block. Two working patterns:

1. **Build-time generation from rendered HTML.** Run a script during build that reads the page DOM and emits the JSON-LD. The schema can't drift because it's derived from what the user actually sees.
2. **CMS-side coupling.** When the CMS author edits a byline, headline, or date, the schema regenerates from the same edit. Avoid CMSs that store schema in a separate editor field unsynced from the body.
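As a rough sketch of the first pattern, the Python below reads rendered HTML and emits an `Article` block from what it finds. The markup conventions are assumptions made for illustration: an `<h1>` for the headline, an element with `class="byline"` containing only the author's name, and `<time datetime="...">` elements with class `published` or `modified`. The names `ArticleFacts` and `build_article_jsonld` are hypothetical.

```python
import json
from html.parser import HTMLParser


class ArticleFacts(HTMLParser):
    """Collects headline, byline, and dates from rendered HTML.

    Assumed (hypothetical) markup: <h1> for the headline, class="byline"
    for the author name, <time class="published|modified" datetime="...">.
    """

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._capture = None  # name of the fact whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h1" and "headline" not in self.facts:
            self._capture = "headline"
            self.facts["headline"] = ""
        elif "byline" in classes:
            self._capture = "author"
            self.facts["author"] = ""
        elif tag == "time" and attrs.get("datetime"):
            if "published" in classes:
                self.facts["datePublished"] = attrs["datetime"]
            elif "modified" in classes:
                self.facts["dateModified"] = attrs["datetime"]

    def handle_endtag(self, tag):
        # Simplification: stop capturing at the first closing tag.
        self._capture = None

    def handle_data(self, data):
        if self._capture:
            self.facts[self._capture] += data


def build_article_jsonld(html):
    """Emit Article JSON-LD containing only properties found on the page."""
    parser = ArticleFacts()
    parser.feed(html)
    parser.close()
    facts = {k: " ".join(v.split()) for k, v in parser.facts.items()}
    schema = {"@context": "https://schema.org", "@type": "Article"}
    if facts.get("headline"):
        schema["headline"] = facts["headline"]
    if facts.get("author"):
        schema["author"] = {"@type": "Person", "name": facts["author"]}
    for key in ("datePublished", "dateModified"):
        if facts.get(key):
            schema[key] = facts[key]
    return json.dumps(schema, indent=2)
```

The design choice that matters is omission: a property is emitted only when its source element exists on the page, so a page without a byline gets no `author` rather than a phantom one.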

A periodic audit catches drift even when the architecture is right: a monthly script that pulls 20 random pages, extracts their JSON-LD, and asserts each schema field against the visible body. Most violations are stale fields nobody noticed.
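One way such an audit could look, assuming you already have a list of your site's URLs (for example from a sitemap). This is a sketch only: `fetch_html`, `headline_mismatches`, and `audit` are hypothetical names, the visible-text extraction is a crude regex approximation, and only `headline` is checked by default. Swap in a stricter `check` function for other fields.

```python
import json
import random
import re
import urllib.request

JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)
TAG_RE = re.compile(r"<[^>]+>")


def fetch_html(url):
    """Download a page as text (no retries, no JavaScript rendering)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def headline_mismatches(html):
    """Return schema headlines that never appear in the rest of the page."""
    blocks = JSONLD_RE.findall(html)
    # Crude approximation of visible text: drop JSON-LD blocks, strip tags.
    body = TAG_RE.sub(" ", JSONLD_RE.sub(" ", html))
    body = " ".join(body.split()).lower()
    missing = []
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable blocks are skipped in this sketch
        for item in data if isinstance(data, list) else [data]:
            headline = item.get("headline") if isinstance(item, dict) else None
            if headline and " ".join(headline.split()).lower() not in body:
                missing.append(headline)
    return missing


def audit(urls, sample_size=20, fetch=fetch_html, check=headline_mismatches, seed=None):
    """Check a random sample of URLs; return {url: problems} for failures."""
    rng = random.Random(seed)
    sample = rng.sample(list(urls), min(sample_size, len(urls)))
    report = {}
    for url in sample:
        problems = check(fetch(url))
        if problems:
            report[url] = problems
    return report
```

Passing `fetch` and `check` as parameters keeps the sampling logic testable offline, for instance `audit(pages, fetch=pages.__getitem__)` with a dictionary of saved HTML keyed by URL.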

## Where AgentSite fits

The render service extracts schema from the rendered HTML rather than from a separate config — so the JSON-LD we emit is always derived from the same page content the user sees. We carry through any customer-emitted JSON-LD verbatim (we never overwrite a hand-authored block) and add detected types (Article, FAQPage, HowTo, Organization, BreadcrumbList) only when they're missing. We also reject schemas that fail an anti-pattern check (stale `dateModified` against the current rendered date; keyword-stuffed headlines) before publish. Sites that hand-author their JSON-LD still need the architectural fix above.

## Related problems

- [FAQ schema](/faq-schema) — the schema type most prone to content-mismatch issues (hidden or accordion-collapsed Q&A).
- [Content recency](/content-recency) — the date-mismatch subset of this problem.
- [The five layers of AEO](/five-layer-aeo) — the structural map this problem sits in (Layer 3).
- The full catalog: [AEO problems](/aeo-problems).