Technical SEO

Schema Markup for AI Search: The Structured Data That Actually Earns Citations in 2026

Which schema.org types ChatGPT, Perplexity, Claude, and Gemini actually read — graded documented, inferred, or speculative — with copy-pasteable JSON-LD, a schema-density study, and a validation playbook.


Schema density for AI search: AI-cited pages carry an average of 3.4 schema types vs 1.1 on uncited pages; the working GEO subset is ~12 types stacked on one canonical page.

Schema markup is the most over-promised and under-specified lever in the GEO toolkit. Half the advice treats JSON-LD as a magic citation switch; the other half dismisses it because Google "deprecated FAQ rich results." Both are wrong. The accurate picture is narrower and more useful: AI engines parse structured data because pre-extracted entities, question-answer pairs, and offers are cheaper to index than prose — but only Google documents that it reads schema, and only for rich results and the Knowledge Graph. Everything else is inferred. This article maps which of the ten schema types operators argue about actually carries weight on which engine, grades every claim Documented / Inferred / Speculative, ships copy-pasteable JSON-LD for the four blocks that matter, walks a schema-density study with real numbers, and ends with the honest caveat: schema is necessary-not-sufficient scaffolding, and it earns citations, not revenue. You still need to measure which schema'd pages pay.

This is the schema-specific deep-dive. For the broad citation playbook (Direct Answer blocks, llms.txt, entity work, the seven content patterns), how to get cited by AI engines is the companion how-to. For the underlying retrieval mechanics that decide where schema fires in the pipeline, the AI search ranking factors breakdown is the mechanics layer, and how AI engines choose sources covers the pretraining-vs-RAG split. This piece goes deep on one thing: the structured data itself.

Quick Facts

Spec | Value
Schema types on AI-cited pages (mean) | ~3.4 distinct types [6]
Schema types on uncited pages (mean) | ~1.1 distinct types [6]
Recommended syntax (vs Microdata/RDFa) | JSON-LD, per Google [4]
Documented schema effect | Google rich results + Knowledge Graph entity resolution [4]
Inferred schema effect | Chat-engine (ChatGPT/Perplexity/Claude) citation lift [6][9]
FAQ rich result status in Google SERP | Reduced to gov/health sites, Aug 2023 [5]; markup still parsed
HowTo rich result status in Google SERP | Removed, Sep 2023 [5]; markup still parsed
Strongest controlled GEO evidence | Princeton GEO paper, Aggarwal et al. 2024 [1] (tested text, not schema)
Validation tools | Google Rich Results Test [3] + Schema.org validator [12]
Time to ship the four-type bundle | 1-2 hours per template, 5 min per page

I have shipped, measured, and removed schema across 32 Attrifast posts and a handful of client SaaS sites over the last eight months. The honest finding up front: schema is real, the correlation with citation is real, but the causal story is weaker than the GEO-vendor marketing implies. The Princeton GEO paper — the one controlled study in this space — measured big citation lifts from textual interventions and did not test schema as a variable at all. So when I tell you Organization.sameAs and FAQPage matter, I am telling you it is a strong observational correlation that I have watched move on my own properties, not a vendor-published coefficient. That distinction runs through the entire article.

What schema markup actually is (and the three syntaxes, ranked)

Schema markup is a shared vocabulary — schema.org — that lets you label the entities on a page (this is an Article, this is its author, this is the price) in a machine-readable form. There are three syntaxes to express it: JSON-LD, Microdata, and RDFa. In 2026, for AI search, the answer is unambiguous: ship JSON-LD, ignore the other two. JSON-LD lives in one self-contained script block, is the format Google explicitly recommends, and is the cleanest for any crawler to extract because the structured data is not tangled into the visible HTML.

The schema.org vocabulary [11] is jointly stewarded by Google, Microsoft, Yahoo, and Yandex, and it defines roughly 800 types and 1,400 properties. You will use about ten of them. The vocabulary is the standard; the syntax is the delivery mechanism. Here is the ranked comparison that decides which syntax to use:

Syntax | How it works | AI-crawler parseability | Google recommendation | Verdict
JSON-LD | Self-contained script block, separate from HTML | High (one clean object graph) | Recommended [4] | Ship this
Microdata | itemprop attributes interleaved into visible HTML | Medium (must reconcile with DOM) | Supported, not preferred | Avoid for new work
RDFa | property attributes on HTML elements | Medium (same reconciliation cost) | Supported, not preferred | Avoid for new work

The mechanical reason JSON-LD wins for AI extraction: a retrieval pipeline that parses your page does not have to walk the DOM and stitch itemprop attributes back into entities. It reads one application/ld+json block, parses it as a single object, and gets a clean entity graph. Microdata and RDFa force the parser to reconstruct the same graph from attributes scattered across the markup, which is more error-prone and offers zero upside. Every documented AI crawler (GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, plus the Google crawlers governed by the Google-Extended robots.txt token) reads JSON-LD. Per OpenAI's bot documentation [7], its crawlers fetch and parse standard structured data; the same holds for Perplexity and Anthropic per their crawler docs.

The diagram is the whole argument for JSON-LD in one picture: it is the path with the fewest steps between "crawler fetches your page" and "clean entity graph the retrieval system can trust." For the rest of this article, every schema example is JSON-LD, and I will not recommend Microdata or RDFa anywhere.
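The one-block extraction path described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a description of how any specific engine's crawler is implemented; the class name, function name, and sample HTML are all hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._chunks))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return each JSON-LD block parsed into Python objects, with no DOM walk."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]


# Hypothetical page source for illustration only.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Attrifast"}
</script>
</head><body><h1>Hello</h1></body></html>"""

nodes = extract_jsonld(SAMPLE_HTML)
print(nodes[0]["@type"])  # Organization
```

The equivalent Microdata reader would have to track itemscope and itemprop attributes across nested elements and rebuild the objects itself, which is the reconciliation cost listed in the table.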

The schema-type-by-engine weighting matrix

This is the table the article exists to deliver. Ten schema types, five engines, every cell graded documented (vendor docs or peer-reviewed), inferred (consistent across two or more independent third-party studies), or speculative (single source or community anecdote). "Documented (Google)" means Google publishes that it reads the type for rich results or Knowledge Graph; it does not mean Google promises a citation. For ChatGPT, Perplexity, and Claude, no vendor publishes per-type weighting, so the strongest honest grade those columns can earn is "inferred."

Schema type | ChatGPT (search) | Perplexity | Claude (web) | Gemini | Google AIO
Article | inferred | inferred | speculative | documented (Google) | documented (Google)
FAQPage (4+ items) | inferred | inferred | speculative | documented (Google, parses) | documented (Google, parses)
HowTo | inferred | inferred | speculative | documented (Google, parses) | documented (Google, parses)
Product | inferred (commercial) | inferred (commercial) | speculative | documented (Google) | documented (Google)
Offer | inferred (price extract) | inferred (price extract) | speculative | documented (Google) | documented (Google)
Review / AggregateRating | speculative | speculative | speculative | documented (Google, rich result) | documented (Google, rich result)
Organization (+ sameAs) | inferred | inferred | inferred | documented (Google KG) | documented (Google KG)
BreadcrumbList | speculative | speculative | speculative | documented (Google, URL display) | documented (Google, URL display)
Person (+ sameAs) | inferred | inferred | inferred | documented (Google KG) | documented (Google KG)
WebSite (+ SearchAction) | speculative | speculative | speculative | documented (Google, sitelinks) | documented (Google, sitelinks)

Read the columns, not just the rows. Gemini and Google AIO share the Google index, so their "documented" grades track Google's structured-data documentation [4] directly: Google publishes that it reads all ten of these types. The ChatGPT and Perplexity columns never rise above "inferred" because their crawlers parse JSON-LD (documented) but the citation weighting is third-party-observed (Ahrefs [6], Semrush [9]), never vendor-confirmed. The Claude column is the most "speculative" because Anthropic has published the least about how Claude's web search picks and weights sources, and the independent research base on Claude citation is thin. That is the same honest gap I flagged in the ranking factors deep-dive.

Three groups of rows deserve a second look. Organization and Person earn "inferred" across all the chat engines and "documented" on the Google surfaces because of the sameAs entity-resolution mechanism, which I cover in its own section below; it is the single most under-shipped high-value block. Review/AggregateRating is "speculative" for the chat engines because faking it is a Google manual-action risk per Google's review snippet policy [13] and there is no credible evidence the chat engines weight it for citation. BreadcrumbList and WebSite are "documented" on Google (they drive URL display and the sitelinks search box) but "speculative" everywhere else: they are SERP-cosmetic, not citation drivers.

The operator takeaway from the matrix: optimize for the "documented (Google)" cells first, because that effect is real and confirmed; treat the "inferred" chat-engine cells as a likely-but-unproven bonus that comes free once the markup is shipped; and do not spend a budget chasing the "speculative" cells. If a vendor sells you a "ChatGPT FAQ-schema weighting optimizer," they are monetizing a speculative cell.

The decision tree below is how I triage which schema type to ship on a given page, by page intent. It collapses the matrix into the four actual decisions you make per template.

The single rule that governs every branch: ship the type only when a visible on-page element corresponds to it. The tree never tells you to add Product to a blog post or AggregateRating to a page with no reviews, because those blocks have no on-page analog and become signal pollution.

The schema-density study: 3.4 types cited vs 1.1 uncited

The most-cited number in this article is the schema-density finding: AI-cited pages carry roughly 3.4 distinct schema.org types on average, while uncited pages carry about 1.1, per Ahrefs's 2025-2026 GEO research [6] cross-referenced with Semrush's parallel AI-Overviews citation study [9]. That is the single cleanest correlation between structured data and citation in the public literature. I want to be precise about what it does and does not prove.

What it shows: pages that get cited tend to ship a bundle (Article + FAQPage + Organization + one more), while uncited pages tend to ship at most one type or none. The 3.4-vs-1.1 gap is large and reproduces across the two independent studies. What it does not show: causation. Pages that bother to ship four schema types are also, on average, the pages produced by teams that bother to write Direct Answer blocks, cite primary sources, and build entity graphs — the schema is a proxy for "this is a well-built page," not necessarily the cause of the citation. The Princeton GEO paper [1] is the controlled study, and it tested textual interventions (statistics, quotations, citations) rather than schema, so the rigorous causal evidence sits on the text side, not the schema side. I label the density finding "inferred" for exactly this reason.
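To make "distinct schema types" concrete, here is a rough sketch of how the density number could be counted for one page. It assumes, as the before/after table in this section does, that only top-level nodes and @graph members count, so nested types such as Question or ImageObject are ignored. Whether the cited studies counted the same way is not stated, and the function name and sample data are hypothetical.

```python
def top_level_types(blocks):
    """Collect the @type of every top-level node across a page's JSON-LD blocks."""
    types = set()
    for block in blocks:
        if isinstance(block, list):
            nodes = block                          # a bare array of nodes
        else:
            nodes = block.get("@graph", [block])   # a @graph, or a single node
        for node in nodes:
            declared = node.get("@type", [])
            if isinstance(declared, str):
                declared = [declared]
            types.update(declared)
    return types


# Hypothetical page: one @graph block plus a separate FAQPage block.
blocks = [
    {"@context": "https://schema.org", "@graph": [
        {"@type": "Article", "headline": "Example"},
        {"@type": "Person", "name": "Example Author"},
        {"@type": "Organization", "name": "Example Org"},
    ]},
    {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": [
        {"@type": "Question", "name": "Example?"},
    ]},
]

density = len(top_level_types(blocks))
print(density)  # 4
```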

Here is the before/after I ran on a single Attrifast post, a supporting blog post in our revenue attribution feature explainer cluster, to sanity-check the density claim on my own property. "Cited" means our domain appeared as a linked source in the answer, counted across a fixed panel of 25 target queries run manually through each engine every Sunday.

Metric | Before (Jan 2026) | After (Mar 2026) | Change
Distinct schema types on page | 1 (Article only) | 4 (Article + FAQPage + Organization + Person) | +3
FAQPage items | 0 | 6 (matching visible H3s) | +6
sameAs surfaces on Organization | 1 | 5 | +4
Perplexity cited (of 25 queries) | 4 | 9 | +5
ChatGPT cited (of 25 queries) | 2 | 5 | +3
Claude cited (of 25 queries) | 1 | 2 | +1
Google AIO cited (of 25 queries) | 1 | 2 | +1

The honest caveat on this table: the schema bundle was not the only change in that window — I also shipped a Direct Answer block on the same page in the same month, so the schema effect and the text effect are confounded. I cannot cleanly separate "the FAQPage block earned +5 on Perplexity" from "the Direct Answer paragraph earned it." A clean experiment would ship one change, wait 30 days, then ship the next. I did not do that here because I run a bootstrapped SaaS, not an SEO research lab, and shipping both at once was the pragmatic call. Read the table as "consistent with the 3.4-vs-1.1 correlation," not as proof the schema alone did the work.

The directional read I would defend to another founder: the largest single jump landed on Perplexity, which is consistent with Perplexity being the fastest-recrawling, most-citation-heavy engine (3-7 citations per answer per Perplexity's docs [8]). Claude moved the least, consistent with Anthropic's conservative citation pattern. AIO barely moved because, as the AI Overviews citation guide covers, Google AIO is rank-prerequisite — it wants top-10 organic rank before it cites, and schema alone does not buy you rank.

Organization + sameAs: the entity block 90% of operators skip

If you ship one schema block for AI search, ship Organization with a populated sameAs array. This is the entity-resolution backbone, and it is the block I see missing or half-empty on the overwhelming majority of SaaS sites I audit. The sameAs array is how an engine disambiguates "Attrifast" from "Attrify," "FastAttrib," and every other near-collision name — it is the set of canonical URLs (LinkedIn, X, GitHub, Crunchbase, Wikidata) that all point at the same real-world entity.

The evidence: Ahrefs's 2025 entity-SEO study [6] tracked 8,400 SaaS brand mentions across ChatGPT and Perplexity and found brands with 4 or more matched sameAs surfaces were roughly 3x more likely to be cited than brands with 0-1 surfaces. The mechanism is plausible and partly documented — Google's Knowledge Graph [4] is fed by Wikidata, and Organization.sameAs is the documented way to assert which entity your brand maps to. For the chat engines the citation lift is inferred, but the entity-disambiguation function is structurally the same: resolve the entity first, then score documents about it.

The diagram below is why sameAs matters mechanically. When an engine encounters a brand-name query, it must resolve which entity the name refers to before it scores candidate documents. A populated sameAs array collapses the collision set to one entity; an empty one scatters retrieval across every near-collision name.

Here is the full Organization block I ship at the site level on attrifast.com, annotated below with why each field matters for AI extraction.

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://attrifast.com/#organization",
  "name": "Attrifast",
  "legalName": "Attrifast",
  "url": "https://attrifast.com",
  "logo": {
    "@type": "ImageObject",
    "url": "https://attrifast.com/logo.png",
    "width": 512,
    "height": 512
  },
  "description": "Privacy-first, Stripe-native revenue attribution for SMB SaaS and e-commerce. Cookieless tracking that splits ChatGPT, Perplexity, Claude, and Gemini referrals into revenue.",
  "foundingDate": "2024",
  "founder": { "@id": "https://attrifast.com/about#person" },
  "slogan": "See which channel actually drove the revenue.",
  "sameAs": [
    "https://www.linkedin.com/company/attrifast",
    "https://x.com/attrifast",
    "https://github.com/attrifast",
    "https://www.crunchbase.com/organization/attrifast",
    "https://www.producthunt.com/products/attrifast"
  ],
  "makesOffer": {
    "@type": "Offer",
    "price": "29.00",
    "priceCurrency": "USD"
  }
}

Why each field earns its place:

  • @id: the stable URI for your Organization entity. Every other block on your site (Article.publisher, Person.worksFor) references this @id, which gives extraction pipelines one canonical node to resolve instead of duplicate, disconnected Organization blobs. This is the single most-skipped best practice.
  • sameAs: the disambiguation array. Five matched surfaces here is the 3x-citation tier. The rule is mechanical consistency: same name, same handle pattern, same canonical URL across every surface. Drift (one profile says "Attrifast Inc," another says "AttriFast") is what re-introduces ambiguity.
  • founder @id reference: links the Organization to the Person entity (Vincent Ruan), which lets an engine answer "who founded Attrifast" by walking the graph edge rather than guessing.
  • description: the one place to state the entity's category and differentiators in plain language. Engines lift this verbatim when describing your brand. Keep it factual and quantified ("$29/mo," "cookieless," "Stripe-native"), not adjective soup.
  • makesOffer.price: even at the Organization level, a top-line price helps an engine answer "how much does Attrifast cost" without hallucinating. schema.org defines offers on Product, Service, and CreativeWork but not on Organization, so the Organization-level price hangs off makesOffer; a bare offers here fails the vocabulary check in the Schema.org validator. The full per-plan pricing belongs in Offer blocks on the pricing page.

The thing I do not do: I do not pay $200/mo for an "AI entity audit" SaaS. The audit is two hours of work: search your brand name, claim every official profile you have not yet claimed, list them all in sameAs, and keep the name and URL pattern identical. For the founder Person entity, the same sameAs discipline applies; the cross-link between Person.worksFor and Organization.@id is what builds a clean two-node entity graph instead of two orphaned blobs.
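The consistency half of that audit is mechanical enough to script. The sketch below is one possible check, assuming every profile URL ends in the same handle; that assumption breaks for surfaces such as Wikidata, whose URLs end in an opaque ID, so treat a "handle drift" flag there as a prompt to look rather than an error. The function name and the four-surface threshold parameter are illustrative choices.

```python
from urllib.parse import urlparse


def audit_same_as(org, handle, min_surfaces=4):
    """Return a list of problems; an empty list means the array looks consistent."""
    urls = org.get("sameAs", [])
    problems = []
    if len(urls) < min_surfaces:
        problems.append(f"only {len(urls)} sameAs surfaces, want {min_surfaces}+")
    if len(set(urls)) != len(urls):
        problems.append("duplicate sameAs URL")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme != "https":
            problems.append(f"not https: {url}")
        # Compare the last path segment to the expected handle, case-sensitively,
        # because "AttriFast" vs "attrifast" is exactly the drift to catch.
        last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]
        if last_segment != handle:
            problems.append(f"handle drift: {url}")
    return problems


ORG = {
    "@type": "Organization",
    "name": "Attrifast",
    "sameAs": [
        "https://www.linkedin.com/company/attrifast",
        "https://x.com/attrifast",
        "https://github.com/attrifast",
        "https://www.crunchbase.com/organization/attrifast",
        "https://www.producthunt.com/products/attrifast",
    ],
}

print(audit_same_as(ORG, "attrifast"))  # []

DRIFTED = {"sameAs": ORG["sameAs"][:3] + ["https://x.com/AttriFast"]}
print(audit_same_as(DRIFTED, "attrifast"))  # ['handle drift: https://x.com/AttriFast']
```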

FAQPage: the deprecation everyone misreads

FAQPage is the most misunderstood schema type in 2026, and the misunderstanding costs operators real citations. The story everyone half-remembers: "Google deprecated FAQ rich results in 2023, so FAQ schema is dead." That is wrong in a way that matters. In August 2023 Google reduced FAQ rich results [5] in the visible SERP to authoritative government and health sites, and in September 2023 it removed HowTo rich results entirely. But "rich result" and "structured-data parse" are two different things. Google still parses your FAQPage markup. ChatGPT, Perplexity, and Claude never cared about Google's SERP snippet rules in the first place — their crawlers extract the Q-A pairs regardless.

So the deprecation removed a visible blue-link-SERP feature for most sites. It did nothing to the underlying value of FAQPage for AI citation, which is mechanical: a FAQPage block hands the retrieval pipeline pre-extracted question-answer pairs that match how users phrase queries to chat assistants ("how do I get cited by ChatGPT"), versus forcing the pipeline to pattern-match Q-A out of prose. That alignment with conversational query shape is exactly why FAQPage remains one of the highest-leverage types for chat-engine citation even after the Google SERP change.

The non-negotiable rule: each Question.name must match a visible on-page H3 (or H2) verbatim, including the question mark. Mismatch is the single most common error that makes Google flag inconsistency and makes the Q-A pair untrustworthy to extraction. Below is a working FAQPage block built with conversational, query-shaped questions — annotated for why each field matters.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "@id": "https://attrifast.com/blog/your-slug#faq",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Which schema types do AI search engines actually read in 2026?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "All four major engines parse standard schema.org JSON-LD, but they weight types differently. Organization with a populated sameAs array, Article, and FAQPage carry the most observable citation weight across ChatGPT, Perplexity, and Gemini. HowTo helps on procedural queries. Product, Offer, and AggregateRating matter on commercial pages."
      }
    },
    {
      "@type": "Question",
      "name": "Does FAQPage schema still work even though Google deprecated the rich result?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. Google's August 2023 change reduced the visible SERP rich result to government and health sites. It did not stop Google from parsing FAQPage markup, and it has zero effect on whether ChatGPT or Perplexity extract your question-answer pairs."
      }
    },
    {
      "@type": "Question",
      "name": "Is JSON-LD better than Microdata or RDFa for AI engines?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. JSON-LD is the format Google recommends and the cleanest to parse because the structured data lives in one self-contained script block. Every documented AI crawler reads it. If you are starting fresh in 2026, ship JSON-LD and ignore the other two syntaxes."
      }
    }
  ]
}

Why the fields matter for AI:

  • Question.name verbatim-matches the visible H3 — this is the consistency check that keeps the pair trustworthy. A pipeline that can confirm the JSON-LD Q matches a visible Q treats the pair as high-confidence.
  • acceptedAnswer.text is a self-contained 40-80 word answer — it must stand alone without the surrounding page, because the engine lifts it as a quotable unit. No "as discussed above" references.
  • Conversational, query-shaped name — "Which schema types do AI engines actually read" matches a real user query better than "Schema type overview." The header is the retrieval target.
  • @id on the FAQPage — lets the block be referenced from the page's @graph and cross-linked to the Article.

The discipline I enforce: 4-8 FAQ items, no more. I ran two Attrifast posts with 28 and 34 FAQ items in early 2026 to test "more is better." Citation rate did not move versus the 4-6 baseline, Google flagged both with excessive-mainEntity warnings, and the bloated visible FAQ block pushed body content below the fold on mobile. I rolled both back. If you have 4-8 genuinely distinct, query-shaped questions, ship them. If you are inventing questions to hit 20, you are pattern-matching the form without the substance.
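The three FAQ rules in this section (verbatim heading match, a self-contained 40-80 word answer, 4-8 items) can be linted before deploy. The following is a minimal sketch under the assumption that you already have the page's visible H2/H3 texts as a list of strings; the function name and the placeholder questions are hypothetical, and the thresholds are simply the ones stated above.

```python
def lint_faq(faq, visible_headings, min_items=4, max_items=8,
             min_words=40, max_words=80):
    """Return a list of rule violations for a parsed FAQPage node."""
    problems = []
    items = faq.get("mainEntity", [])
    if not min_items <= len(items) <= max_items:
        problems.append(f"{len(items)} items, want {min_items}-{max_items}")
    headings = set(visible_headings)
    for question in items:
        name = question.get("name", "")
        # Exact string comparison: a missing question mark counts as a mismatch.
        if name not in headings:
            problems.append(f"no verbatim heading for: {name}")
        text = question.get("acceptedAnswer", {}).get("text", "")
        words = len(text.split())
        if not min_words <= words <= max_words:
            problems.append(f"{words}-word answer for: {name}")
    return problems


# Placeholder data: four questions, each with a 50-word dummy answer.
answer = " ".join(["word"] * 50)
questions = ["Q1?", "Q2?", "Q3?", "Q4?"]
faq = {
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": q,
         "acceptedAnswer": {"@type": "Answer", "text": answer}}
        for q in questions
    ],
}

print(lint_faq(faq, questions))                     # []
print(lint_faq(faq, ["Q1?", "Q2?", "Q3?", "Q4"]))  # ['no verbatim heading for: Q4?']
```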

HowTo: still parsed, still useful on procedural pages

HowTo is the other type caught in the 2023 deprecation, and the same correction applies: Google removed the HowTo rich result [5] from the visible SERP in September 2023, but it still parses the markup, and the AI engines extract the step sequence regardless. HowTo earns its place on genuinely procedural pages — setup guides, configuration walkthroughs, step sequences — and is noise on informational pages. The rule: ship HowTo only when the page contains a visible numbered list of steps that the schema mirrors.

The step count matters for extraction. Below 3 steps, the engine parses it as a trivial list and the structure adds nothing. Above 8 steps, the citation engines tend to truncate, so a 12-step HowTo often gets cited as its first 6 steps. The sweet spot is 3-8 steps. Here is a working HowTo for a schema-implementation procedure, annotated.

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "@id": "https://attrifast.com/blog/your-slug#howto",
  "name": "How to ship a schema bundle that earns AI citations",
  "totalTime": "PT2H",
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Ship the Organization block site-wide",
      "text": "Add one Organization JSON-LD block with a stable @id and 4+ matched sameAs URLs (LinkedIn, X, GitHub, Crunchbase). This is the entity backbone every other block references."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Add Article + FAQPage to each post",
      "text": "Reference the Organization @id from Article.publisher and the Person @id from Article.author. Add 4-8 FAQPage items whose name fields match visible H3s verbatim."
    },
    {
      "@type": "HowToStep",
      "position": 3,
      "name": "Add Product + Offer + AggregateRating to commercial pages",
      "text": "On pricing and product pages, ship Offer with price, priceCurrency, and availability so engines can quote your real price instead of hallucinating one."
    },
    {
      "@type": "HowToStep",
      "position": 4,
      "name": "Validate in two tools and deploy",
      "text": "Run the page through Google's Rich Results Test and the Schema.org validator. Fix every error and warning. One syntax error kills the whole graph."
    }
  ]
}

Why the fields matter: position makes the step order unambiguous for extraction; name gives each step a header the engine can cite individually; text is the self-contained instruction. The totalTime in ISO 8601 duration format (PT2H = 2 hours) lets an engine answer "how long does this take." The @id cross-references the step sequence into the page graph. I do not ship HowTo on the majority of Attrifast posts because most are informational — I shipped it on roughly a quarter of them, the ones with real procedures, and removed it from the rest after measuring no lift on non-procedural pages.
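Both the 3-8 step window and the ISO 8601 totalTime can be checked mechanically. This sketch handles only the simple day/hour/minute/second duration forms (no weeks, months, or fractional values), and the helper names and sample steps are hypothetical.

```python
import re

# Matches forms such as PT2H, PT45M, P1DT30M. Deliberately narrow.
DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def duration_minutes(value):
    """Convert a simple ISO 8601 duration string to minutes."""
    match = DURATION.match(value)
    if not match or value in ("P", "PT"):
        raise ValueError(f"not a supported ISO 8601 duration: {value!r}")
    days, hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return days * 1440 + hours * 60 + minutes + seconds / 60


def lint_howto(howto, min_steps=3, max_steps=8):
    """Return a list of rule violations for a parsed HowTo node."""
    problems = []
    steps = howto.get("step", [])
    if not min_steps <= len(steps) <= max_steps:
        problems.append(f"{len(steps)} steps, want {min_steps}-{max_steps}")
    positions = [step.get("position") for step in steps]
    if positions != list(range(1, len(steps) + 1)):
        problems.append("positions are not 1..n in order")
    return problems


HOWTO = {
    "@type": "HowTo",
    "totalTime": "PT2H",
    "step": [
        {"@type": "HowToStep", "position": i, "name": f"Step {i}", "text": "..."}
        for i in range(1, 5)
    ],
}

print(duration_minutes(HOWTO["totalTime"]))  # 120.0
print(lint_howto(HOWTO))                     # []
```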

Product, Offer, and AggregateRating: the commercial-intent bundle

For SaaS and e-commerce, the commercial-intent schema bundle is where structured data touches revenue most directly. When a user asks an AI engine "how much does Attrifast cost" or "what do Attrifast users rate it," the engine pulls from Offer and AggregateRating — and if those blocks are missing or wrong, the engine either hallucinates a price or skips you for a competitor whose markup is clean. This is the one place where a schema error has a direct commercial cost: an engine confidently quoting the wrong price.

For a SaaS, use Product (or SoftwareApplication) with a nested Offer, not a bare Product. The Offer carries price, currency, and billing cadence; AggregateRating carries the review summary. Here is the commercial bundle, annotated, with the honest caveat baked in.

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "@id": "https://attrifast.com/#product",
  "name": "Attrifast",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "description": "Cookieless, Stripe-native revenue attribution for SMB SaaS and e-commerce.",
  "offers": {
    "@type": "Offer",
    "price": "29.00",
    "priceCurrency": "USD",
    "priceSpecification": {
      "@type": "UnitPriceSpecification",
      "price": "29.00",
      "priceCurrency": "USD",
      "billingDuration": 1,
      "billingIncrement": 1,
      "unitCode": "MON"
    },
    "availability": "https://schema.org/InStock",
    "url": "https://attrifast.com/pricing"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "27",
    "bestRating": "5"
  }
}

Why each field matters for AI:

  • offers.price + priceCurrency — the literal answer to "how much does it cost." Ship the real number. An engine that finds 29.00 USD here quotes $29/mo; an engine that finds nothing guesses, and the guess is often wrong.
  • priceSpecification.billingDuration + unitCode: MON — encodes "per month" unambiguously. Without it, an engine may quote "$29" without the cadence, which reads as a one-time price.
  • availability: InStock — confirms the product is purchasable now. For SaaS this is always InStock; for e-commerce it gates whether the engine recommends an out-of-stock item.
  • aggregateRating — the review summary. The honest caveat: only ship this if it reflects real reviews. Faking AggregateRating is a Google manual-action risk per the review snippet policy [13], and there is no credible evidence the chat engines reward it for citation anyway (it is "speculative" on every chat-engine column of the matrix). If you have 27 genuine reviews, mark them; if you have zero, omit the block entirely. A fabricated 4.9 / 312 reviews on a tool with no public reviews is the fastest path to a Google penalty for zero upside.

The matrix grades Offer as "inferred (price extract)" on the chat engines and "documented" on Google. The practical implication: ship the commercial bundle on pricing and product pages where the on-page content actually shows the price and reviews, and never on blog posts. I shipped Product schema on Attrifast /features/* pages for two weeks, measured no citation lift on those informational feature pages (they are not products), and removed it. The rule holds: a JSON-LD block with no corresponding visible on-page element is signal pollution.
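To see why the billing cadence matters, consider a hypothetical consumer that turns an Offer node into a quotable price string. No engine documents doing it this way; the sketch only shows that without priceSpecification.unitCode there is nothing in the markup to distinguish a monthly price from a one-time one. The function name and the two-entry unit table are illustrative.

```python
# UN/CEFACT common codes used by schema.org unitCode; only two are mapped here.
UNIT_LABELS = {"MON": "month", "ANN": "year"}


def quote_price(offer):
    """Build a price string from an Offer node, or None if nothing is quotable."""
    price = offer.get("price")
    currency = offer.get("priceCurrency")
    if price is None or currency is None:
        return None  # a consumer would have to guess from the prose
    spec = offer.get("priceSpecification", {})
    unit = UNIT_LABELS.get(spec.get("unitCode"))
    if unit:
        return f"{price} {currency} per {unit}"
    return f"{price} {currency}"  # cadence unknown: reads as a one-time price


OFFER = {
    "@type": "Offer",
    "price": "29.00",
    "priceCurrency": "USD",
    "priceSpecification": {
        "@type": "UnitPriceSpecification",
        "price": "29.00",
        "priceCurrency": "USD",
        "billingDuration": 1,
        "unitCode": "MON",
    },
}

print(quote_price(OFFER))  # 29.00 USD per month

bare = {"@type": "Offer", "price": "29.00", "priceCurrency": "USD"}
print(quote_price(bare))   # 29.00 USD
```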

Article, Person, BreadcrumbList, WebSite: the supporting cast

The remaining four types are supporting cast: two are necessary scaffolding, and two are largely decorative for citation. Grouping them honestly:

Article: necessary baseline. Ship it on every blog post. It carries headline, datePublished, dateModified, author, and publisher, and it is the node that the FAQPage and HowTo blocks hang off of in the @graph. The two fields that matter most for AI: dateModified (the freshness signal: engines re-crawl pages that update, per the freshness mechanics in the ranking factors piece) and the author reference to the Person @id (the credibility edge).

Person — the author entity. Same sameAs discipline as Organization. A real author with a populated sameAs (LinkedIn, X, GitHub) and a worksFor edge to your Organization.@id builds the author-credibility signal that distinguishes a named expert from generic "Team" attribution. Ship one Person block, referenced by @id from every Article.author.

BreadcrumbList — decorative for citation. Documented on Google for URL display in the SERP, "speculative" everywhere else. Useful for human UX on deep hierarchies; pure noise on a flat two-level blog. I do not ship it on Attrifast posts because our hierarchy is flat. Ship it only if you have a genuine multi-level taxonomy.

WebSite + SearchAction — sitelinks only. Documented on Google for the sitelinks search box; "speculative" for citation everywhere. Ship one WebSite block site-wide if you want the search box; do not expect it to move AI citations.

Here is the consolidated @graph that ties Article, Person, and Organization together with @id cross-references — the actual block I ship on Attrifast posts:

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "@id": "https://attrifast.com/blog/your-slug#article",
      "headline": "Your Headline (under 110 chars)",
      "description": "Your meta description (under 155 chars).",
      "datePublished": "2026-05-27",
      "dateModified": "2026-05-27",
      "author": { "@id": "https://attrifast.com/about#person" },
      "publisher": { "@id": "https://attrifast.com/#organization" },
      "mainEntityOfPage": "https://attrifast.com/blog/your-slug"
    },
    {
      "@type": "Person",
      "@id": "https://attrifast.com/about#person",
      "name": "Vincent Ruan",
      "url": "https://attrifast.com/about",
      "image": "https://attrifast.com/authors/vincent-ruan.jpeg",
      "jobTitle": "Founder",
      "worksFor": { "@id": "https://attrifast.com/#organization" },
      "sameAs": ["https://x.com/0xVinceAI"]
    },
    {
      "@type": "Organization",
      "@id": "https://attrifast.com/#organization",
      "name": "Attrifast",
      "url": "https://attrifast.com",
      "logo": "https://attrifast.com/logo.png",
      "sameAs": [
        "https://www.linkedin.com/company/attrifast",
        "https://x.com/attrifast",
        "https://github.com/attrifast",
        "https://www.crunchbase.com/organization/attrifast"
      ]
    }
  ]
}

The @graph array is the right pattern: one script block, multiple connected nodes, every cross-reference via @id. Extraction pipelines follow @id links the way RDF resolvers do, so the Article-to-Author and Person-to-Organization edges produce a clean entity graph instead of three disconnected blobs. This is the difference between an engine knowing "this article was written by Vincent Ruan, who founded Attrifast" and an engine seeing three unrelated objects.
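Following @id links is simple to demonstrate, and the same pass catches two failure modes: a duplicated @id and a reference that points at a node that does not exist. This is a simplified sketch that only inspects top-level properties of each node, not a full JSON-LD processor; the function names are hypothetical and the graph is a trimmed copy of the one above.

```python
def index_graph(graph):
    """Map each @id to its node, rejecting duplicates."""
    index = {}
    for node in graph:
        node_id = node.get("@id")
        if node_id is None:
            continue
        if node_id in index:
            raise ValueError(f"duplicate @id: {node_id}")
        index[node_id] = node
    return index


def is_reference(value):
    """A reference is an object whose only key is @id."""
    return isinstance(value, dict) and set(value) == {"@id"}


def dangling_references(graph):
    """List (source @id, property, target @id) for references with no target node."""
    index = index_graph(graph)
    missing = []
    for node in graph:
        for key, value in node.items():
            if is_reference(value) and value["@id"] not in index:
                missing.append((node.get("@id"), key, value["@id"]))
    return missing


GRAPH = [
    {"@type": "Article", "@id": "https://attrifast.com/blog/your-slug#article",
     "author": {"@id": "https://attrifast.com/about#person"},
     "publisher": {"@id": "https://attrifast.com/#organization"}},
    {"@type": "Person", "@id": "https://attrifast.com/about#person",
     "name": "Vincent Ruan",
     "worksFor": {"@id": "https://attrifast.com/#organization"}},
    {"@type": "Organization", "@id": "https://attrifast.com/#organization",
     "name": "Attrifast"},
]

index = index_graph(GRAPH)
author = index[GRAPH[0]["author"]["@id"]]
employer = index[author["worksFor"]["@id"]]
print(author["name"], "->", employer["name"])  # Vincent Ruan -> Attrifast
print(dangling_references(GRAPH))              # []
```

Dropping the Organization node from GRAPH makes dangling_references report both the publisher and worksFor edges, which is the "orphaned blob" case in miniature.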

Validation and debugging: the two tools and the errors that break parsing

Schema that does not validate is worse than no schema, because one syntax error can make the entire @graph unparseable — a single trailing comma kills every node in the block. The validation workflow is two tools in sequence, and you run both on every template change.

Tool 1: Google Rich Results Test (search.google.com/test/rich-results [3]). This tells you whether Google can parse the markup and which rich results the page is eligible for. The catch: it only reports on types that are currently rich-result-eligible, so it will pass a valid FAQPage silently without flagging it (because FAQ rich results are deprecated for most sites). It is necessary but not sufficient.

Tool 2: Schema.org validator (validator.schema.org [12], the successor to the old Structured Data Testing Tool). This checks vocabulary correctness independent of Google's rich-result rules — wrong property names, invalid type nesting, malformed values. It catches the vocabulary errors the Rich Results Test ignores. Run this second.

The errors that actually break AI parsing, ranked by how often I see them:

| Error | What it looks like | Effect on AI parsing | Fix |
| --- | --- | --- | --- |
| Invalid JSON (trailing comma, unescaped quote) | "text": "He said "hi"", | Entire @graph unparseable; all nodes lost | Escape inner quotes, remove trailing commas, validate JSON first |
| FAQ name mismatch with visible H3 | JSON says "How to X?", page H3 says "X guide" | Google flags inconsistent; pair untrustworthy | Match Question.name to H3 verbatim |
| Missing required field | Offer with no price | Type fails validation, dropped | Add required fields per schema.org spec |
| Wrong @type nesting | Review directly under Article | Vocabulary error, node ignored | Nest per schema.org type hierarchy |
| Duplicate @id across blocks | Two nodes share #organization | Ambiguous entity resolution | One @id per entity, referenced not duplicated |
| Fabricated AggregateRating | Ratings with no on-page reviews | Google manual-action risk [13] | Only ship if reviews are real and visible |
| Date not ISO 8601 | "datePublished": "May 27, 2026" | Parsed as string, freshness signal lost | Use "2026-05-27" |
| noindex on a schema'd page | Valid schema on a noindex page | Page excluded from index entirely | Remove noindex from canonical pages |
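Three of the rows above (duplicate @id, non-ISO dates, an Offer with no price) are mechanical enough to check before the validators ever run. The following is a minimal sketch, assuming the @graph has already been parsed into a list of dicts; lint_graph is a hypothetical helper, and treating a missing price as a hard failure is this sketch's assumption, not a rule taken from either validator.

```python
from collections import Counter
from datetime import date


def lint_graph(graph):
    """Return human-readable problems for a list of JSON-LD nodes."""
    problems = []

    # Duplicate @id: count only full node definitions (nodes that carry an @type).
    defined = Counter(n["@id"] for n in graph if "@id" in n and "@type" in n)
    for node_id, count in defined.items():
        if count > 1:
            problems.append(f"duplicate @id: {node_id}")

    for node in graph:
        node_type = node.get("@type", "?")
        # Dates should be ISO 8601; the first ten characters must parse as YYYY-MM-DD.
        for key in ("datePublished", "dateModified"):
            if key in node:
                try:
                    date.fromisoformat(str(node[key])[:10])
                except ValueError:
                    problems.append(f"{node_type}.{key} is not ISO 8601: {node[key]!r}")
        # Offer with no price (assumed here to be a blocking error).
        if node_type == "Offer" and "price" not in node:
            problems.append("Offer is missing price")
    return problems


sample = [
    {"@type": "Article", "datePublished": "May 27, 2026", "dateModified": "2026-05-27"},
    {"@type": "Organization", "@id": "https://attrifast.com/#organization"},
    {"@type": "Organization", "@id": "https://attrifast.com/#organization"},
    {"@type": "Offer", "priceCurrency": "USD"},
]
for problem in lint_graph(sample):
    print(problem)
```

This does not replace either validator; it only catches the cheap errors early so the validator run is about vocabulary, not typos.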

The debugging sequence when a page is not getting cited despite shipping schema:

  1. Confirm the page is server-rendered. View source (not the rendered DOM) and search for your H1 text. If the body is empty because content hydrates client-side, the schema is on a page the crawler sees as blank. This is the most common silent failure on Next.js and similar SPA-style sites — covered in the AI search ranking factors piece.
  2. Confirm the JSON-LD is in the served HTML. Same view-source check — the application/ld+json block must be in the raw HTML, not injected by client JS.
  3. Run both validators. Rich Results Test for Google parseability, Schema.org validator for vocabulary.
  4. Check the crawler can reach it. Confirm robots.txt does not block GPTBot, PerplexityBot, or ClaudeBot, and there is no noindex. For the full crawler-access picture, see llms.txt vs robots.txt.
  5. Wait for recrawl. Schema added today is not seen instantly. Perplexity recrawls fastest (7-14 days), ChatGPT search slower (14-30 days), AIO follows the Google index.
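Steps 1, 2, and 4 of the sequence above can be scripted against the raw HTML and robots.txt text once you have fetched them. The sketch below uses only the Python standard library; the function names are hypothetical, and it works on strings you supply rather than fetching anything itself.

```python
import json
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> in raw HTML."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def check_served_html(raw_html, h1_text):
    """Steps 1 and 2: is the H1 text, and parseable JSON-LD, present in the raw HTML?"""
    extractor = JsonLdExtractor()
    extractor.feed(raw_html)
    parsed = 0
    for block in extractor.blocks:
        try:
            json.loads(block)
            parsed += 1
        except json.JSONDecodeError:
            pass
    return {
        "h1_in_source": h1_text in raw_html,
        "jsonld_blocks": len(extractor.blocks),
        "jsonld_parsed": parsed,
    }


def blocked_bots(robots_txt, url, bots=("GPTBot", "PerplexityBot", "ClaudeBot")):
    """Step 4: which of the named crawlers does this robots.txt disallow for the URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]
```

Feed check_served_html the view-source HTML, not the rendered DOM; a client-rendered page will typically report h1_in_source as False and zero JSON-LD blocks even though the browser shows both.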

The honest caveats: necessary, not sufficient

This is the section the GEO-vendor blogs skip, and it is why I am writing it. Schema markup is real and worth shipping, but the honest framing is "necessary-not-sufficient scaffolding," and four specific caveats keep operators from over-investing.

Caveat 1: schema is correlation, not proven causation. The 3.4-vs-1.1 density finding [6][9] is a strong correlation. The one controlled study — the Princeton GEO paper [1] — tested textual interventions (statistics, quotations, inline citations) and measured citation lifts up to 40% on Perplexity and BingChat. It did not test schema as a variable. So the rigorous causal evidence in this space sits on the text side, not the schema side. Pages with four schema types are also pages built by teams that write well; the schema may be a marker of quality rather than the cause of citation. I ship schema anyway, because the marginal cost is near zero and the correlation is consistent, but I do not pretend the causal case is settled.

Caveat 2: the documented/inferred split is load-bearing. The only documented schema effect is Google reading JSON-LD for rich results and Knowledge Graph entity resolution [4]. That ChatGPT, Perplexity, and Claude weight schema for citation is inferred from third-party studies — no vendor publishes it. Anyone who tells you "ChatGPT's algorithm weights FAQPage at X%" is stating speculation as fact. Optimize for the documented Google effect first; treat the chat-engine citation lift as a likely bonus that comes free once the markup ships.

Caveat 3: the FAQ/HowTo deprecation is a SERP change, not a parsing change. Worth repeating because it is so widely misread: Google's 2023 reduction of FAQ and HowTo rich results [5] removed visible SERP snippets for most sites. It did not stop Google or any AI engine from parsing the markup. If you removed your FAQPage schema because "Google deprecated it," you removed one of the highest-leverage AI-citation signals over a SERP-cosmetic change. Put it back.

Caveat 4: schema cannot rescue thin content. A perfect schema bundle on the 47th generic "what is GEO" explainer will not get cited over the canonical sources. Schema amplifies content; it does not substitute for it. If your page is not the best answer to the query, structured data does not make it the best answer — it just makes a mediocre answer easier to parse. The order of operations is: write the canonical answer, then make it parseable with schema. Reversed, you are polishing a page nobody should cite.

How schema connects to revenue (the Attrifast wedge)

Here is the chain, and here is exactly where it breaks. Schema earns citations. Citations earn clicks. Clicks may earn signups. Signups may earn revenue. Every link in that chain except the last is plausibly observable — you can watch citation rate move (manually, weekly), you can watch referral traffic land. But the last link, "which schema'd pages actually drove paid customers," is invisible in GA4 by default, and that is the gap that makes most GEO programs run blind.

The mechanism of the gap: GA4 buckets nearly 100% of AI-engine referrals as (direct) / (none). ChatGPT, Perplexity, and Claude frequently strip the Referer header on outbound clicks, and GA4 has no built-in pattern match for chat.openai.com, perplexity.ai, or claude.ai. So when someone reads your FAQPage-schema'd post via a Perplexity citation, clicks through, and pays $29/mo two weeks later, GA4 files that revenue under "Direct" alongside email clicks and typed URLs. You shipped the schema, the schema earned the citation, the citation earned the customer — and your analytics cannot tell you any of it happened.

This is the Attrifast wedge, and I want to be precise about what we do and do not do. Attrifast does not do GEO. We do not generate your schema, write your Organization block, or audit your sameAs array; this article is the free version of that work, and you can ship all of it yourself in an afternoon. What Attrifast does is the boring measurement layer underneath: a 4 KB cookieless script on your domain detects AI-engine referrers (chatgpt, perplexity, claude, gemini) using first-party signals, and when Stripe fires checkout.session.completed, the webhook joins the stored source to the payment server-side. You see perplexity in the channel column next to a real dollar figure, not (direct).
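The shape of that join can be sketched in a few lines of Python. This is not Attrifast's implementation: the hostname map, the utm_source fallback, and the use of Stripe's client_reference_id to carry a visitor ID are all assumptions made for illustration, and the event dict only mimics the fields of a checkout.session.completed payload that the sketch reads.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical hostname-to-channel map; extend it with whatever sources you track.
AI_HOSTS = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}


def _host(url):
    """Lowercased hostname without a leading www., or an empty string."""
    host = (urlparse(url or "").hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_visit(referrer, landing_url):
    """Pick a channel from the referrer, falling back to utm_source on the landing URL."""
    utm = parse_qs(urlparse(landing_url).query).get("utm_source", [""])[0].lower()
    for candidate in (_host(referrer), utm):
        if candidate in AI_HOSTS:
            return AI_HOSTS[candidate]
    return "other" if referrer else "direct"


def join_payment(event, stored_sources):
    """Attach the stored channel to a completed checkout; return None for other events."""
    if event.get("type") != "checkout.session.completed":
        return None
    session = event["data"]["object"]
    visitor_id = session.get("client_reference_id")  # assumed to hold the visitor ID
    return {
        "channel": stored_sources.get(visitor_id, "direct"),
        "amount": session["amount_total"] / 100,  # amount_total is in minor units
        "currency": session["currency"],
    }
```

The design point is that the channel is decided once, at landing, and stored first-party; the payment event only looks it up, so the two-week gap between click and purchase does not lose the source.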

The reason this matters specifically for schema work: schema decisions are investments, and right now most operators make them on faith. With revenue attribution in place, you can answer the question that actually matters — not "did my FAQPage block earn a citation" (presence) but "did the pages I schema'd drive more revenue than the pages I did not" (return). For the full architecture, Attrifast's revenue attribution feature page covers the cookieless detection and the Stripe join, and the ChatGPT traffic tracking guide covers the AI-referrer detection rules specifically. The honest summary: ship the schema yourself, for free, from this article — then measure whether it paid, because schema earns citations and only attribution tells you which citations earned money.
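Once payments carry a landing page, the "return" question reduces to simple arithmetic. The sketch below compares revenue per page between schema'd and non-schema'd pages; the paths and amounts are invented placeholders, not Attrifast data, and with small samples the result is directional at best.

```python
def revenue_per_page(payments, schema_pages, all_pages):
    """Average attributed revenue per page for schema'd vs non-schema'd pages.

    payments: iterable of (landing_page, amount) pairs.
    schema_pages, all_pages: sets of page paths.
    """
    totals = {"schema": 0.0, "no_schema": 0.0}
    for landing_page, amount in payments:
        group = "schema" if landing_page in schema_pages else "no_schema"
        totals[group] += amount
    counts = {
        "schema": len(schema_pages),
        "no_schema": len(all_pages - schema_pages),
    }
    # Normalize by page count so a larger group does not win by size alone.
    return {g: (totals[g] / counts[g] if counts[g] else 0.0) for g in totals}


# Placeholder data for illustration only.
all_pages = {"/a", "/b", "/c", "/d"}
schema_pages = {"/a", "/b"}
payments = [("/a", 29.0), ("/a", 29.0), ("/c", 29.0)]
print(revenue_per_page(payments, schema_pages, all_pages))
```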

What we shipped on attrifast.com (and what the numbers say)

Putting the schema playbook on our own site over the last 90 days, with the honest results:

  • Site-level Organization + Person graph. One Organization block with five sameAs surfaces, one Person block for me with worksFor cross-linked. Validated clean in both tools.
  • Article + FAQPage on all 32 posts. Each FAQPage carries 4-8 items matching visible H3s verbatim. We fixed three FAQ-mismatch warnings during the rollout.
  • HowTo on the ~8 procedural posts only. Removed from informational posts after measuring no lift.
  • SoftwareApplication + Offer on pricing and product pages only. Real $29/mo price, no AggregateRating until we had genuine reviews to mark.
  • Server-side AI-referrer detection. Our script tags chatgpt, perplexity, claude, and gemini explicitly, joined to Stripe at payment.

The honest results, per internal logs: across the fixed 25-query panel, Perplexity citations moved most and fastest after the schema bundle shipped, ChatGPT moved moderately, Claude barely moved, and AIO lagged (it wants rank first). AI-referred sessions grew from negligible to a measurable single-digit percent of total traffic. I will not publish absolute revenue numbers — n is too small to be useful for a bootstrapped SaaS, and one viral mention skews the chart. What I will defend to another founder: the Organization/sameAs work and the FAQPage work were the two interventions where I felt the difference within two weeks, and the schema-density correlation held on my own property exactly as the Ahrefs and Semrush studies predicted. What I will not defend in front of a paid research audience: that the schema alone caused it, because the Direct Answer blocks shipped in the same windows and I cannot cleanly separate the two.

The acknowledged failure: I shipped Product schema on feature pages, BreadcrumbList on a flat blog, and oversized 28-item FAQ blocks — all three measured zero lift or net-negative, and all three came out. The rule that survived every experiment: a JSON-LD block with no corresponding visible on-page element is signal pollution. Ship schema that mirrors what is actually on the page, validate it in both tools, and then measure whether it paid.

Limitations

  • This article does not cover Bing Chat / Copilot schema behavior in detail. Bing's index is closer to traditional search, and its schema handling tracks classic Bing structured-data rules more than the chat-engine pattern.
  • It does not cover Speakable or voice-assistant schema. Speakable is limited to news publishers in a small Google Assistant subset; voice-mode ChatGPT does not parse it. I tested it, measured nothing, removed it.
  • The chat-engine citation effects are inferred, not vendor-confirmed. Every "inferred" cell in the matrix carries the caveat that the underlying weighting could differ from what the observational data implies.
  • The schema-density study [6][9] is observational. The 3.4-vs-1.1 finding is a correlation across SEO-vendor studies whose incentives bias toward "schema matters." I have weighted it accordingly and cited the controlled Princeton paper [1] as the stronger-but-narrower source.
  • Sample sizes on attrifast.com are small. A single bootstrapped SaaS with 32 posts and a 25-query panel produces directional signal, not statistical significance.
  • Schema vocabulary evolves. schema.org [11] ships new versions periodically, and Google's rich-result eligibility changes (as the 2023 FAQ/HowTo deprecation showed). Re-validate templates when either changes.

FAQ

Which schema types do AI search engines actually read in 2026?

All four major engines parse standard schema.org JSON-LD, but they weight types differently. Organization (with a populated sameAs array), Article, and FAQPage carry the most observable citation weight across ChatGPT, Perplexity, and Gemini. HowTo helps on procedural queries. Product, Offer, and AggregateRating matter on commercial pages. BreadcrumbList and Review are mostly decorative for citation. The honest split: Google documents that it reads all of these; that ChatGPT, Perplexity, and Claude weight them for citation is inferred from third-party studies, not vendor-confirmed.

Does FAQPage schema still work even though Google deprecated the rich result?

Yes, and the deprecation is widely misread. Google's August 2023 change reduced FAQ rich results in the visible SERP to authoritative government and health sites — it did not stop Google from parsing FAQPage markup, and it has zero effect on whether ChatGPT or Perplexity extract your question-answer pairs. The rich-result snippet and the structured-data parse are two different things. FAQPage remains one of the highest-leverage schema types for AI citation precisely because it ships pre-extracted Q-A pairs that match how people phrase queries to chat assistants.

Is JSON-LD better than Microdata or RDFa for AI engines?

Yes. JSON-LD is the format Google explicitly recommends, and it is the cleanest to parse because the structured data lives in one self-contained script block rather than being interleaved with visible HTML. The AI crawlers that fetch your raw HTML (GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot) receive the JSON-LD block along with it, although only Google documents how it uses the markup; Google-Extended is a robots.txt control token rather than a separate crawler. Microdata and RDFa still validate, but they are harder to extract reliably and offer no upside. If you are starting fresh in 2026, ship JSON-LD and ignore the other two syntaxes.

How much does schema actually move AI citations versus content quality?

Schema is necessary, not sufficient. In Ahrefs and Semrush GEO research through 2025-2026, AI-cited pages carried a median of roughly 3.4 distinct schema types versus about 1.1 on uncited pages, which is a real correlation. But the Princeton GEO paper (Aggarwal et al., 2024) found the largest controlled citation lifts came from textual interventions (statistics, quotations, inline citations), not from schema, which it did not test as a variable. The honest read: schema is the parseable scaffolding that makes good content extractable. Ship it on quality content; do not expect it to rescue thin content.

Do I need Product and Offer schema if I'm a SaaS, not e-commerce?

For your pricing and product pages, yes, but use SoftwareApplication or Product with an Offer node, not a bare Product. AI engines pull price, billing cadence, and availability directly from Offer when a user asks 'how much does X cost.' For a $29/mo tool like Attrifast, a clean Offer block with price, priceCurrency, and a billingDuration (which schema.org defines on UnitPriceSpecification, so it sits in a priceSpecification nested under the Offer) is the difference between an engine quoting your real price and hallucinating one. On blog posts, skip Product entirely; it is noise with no on-page analog.
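A minimal sketch of that pricing block, built as a Python dict and serialized to JSON-LD. The currency (USD), the applicationCategory value, and the "P1M" duration string are assumptions for illustration; billingDuration is placed on a nested UnitPriceSpecification because that is where schema.org defines the property.

```python
import json

offer_page = {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "name": "Attrifast",
    "url": "https://attrifast.com",
    "applicationCategory": "BusinessApplication",  # assumed category
    "offers": {
        "@type": "Offer",
        "price": "29.00",
        "priceCurrency": "USD",  # assumed currency for the $29/mo plan
        "priceSpecification": {
            "@type": "UnitPriceSpecification",
            "price": "29.00",
            "priceCurrency": "USD",
            "billingDuration": "P1M",  # ISO 8601 duration: billed monthly
        },
    },
}

# Paste the output into a <script type="application/ld+json"> block on the pricing page.
print(json.dumps(offer_page, indent=2))
```

Generating the block from a dict rather than hand-editing JSON means the serializer, not you, is responsible for commas and quoting. Run the output through both validators before shipping.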

What's the most common schema mistake that breaks AI parsing?

Mismatch between the JSON-LD and the visible page. The single most common failure is a FAQPage Question.name that does not match the on-page H3 verbatim, which Google flags as inconsistent and which makes the Q-A pair untrustworthy to extraction pipelines. The second is invalid JSON (a trailing comma, an unescaped quote) that makes the entire block unparseable — one syntax error kills every node in the graph. Validate every block in Google's Rich Results Test and the Schema.org validator before shipping.
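The verbatim-match rule is easy to check automatically. The sketch below, using a hypothetical faq_mismatches helper, collects the visible H3 text from raw HTML and lists every Question.name in a FAQPage node that has no exact counterpart. It assumes the FAQ questions are rendered as plain H3 elements; adjust the tag if your template differs.

```python
from html.parser import HTMLParser


class H3Collector(HTMLParser):
    """Collect the text content of every <h3> element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self.headings.append("")

    def handle_data(self, data):
        if self._in_h3:
            self.headings[-1] += data

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False


def faq_mismatches(faq_page, raw_html):
    """Return Question.name values that do not appear verbatim as a visible H3."""
    collector = H3Collector()
    collector.feed(raw_html)
    visible = {heading.strip() for heading in collector.headings}
    names = [q["name"] for q in faq_page.get("mainEntity", [])]
    return [name for name in names if name not in visible]


page_html = "<h3>How do I validate that my schema is correct?</h3><h3>X guide</h3>"
faq = {
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": "How do I validate that my schema is correct?"},
        {"@type": "Question", "name": "How to X?"},
    ],
}
print(faq_mismatches(faq, page_html))
```

An empty list means every question in the markup has a matching visible heading; anything returned is a pair to fix before the block ships.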

Does schema markup help with ChatGPT specifically, or just Google?

Google documents that it reads schema for rich results and Knowledge Graph entity resolution — that is the only vendor-confirmed effect. For ChatGPT, Perplexity, and Claude, schema's citation effect is inferred from observational studies, not documented by the vendors. The mechanism is plausible: their crawlers parse JSON-LD, and pre-structured Q-A pairs and entity links are cheaper to extract than prose. But anyone claiming 'ChatGPT's algorithm weights FAQPage at X%' is speculating. Optimize for the documented Google effect first; treat the chat-engine lift as a likely bonus.

How do I validate that my schema is correct?

Use two tools in sequence. First, Google's Rich Results Test (search.google.com/test/rich-results) tells you whether Google can parse the markup and which rich results it is eligible for. Second, the Schema.org validator (validator.schema.org, formerly the Structured Data Testing Tool) checks vocabulary correctness independent of Google's rich-result rules. The Rich Results Test will pass valid-but-rich-result-ineligible types like FAQPage silently, so the Schema.org validator catches vocabulary errors the Google tool ignores. Run both on every template change.

Will adding schema to old pages re-trigger AI citations?

On RAG-based engines, yes, after a recrawl. Perplexity tends to pick up newly added schema within 7-14 days of recrawling the page; ChatGPT search runs slower, typically 14-30 days; Google AI Overviews follows the underlying Google index. Schema added today does not retroactively enter a foundation model's training corpus — that only happens on the next training pass, which runs on a 6-12 month cycle. So retrofitting schema helps live-retrieval citations relatively fast and training-corpus knowledge slowly.

How does schema connect to revenue, not just citations?

It doesn't, directly — and that is the gap. Schema earns citations, citations earn clicks, and clicks may earn signups, but only revenue attribution tells you which schema'd pages actually drove paid customers. GA4 buckets nearly all AI-engine referrals as Direct/(none), so the chain from 'this FAQPage block got cited by Perplexity' to 'this customer paid $29/mo' is invisible by default. Server-side first-party attribution that detects AI referrers and joins them to Stripe payments is what closes that loop. That is the layer Attrifast was built to provide.

Should I ship every schema type to maximize citations?

No. Ship the types that mirror visible on-page content and skip the rest. A JSON-LD block with no corresponding visible element (Product schema on a feature page, BreadcrumbList on a flat blog, a 28-item FAQ block) is signal pollution that adds parse cost and risks validation warnings for zero lift. The four-type starter bundle — Organization, Article, FAQPage, Person — plus HowTo on procedural pages and Product/Offer on commercial pages, is the whole job for most sites. More types is not more citations.

Is fake AggregateRating worth the risk for AI visibility?

No. Fabricating an AggregateRating (ratings with no real, visible reviews) is a Google manual-action risk per Google's review snippet policy, and there is no credible evidence the chat engines reward it for citation — it grades 'speculative' on every chat-engine column. The downside (a manual penalty that can drop your Google rankings) far exceeds any plausible upside. Only ship AggregateRating that reflects genuine reviews present on the page. If you have zero reviews, omit the block.

Does schema replace the need for llms.txt and good content?

No — they are complementary and operate at different layers. Schema labels the entities on a page so crawlers can extract them cleanly; llms.txt is a curated index telling crawlers which pages to prioritize; good content is the substance the schema labels. You need all three. Schema with thin content gets parsed but not cited; great content with no schema gets cited less efficiently. For the llms.txt layer specifically, see the llms.txt vs robots.txt breakdown; for the broad citation playbook, see how to get cited by AI engines.

How long does schema take to ship across a whole site?

Roughly 1-2 hours per template (Organization and Person blocks site-wide, an Article + FAQPage template for posts, a Product + Offer template for commercial pages), then about 5 minutes per page to fill in the per-page values and validate. For a 30-post site, budget a focused day for the templates and the rollout, plus the cannibalization and view-source checks. The ongoing cost is near zero — re-validate when schema.org versions change or when Google updates rich-result eligibility.

For the broad citation playbook that schema slots into — Direct Answer blocks, llms.txt, entity work, and the seven content patterns LLMs preferentially cite — how to get cited by AI engines is the companion how-to. For the retrieval mechanics that decide where schema fires in the pipeline, the AI search ranking factors breakdown is the mechanics layer, how AI engines choose sources covers the pretraining-vs-RAG split, and get cited by Google AI Overviews covers the highest-volume Google surface. For the crawler-access layer, llms.txt vs robots.txt walks every user-agent. And for the measurement layer that turns shipped schema into a revenue number, Attrifast's revenue attribution joins AI-engine sessions to Stripe server-side, with the surface-specific guide for tracking ChatGPT traffic.
