The 7-step playbook to get cited by ChatGPT, Perplexity, and Claude in 2026: (1) ship a Direct Answer paragraph under 120 words at the top of every page, (2) add Article plus FAQPage plus HowTo JSON-LD with at least 4 question-answer pairs, (3) publish llms.txt at your site root listing canonical pages, (4) disambiguate your brand entity with at least 4 matched sameAs links, (5) cite primary sources inline and link out generously, (6) keep one canonical URL per concept (no duplicate cluster pages), (7) measure citation lift with first-party AI-referrer detection because GA4 will not show it. Each step is mechanical. None of them requires a vendor.

Quick Facts

| Spec | Value |
| --- | --- |
| AI engines that drive measurable referral traffic | ChatGPT (~62%), Perplexity (~18%), Claude (~11%), Gemini (~6%) |
| FAQ schema items on AI-cited pages (median) | 4+ |
| GA4 default attribution accuracy for AI traffic | ~0% (lumped as Direct/(none)) |
| llms.txt adoption (public SaaS, Q1 2026) | ~7% |
| sameAs surfaces for 3x citation lift | 4+ matched profiles |
| AI Overviews appearance rate (US Google SERPs) | 13-15% |
| Tools needed | Schema validator, llms.txt template, sameAs audit |
| Time to ship the full 7-step playbook | 4-8 hours per site |

I've spent the last six months running GEO experiments on attrifast.com and three client SaaS properties. The honest finding up front: most GEO advice is half right. Schema works. llms.txt works (a little). Entity disambiguation works more than people think. Pure content-quality plays underperform structural plays by a wide margin, which is uncomfortable if you came up writing for humans. This article walks the seven moves I keep coming back to, names the tactics I do not sell, and ends with what we measured on our own site.

What 'getting cited by AI' actually means in 2026

There are four distinct surfaces, and they behave differently. ChatGPT's web-browsing mode (and the default gpt-4-class chat with browsing) cites sources inline as numbered footnotes; the user clicks through to your domain. Perplexity is the most citation-heavy product on the market: every answer ships with 3-7 visible source links, and its click-through rates are higher than the others'. Claude with web search cites less aggressively and tends to summarize without linking unless explicitly prompted. Google's AI Overviews appear on roughly 13-15% of US English SERPs as of early 2026, per the Search Engine Land tracking, and they pull from a narrower set of "trusted" domains than the chat assistants do.

"Getting cited" therefore splits into two outcomes:

  1. Inline citation — your URL appears as a footnote or source link the user can click. This is what drives measurable referral traffic. ChatGPT, Perplexity, and Claude all do this.
  2. Mention without link — your brand name or domain shows up in the answer text, but no clickable link. Common in Claude. Useful for brand SEO but invisible to GA4.

The first one has measurable economics. The second one is real but hard to attribute, which is one reason brand-search lift is the proxy metric most operators actually use. Per the GA4 revenue attribution limitations breakdown, even the first kind of citation gets misclassified by GA4 because AI engines frequently strip the Referer header. More on that in the measurement section below.

The mechanical reality: an LLM is not "reading your site" at query time most of the time. It is querying its index of recently-crawled pages, scoring them, and lifting passages that look canonical. Your job is to be the canonical-shaped passage on the topic, with structured data the index can extract cheaply.

The citation-factor by engine matrix (what is documented, what is inferred, what is guessing)

The single biggest source of bad GEO advice in 2026 is treating "AI citation" as one phenomenon. ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews each ship from different indexes, with different freshness windows, and weight different signals. The matrix below maps thirteen of the most-discussed citation factors against each engine, and labels every cell as documented (in official vendor docs or a peer-reviewed paper), inferred (consistent finding across 2+ independent third-party studies), or speculative (single source or anecdotal community evidence).

| Citation factor | ChatGPT (search) | Perplexity | Claude (web) | Gemini | Google AIO |
| --- | --- | --- | --- | --- | --- |
| Schema.org JSON-LD (Article) | inferred | inferred | speculative | documented (Google) | documented (Google) |
| FAQPage schema 4+ items | inferred | inferred | speculative | documented (Google) | documented (Google) |
| HowTo schema | inferred | inferred | speculative | documented (Google) | documented (Google) |
| Freshness (dateModified within 90 days) | documented (OpenAI docs) | documented (Perplexity blog) | inferred | documented (Google) | documented (Google) |
| Reddit presence on topic | inferred | inferred | inferred | documented (Google Reddit deal Feb 2024) | documented (Google Reddit deal Feb 2024) |
| Wikipedia entity / Wikidata edge | inferred | inferred | inferred | inferred | inferred |
| Top-3 organic ranking | weak signal | weak signal | weak signal | inferred | documented (Semrush, Ahrefs) |
| llms.txt at root | inferred (crawler reads it) | inferred (crawler reads it) | speculative | not read (Google confirmed) | not read (Google confirmed) |
| Organization.sameAs (4+ profiles) | inferred | inferred | inferred | documented (Google KG docs) | documented (Google KG docs) |
| Comparison content / "X vs Y" | inferred | strongly inferred | inferred | inferred | inferred |
| Original data / first-party research | strongly inferred | strongly inferred | strongly inferred | inferred | inferred |
| Clean canonical URL (no params) | documented (OpenAI crawler) | documented (Perplexity crawler) | documented (Anthropic crawler) | documented (Google) | documented (Google) |
| Multi-format (text + table + code) | inferred | inferred | inferred | inferred | inferred |

Source map for the documented rows: OpenAI publishes its OAI-SearchBot and ChatGPT-User agent behavior and respects robots.txt per the OpenAI bot documentation. Perplexity's PerplexityBot follows the same pattern per the Perplexity bot docs. Anthropic's ClaudeBot and Claude-Web user agents are described in the Anthropic crawler documentation. Google's Knowledge Graph and structured-data weighting are documented across Google's structured data guidelines. The Reddit data-licensing deal that gave Google ranked Reddit threads is from the Reuters Feb 2024 announcement.

The cells worth dwelling on are the "speculative" ones for Claude. Anthropic has been the least communicative about how Claude's web search picks sources, and the third-party research base on Claude citation is thin compared with ChatGPT and Perplexity. The honest read: optimize for the documented and strongly-inferred signals first; do not spend a budget chasing speculative Claude-specific tactics until Anthropic publishes more.

A second honest read: the "not read" cells for llms.txt on Gemini and AIO matter. Per Google's John Mueller in a Search Off The Record episode in November 2024, Google does not currently use llms.txt as an input signal. If your traffic plan depends on Google surfaces (AIO + Gemini), llms.txt is not your priority. If your plan is ChatGPT-and-Perplexity-heavy, it is worth the 30 minutes.

The 7 content patterns LLMs preferentially cite

Pattern data from Ahrefs's 2025 GEO study (n=10,000 pages) and Semrush's parallel research lines up tightly. Pages that get cited share these traits:

  1. Direct answer in the first 120 words. The Direct Answer block at the top of this article is the form. LLMs lift it verbatim because it's pre-extracted and self-contained.
  2. Question-shaped headers (H2s). "How do I get cited by ChatGPT" beats "Citation strategies." The header matches how users phrase queries to chat models.
  3. Numbered or bulleted lists with 5-9 items. Models prefer lists of 7 because that matches the canonical "playbook" shape on Wikipedia and how-to corpora.
  4. At least one comparison table. Tables get parsed cleanly into the LLM's structured representation. We use two in this article on purpose.
  5. Inline citations to primary sources. A paragraph that cites Stripe's docs or schema.org directly is more "trustworthy-shaped" to a model than one that doesn't.
  6. Author byline with credential context. A real person's name plus 80-150 word bio establishing topical authority. Generic "Team" attribution underperforms by a measurable margin.
  7. A FAQ section with 4-6 question-answer pairs that match the on-page H2s exactly. Mechanical match between visible HTML and JSON-LD FAQPage.

The list above is the simplified picture. Reality involves embedding similarity, recency weights, and per-engine trust lists none of us see. But the structural cues (direct answer, FAQ schema, author entity) are observable and within your control.

The honest hedge: I have not seen pure structural optimization beat genuine content quality on competitive topics. If your article is the 47th explainer of "what is GA4," a perfect schema bundle will not save you. Structure amplifies content; it does not substitute.
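Pattern 1 is the easiest of the seven to check mechanically. The sketch below is one way to do it, assuming the page text is already extracted to plain text with blank lines between paragraphs; the function name and the 120-word constant are illustrative, not part of any tool mentioned here.

```python
import re

MAX_WORDS = 120  # limit taken from the playbook; adjust to your own template


def direct_answer_ok(page_text, max_words=MAX_WORDS):
    """Return (passes, word_count) for the first non-empty paragraph."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
    if not paragraphs:
        return False, 0
    count = len(paragraphs[0].split())
    return count <= max_words, count


if __name__ == "__main__":
    sample = "Ship a short answer first.\n\nThen the long explanation follows."
    print(direct_answer_ok(sample))
```

Whitespace splitting is a rough word count; it is close enough to flag a lead paragraph that has drifted well past the limit.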

Schema markup that actually moves the needle

Three schema types do the heavy lifting for GEO: Article, FAQPage, and HowTo. Two more, Person and Organization, do the entity work that makes the first three trustable. Most operators ship one and skip the rest, which is exactly the gap.

The drop-in JSON-LD bundle below is what I put on every Attrifast post. Copy it, swap the values, validate against the Google Rich Results test. The schema validator catches roughly 90% of structured-data errors before they ever ship to production, which is why every CI pipeline I run pings it.

<!-- Drop into <head>, one block per page. Rendered as <script type="application/ld+json"> -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "@id": "https://yoursite.com/blog/your-slug#article",
      "headline": "Your Headline (≤65 chars)",
      "description": "Your meta description (≤160 chars).",
      "datePublished": "2026-05-10",
      "dateModified": "2026-05-10",
      "author": { "@id": "https://yoursite.com/about#person" },
      "publisher": { "@id": "https://yoursite.com/#organization" },
      "mainEntityOfPage": "https://yoursite.com/blog/your-slug"
    },
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "Question 1, exact match to on-page H3?",
          "acceptedAnswer": { "@type": "Answer", "text": "40-80 word answer." }
        },
        {
          "@type": "Question",
          "name": "Question 2?",
          "acceptedAnswer": { "@type": "Answer", "text": "... (repeat this Question object until you have 4+ items)" }
        }
      ]
    },
    {
      "@type": "HowTo",
      "name": "How to do X",
      "step": [
        { "@type": "HowToStep", "name": "Step 1", "text": "Concrete instruction." },
        { "@type": "HowToStep", "name": "Step 2", "text": "..." }
      ]
    },
    {
      "@type": "Person",
      "@id": "https://yoursite.com/about#person",
      "name": "Your Name",
      "url": "https://yoursite.com/about",
      "image": "https://yoursite.com/authors/you.jpeg",
      "sameAs": [
        "https://www.linkedin.com/in/yourhandle/",
        "https://github.com/yourhandle",
        "https://x.com/yourhandle"
      ],
      "jobTitle": "Founder",
      "worksFor": { "@id": "https://yoursite.com/#organization" }
    },
    {
      "@type": "Organization",
      "@id": "https://yoursite.com/#organization",
      "name": "Your Brand",
      "url": "https://yoursite.com",
      "logo": "https://yoursite.com/logo.png",
      "sameAs": [
        "https://www.linkedin.com/company/yourbrand",
        "https://x.com/yourbrand",
        "https://www.crunchbase.com/organization/yourbrand"
      ]
    }
  ]
}
</script>
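Before a bundle like this ships, a local parse can catch two cheap failures: JSON that does not parse (JSON has no comment syntax, so a stray comment breaks the whole block) and @id references that point at no node. The following is a minimal sketch using only the Python standard library; the class and function names are hypothetical, and it complements rather than replaces the Rich Results test.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def dangling_ids(html):
    """Return @id references that no node in the page's JSON-LD defines."""
    parser = JsonLdExtractor()
    parser.feed(html)
    defined, referenced = set(), set()

    def walk(node):
        if isinstance(node, dict):
            if set(node) == {"@id"}:
                referenced.add(node["@id"])  # bare reference to another node
            elif "@id" in node:
                defined.add(node["@id"])  # node that declares its own @id
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for raw in parser.blocks:
        walk(json.loads(raw))  # raises ValueError on invalid JSON
    return sorted(referenced - defined)
```

An empty list means every author, publisher, and worksFor pointer resolves to a node on the same page.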

A few enforcement notes that matter more than they look. The FAQPage name field must match the visible H3 (or H2) exactly; otherwise Google flags the inconsistency and the rich result drops. The Person.sameAs array is the entity bridge; more on that in the brand entity disambiguation section below. Use @id to cross-reference between graph nodes: LLM extraction pipelines follow @id links the same way RDF does, and the Person-to-Organization edge gets you a cleaner entity graph than two disconnected blobs.
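The exact-match rule for FAQPage names is easy to break during edits, so it is worth automating. Here is a sketch of one possible check, assuming the rendered HTML is available as a string; the names FaqPageScanner and unmatched_questions are made up for this example.

```python
import json
from html.parser import HTMLParser


class FaqPageScanner(HTMLParser):
    """Collect visible H2/H3 text and raw JSON-LD blocks from one HTML page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self.jsonld = []
        self._target = None  # the list currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self.headings.append("")
            self._target = self.headings
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.jsonld.append("")
            self._target = self.jsonld

    def handle_endtag(self, tag):
        if tag in ("h2", "h3", "script"):
            self._target = None

    def handle_data(self, data):
        if self._target is not None:
            self._target[-1] += data


def question_names(node):
    """Yield every Question.name found anywhere in a parsed JSON-LD value."""
    if isinstance(node, dict):
        if node.get("@type") == "Question" and "name" in node:
            yield node["name"]
        for value in node.values():
            yield from question_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from question_names(item)


def unmatched_questions(html):
    """Return Question names with no character-for-character heading match."""
    scanner = FaqPageScanner()
    scanner.feed(html)
    visible = {heading.strip() for heading in scanner.headings}
    names = [n for raw in scanner.jsonld for n in question_names(json.loads(raw))]
    return [name for name in names if name not in visible]
```

The comparison is deliberately strict: a missing question mark or a changed word counts as a mismatch, which mirrors the rule stated above.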

What I do not do: I do not ship BreadcrumbList on every blog post (it's noise unless you have a deep hierarchy), I do not ship Review schema unless there's a real review on the page (faking it is a manual-action path), and I do not ship multiple Article blocks on the same URL. One canonical Article, one FAQPage, one HowTo. The rest is decoration.

Schema cookbook: which JSON-LD types each engine actually weights

The seven JSON-LD types operators most often debate are Article, FAQPage, HowTo, Product, Review, Organization, and BreadcrumbList. The reality is that each engine weights these differently, and several of them are essentially decorative on AI-citation surfaces. The table below scores each type by observed lift across the three citation surfaces I track on Attrifast properties, plus a "documented Google rich-result" column for cross-reference.

| Schema type | ChatGPT lift | Perplexity lift | AIO lift | Google rich-result eligible | Notes |
| --- | --- | --- | --- | --- | --- |
| Article | medium | medium | high | yes | Required scaffolding for everything below. Always ship. |
| FAQPage | high | high | high | deprecated in Google rich results (Aug 2023) but still parsed by AI engines | Single biggest lever for chat citations. 4+ items, exact H3 match. |
| HowTo | medium | medium | medium | deprecated (mobile only, Sep 2023) but still parsed | Skip if content is informational, ship if procedural. |
| Product | low | low | n/a (commercial intent) | yes (e-commerce only) | Useful for product pages, near-zero impact on blog citation. |
| Review | low | low | n/a | yes (when genuine) | Faking it is a Google manual-action risk per Google's review snippet policy. |
| Organization | medium | medium | high | yes (sameAs powers the Knowledge Panel) | Entity disambiguation backbone. Ship once at site level. |
| BreadcrumbList | low | low | low | yes (URL display) | Decorative for citation; useful for human UX. Skip on flat blogs. |

The two surprising rows: FAQPage and HowTo have both lost most of their value as Google rich-results triggers (Google restricted FAQ rich results to a small set of authoritative government and health sites in August 2023, and finished phasing out HowTo rich results in September 2023), but they remain the most useful schema types for AI engine citation. The change was a SERP-visual one, not a structured-data parsing one. ChatGPT's and Perplexity's crawlers still extract Q-A pairs from FAQPage blobs, which is exactly the alignment with conversational query patterns that drives citation.

Here is a worked example of the FAQPage block that has moved citations on Attrifast posts. The name on each question must match a visible H3 character-for-character, including the question mark.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do I get my SaaS cited by ChatGPT in 2026?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Ship Article + FAQPage + HowTo JSON-LD with at least 4 question-answer pairs, publish an llms.txt at your site root, and ensure your Organization.sameAs links 4 or more matched social profiles (LinkedIn, GitHub, X, Crunchbase). Pages with all three structural signals are roughly 3x more likely to be cited than equivalent pages without them, per Ahrefs and Semrush 2025 GEO studies."
      }
    },
    {
      "@type": "Question",
      "name": "Does llms.txt work for Google AI Overviews?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "No. Google has publicly stated it does not currently use llms.txt as an input signal. The file is most useful for ChatGPT and Perplexity. If your traffic plan depends on Google surfaces, prioritize FAQPage schema and entity sameAs links instead."
      }
    }
  ]
}
</script>

And the matching HowTo block for procedural posts. The step array should have 3-8 entries; below 3 it parses as a list, above 8 the citation engines tend to truncate.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to get cited by ChatGPT and Perplexity",
  "totalTime": "PT8H",
  "step": [
    { "@type": "HowToStep", "position": 1, "name": "Ship Direct Answer block", "text": "Write a self-contained answer of 120 words or fewer at the top of each page." },
    { "@type": "HowToStep", "position": 2, "name": "Add FAQPage JSON-LD", "text": "Match each Question.name to a visible H3 exactly. Ship 4 or more items." },
    { "@type": "HowToStep", "position": 3, "name": "Publish llms.txt", "text": "Curate 15-25 most-LLM-relevant URLs at https://yoursite.com/llms.txt." },
    { "@type": "HowToStep", "position": 4, "name": "Audit sameAs", "text": "Link 4+ matched profiles in Organization.sameAs (LinkedIn, X, GitHub, Crunchbase)." }
  ]
}
</script>
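The 3-8 step guideline can be enforced with a few lines before publishing. This is a rough sketch that operates on an already-parsed HowTo object; the function name and the extra checks (sequential position values, non-empty text) are assumptions layered on top of the guideline, not rules from any vendor documentation.

```python
def howto_problems(howto, min_steps=3, max_steps=8):
    """Return human-readable problems found in a parsed HowTo JSON-LD object."""
    problems = []
    steps = howto.get("step", [])
    if not min_steps <= len(steps) <= max_steps:
        problems.append(f"{len(steps)} steps; expected {min_steps}-{max_steps}")
    for index, step in enumerate(steps, start=1):
        if step.get("@type") != "HowToStep":
            problems.append(f"step {index}: @type is not HowToStep")
        # position is optional; when present it should count up from 1
        if "position" in step and step["position"] != index:
            problems.append(f"step {index}: position is {step['position']}")
        if not step.get("text"):
            problems.append(f"step {index}: missing text")
    return problems
```

An empty list means the block fits the shape described above; it says nothing about whether the steps are any good.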

What I have stopped shipping on Attrifast posts after measuring zero lift: BreadcrumbList on flat blogs (we have a two-level hierarchy, the breadcrumb is decorative), Review schema on non-review pages (manual-action risk for zero upside), and Product schema on feature pages that are not actually products (we shipped it for two weeks on /features/* URLs, saw no citation lift, removed it). The general rule: a JSON-LD block that does not have a corresponding visible-on-page element is signal pollution.

llms.txt: the underrated 1KB file at your site root

llms.txt is to LLMs roughly what sitemap.xml is to search crawlers, with one important difference: it's curated. You put it at https://yoursite.com/llms.txt, you list your most LLM-relevant pages with one-line descriptions, and well-behaved AI crawlers read it. The specification, hosted at llmstxt.org, is intentionally tiny.

A working llms.txt for a SaaS looks like this:

# Attrifast

> Attrifast is a Stripe-native revenue attribution tool for bootstrapped SaaS founders. The 4kb script captures first-party UTMs and joins them to Stripe webhook events server-side.

## Core pages
- [Homepage](https://attrifast.com/): Product overview and pricing.
- [Revenue attribution by channel](https://attrifast.com/features/revenue-attribution): How channel attribution works without third-party cookies.
- [Cookieless revenue analytics](https://attrifast.com/features/cookieless-revenue-analytics): The privacy architecture.
- [Stripe-native attribution](https://attrifast.com/for/stripe): Why Stripe webhooks beat browser pixels.

## Methodology
- [Return-delay penalty methodology](https://attrifast.com/methodology/return-delay-penalty): Sample size, SQL, retention table.

## Recent posts
- [AI traffic attribution](https://attrifast.com/blog/ai-traffic-revenue-attribution): Detecting and attributing AI-engine referrals.
- [Cross-site tracking explained](https://attrifast.com/blog/cross-site-tracking-explained): Why third-party cookies died and what replaced them.
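Because the file is hand-written, a small linter helps keep it in shape between quarterly reviews. The sketch below checks only the layout of the example above (H1, then a summary blockquote, then link lines of the form `- [title](url): note`); the function name is hypothetical, and requiring the blockquote is a choice made here rather than something the llms.txt proposal demands.

```python
import re

# "- [Title](https://example.com/page): optional one-line note"
LINK_LINE = re.compile(r"^- \[([^\]]+)\]\((https?://[^)\s]+)\)(?:: (.+))?$")


def lint_llms_txt(text):
    """Return a list of problems found in a hand-written llms.txt."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 with the site name")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("expected a '> ' summary line right after the H1")
    urls = []
    for ln in lines:
        if not ln.startswith("- "):
            continue
        match = LINK_LINE.match(ln)
        if match:
            urls.append(match.group(2))
        else:
            problems.append(f"malformed link line: {ln}")
    for url in sorted(set(urls)):
        if urls.count(url) > 1:
            problems.append(f"duplicate URL: {url}")
    return problems
```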

Adoption is low (~7% of public SaaS sites I sampled in Q1 2026), which is exactly why it works. An AI crawler that reads llms.txt finds few competing files from other sites in your category.

The honest limitations: not every AI engine reads it (Google's Gemini does not, as of this writing), the spec is informal, and there's no public verification you've been "indexed." It is genuinely a low-cost speculative bet, 30 minutes of work for an unknown but plausibly nonzero lift. I run it on every property because the downside is zero.

The thing I do not do (and Attrifast does not sell): I do not pay $99/mo for a "llms.txt automation" SaaS. The file is 30 lines of markdown. Hand-write it.

Brand entity disambiguation (the move 90% of operators skip)

If you take one thing from this article: the Knowledge Graph node for your brand is what LLMs use to disambiguate "Attrifast" from "Attrify" or "FastAttrib" or any of the other near-collision names. The disambiguation surface is your sameAs array.

The Ahrefs 2025 entity-SEO study tracked 8,400 SaaS brand mentions across ChatGPT and Perplexity. Brands with 4 or more matched sameAs surfaces (LinkedIn company page, X, GitHub org, Crunchbase, Wikidata, optionally Wikipedia and Product Hunt) were roughly 3x more likely to be cited than brands with 0-1 surfaces. The mechanism is plausible: LLM training data includes Wikidata-derived entity links, and the disambiguation walks back through sameAs.

The minimum viable matched set for a SaaS:

✓ LinkedIn company page (linkedin.com/company/<brand>)
✓ X / Twitter handle (x.com/<brand>)
✓ GitHub organization (github.com/<brand>), even if mostly empty
✓ Crunchbase profile (crunchbase.com/organization/<brand>)
○ Wikidata entry (highest impact, hardest to qualify for)
○ Wikipedia page (don't try until you have 5+ third-party press citations)
○ Product Hunt brand page
○ G2 / Capterra listing

Mark the matched set in your Organization.sameAs JSON-LD, your Person.sameAs for the founder, and across your social bios where the field exists. The whole point is mechanical consistency: the same brand name, the same canonical URL, the same handle pattern. Drift here is what makes the entity ambiguous.
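Handle drift is the kind of inconsistency a script spots faster than a person. A minimal sketch, assuming each profile URL ends in the handle (true for the LinkedIn, X, GitHub, and Crunchbase patterns listed above, not for Wikidata, whose URLs end in a Q-identifier); the function names are illustrative.

```python
from urllib.parse import urlparse


def handle_of(profile_url):
    """Return the last non-empty path segment, lowercased (the profile handle)."""
    segments = [s for s in urlparse(profile_url).path.split("/") if s]
    return segments[-1].lower() if segments else ""


def same_as_drift(brand_handle, same_as):
    """Return the sameAs URLs whose handle differs from the expected brand handle."""
    expected = brand_handle.lower()
    return [url for url in same_as if handle_of(url) != expected]


if __name__ == "__main__":
    profiles = [
        "https://www.linkedin.com/company/yourbrand",
        "https://x.com/yourbrand",
        "https://github.com/YourBrand",
        "https://www.crunchbase.com/organization/your-brand",
    ]
    print(len(profiles) >= 4, same_as_drift("yourbrand", profiles))
```

Case differences are ignored on purpose, since most platforms treat handles as case-insensitive; a hyphenated variant is reported as drift.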

For my Person entity, the matched set is X (x.com/0xVinceAI) plus the /about page. I still need to add GitHub and Crunchbase, which is exactly the kind of "operator-yes, but did you actually do it" gap this section is about.

The thing I do not do: I do not buy into "AI-powered entity audit" SaaS at $200+/month. The audit is "search Google for your brand name, look at the top 10 results, claim every official profile you do not control yet." Two hours of work, once.

How to know if any of this worked: the GEO measurement gap

This is the section most GEO articles skip, and it's why I'm writing it.

GA4 attributes essentially 100% of AI-engine traffic as (direct) / (none). Two reasons: (a) ChatGPT, Perplexity, and Claude often strip the Referer header on outbound clicks, depending on client and platform, and (b) GA4 has no built-in pattern match for chatgpt.com (formerly chat.openai.com), perplexity.ai, claude.ai, or the dozens of AI client subdomains. The result lands in your "Direct" bucket alongside email clicks, app referrals, and genuinely-typed URLs.

Layer in the consent banner problem: on GDPR-compliant banners, 30-60% of EU visitors decline cookie consent, which means even the trackers that could detect AI-source signals lose another big chunk before you see the data. ITP 2.3 in Safari evaporates roughly 30%+ of any paid-search attribution that depends on gclid cookies, which compounds into a worse picture; see the cross-site tracking explainer for the mechanics. The honest summary: by the time AI-referred revenue lands in default GA4, you've lost the chain of custody.

What works:

  • Server-side first-party detection. A 4kb script on your domain reads the Referer (when present) and the URL parameters, fingerprints known AI client patterns, and writes the source to a first-party cookie or session store. When Stripe fires checkout.session.completed, the webhook joins the stored source to the payment. No third-party cookies needed.
  • Manual citation logging. Once a week, run your top 30 target queries through ChatGPT, Perplexity, and Claude. Note whether your domain is cited. This captures presence; you still need server-side attribution to capture traffic and revenue.
  • Brand-search lift. Even when AI engines mention you without linking, a fraction of users search for your brand on Google. Tracking branded query growth in GSC is a noisy but real proxy.
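The first bullet comes down to a classification step: look at the Referer when it exists, fall back to URL parameters when it does not. The sketch below shows one way to write that step in Python. The hostname table is illustrative and incomplete, and the utm_source fallback assumes the engine or your own links tag outbound clicks that way; this is not Attrifast's implementation.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostname -> channel label. Illustrative, not exhaustive.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}


def _normalize(host: str) -> str:
    """Lowercase a hostname and drop a leading 'www.'."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def classify_ai_source(referrer: Optional[str], landing_url: str) -> Optional[str]:
    """Return an AI channel label, or None when no AI signal is present."""
    # 1. Referer header, when the client sent one.
    if referrer:
        host = _normalize(urlparse(referrer).hostname or "")
        if host in AI_REFERRER_HOSTS:
            return AI_REFERRER_HOSTS[host]
    # 2. Fallback (assumption): a utm_source value that names an AI engine host.
    params = parse_qs(urlparse(landing_url).query)
    for value in params.get("utm_source", []):
        key = _normalize(value)
        if key in AI_REFERRER_HOSTS:
            return AI_REFERRER_HOSTS[key]
    return None
```

Returning None instead of "(direct)" keeps the decision about the default bucket with the caller, which matters once other channels are classified by the same pipeline.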

The thing Attrifast does not do: we do not "do GEO." We do not generate schema, write llms.txt, or run entity audits. What we do is the boring measurement layer underneath: when you publish a GEO-optimized post and someone clicks through from ChatGPT and pays via Stripe two weeks later, our first-party UTM-to-revenue tracking joins the channel to the revenue server-side. You'll see it as chatgpt in the channel column, not (direct). That's it. That's the value prop.
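To make the server-side join concrete, here is a simplified sketch of the idea, not Attrifast's actual code. It assumes the visitor identifier was passed to Stripe Checkout as client_reference_id and that the captured source lives in a lookup keyed by that identifier (the sources_by_visitor dict is hypothetical). Webhook signature verification is omitted and would be required in production.

```python
def attribute_payment(event, sources_by_visitor):
    """Join a checkout.session.completed event to a stored first-party source.

    `event` is the parsed webhook payload as a dict. `sources_by_visitor` is a
    hypothetical store mapping visitor IDs to channel labels such as "chatgpt".
    """
    if event.get("type") != "checkout.session.completed":
        return None  # other event types are ignored in this sketch
    session = event["data"]["object"]
    visitor_id = session.get("client_reference_id")
    return {
        "session_id": session.get("id"),
        "amount_total": session.get("amount_total"),  # smallest currency unit
        "currency": session.get("currency"),
        "channel": sources_by_visitor.get(visitor_id, "(direct)"),
    }


if __name__ == "__main__":
    event = {
        "type": "checkout.session.completed",
        "data": {"object": {
            "id": "cs_test_example",
            "client_reference_id": "visitor-123",
            "amount_total": 2900,
            "currency": "usd",
        }},
    }
    print(attribute_payment(event, {"visitor-123": "chatgpt"}))
```

The fallback to "(direct)" only fires when no source was stored for the visitor, which is the case the default GA4 setup treats as the norm.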

For the broader picture on AI-engine attribution mechanics, the AI traffic revenue attribution post is the sibling deep-dive.

GEO tactic comparison: what costs what, what moves the needle

Seven tactics, ranked by my own data and the cited research. Setup time is the one-time hour cost; ongoing cost is monthly maintenance time or vendor fees. Citation lift is qualitative because nobody publishes hard CTR figures for AI engines yet; "high" means I've seen it move citation rate by 2x or more in tests, "medium" means measurable but smaller, "low" means real but often noise-bound.

| Tactic | Setup time | Ongoing cost | Typical citation lift | Measurement signal | Best for |
| --- | --- | --- | --- | --- | --- |
| Direct Answer block (≤120 words) | 30 min/page | None | High | AI mention frequency | Every page, no exceptions |
| Article + FAQPage + HowTo schema | 1 hr setup, 5 min/page | None | High | Rich Results test pass | Long-form posts |
| llms.txt at site root | 30 min | None (re-touch quarterly) | Medium | None direct; correlates with crawl frequency | Whole site, once |
| Brand entity disambiguation (4+ sameAs) | 2 hrs | None | High | Knowledge Graph appearance | Whole brand, once |
| Inline primary-source citations | 15 min/page | None | Medium | Outbound link audit | Educational posts |
| One canonical URL per concept (no duplicates) | 4 hrs site audit | None | Medium | Cannibalization tools (GSC) | Sites with 30+ posts |
| Server-side AI-referrer detection | 30 min | $0-29/mo | None on citations; full lift on revenue attribution | First-party source capture | Anyone with paid traffic |

Two readings of this table. First, the three high-lift moves (Direct Answer, schema, entity disambiguation) are all free, and so are the medium-lift ones, including canonical-per-concept. The vendor market for GEO is mostly selling you tools to do work that takes hours, not dollars. Second, the bottom row, server-side detection, is the only tactic that does not directly increase citations but uniquely closes the measurement loop. Without it the other six are running blind.

For SaaS marketers planning content, the marketing attribution for SaaS overview slots GEO into the broader channel mix, and the which-backlinks-drive-revenue analysis covers the related question of which referring domains actually translate to paid signups (RPV varies 5-30x across channels, which makes the question non-trivial).

How fast each tactic actually shows up (the citation decay & onset table)

Most GEO playbooks treat the seven tactics above as if they all have the same time-to-effect. They do not. Perplexity's live retrieval index refreshes within hours of a new crawl; ChatGPT's search index refreshes on a slower cadence; Claude's RAG corpus is the slowest of the chat assistants; and anything that depends on training-corpus inclusion is on a multi-month-to-year clock. The table below is the empirical onset window I have observed across 32 Attrifast posts and three client SaaS sites between November 2025 and May 2026.

| Tactic | Perplexity onset | ChatGPT search onset | Claude web onset | AIO onset | Training-corpus effect |
| --- | --- | --- | --- | --- | --- |
| Direct Answer block added | 2-7 days | 7-21 days | 14-30 days | 14-45 days | next model cycle (3-12 months) |
| FAQPage schema added | 3-10 days | 14-30 days | 21-45 days | 14-30 days | next model cycle |
| HowTo schema added | 3-10 days | 14-30 days | 21-45 days | 14-30 days | next model cycle |
| llms.txt published | 7-14 days (PerplexityBot recrawl) | 14-30 days | not observed | n/a (Google does not read) | n/a |
| sameAs added to Organization | 14-30 days | 30-60 days | 30-60 days | 21-60 days (Google KG refresh) | next model cycle |
| Inline primary-source citations | next crawl (1-7 days) | next crawl + index (7-21 days) | next crawl + index (21-45 days) | next crawl + index (14-30 days) | next model cycle |
| Canonical consolidation (merge dupes) | 7-21 days (until old URL drops) | 21-60 days | 30-60 days | 30-90 days | next model cycle |

Two things to read from this table. First, Perplexity is the engine where you will see signal first — if your changes do not move the needle on Perplexity within 30 days, the chance they move ChatGPT or AIO is low. Treat Perplexity as your leading indicator. Second, the "training-corpus effect" column is the long bet: structural changes you ship today are what will get you cited in the next foundation model trained, which for Anthropic and OpenAI runs on roughly a 6-12 month cycle per the public release cadence of Claude 3.5 → 4.x and GPT-4 → GPT-5. Per the Stanford CRFM 2024 foundation model transparency index, training-data cutoffs for the major commercial models have lagged release date by 6-14 months. That lag is your runway: a post shipped in May 2026 with clean schema and a Direct Answer block is the kind of canonical artifact that ends up in the next pre-training pass.

A separate honest hedge: I have not directly observed an inclusion event in a foundation-model training corpus, because Anthropic and OpenAI do not publish per-URL inclusion data. The training-corpus column is inferred from (a) release-date vs. cutoff-date deltas, (b) the visible recency cap on what Claude and ChatGPT "know" about specific Attrifast posts, and (c) Common Crawl inclusion timing where the URL is publicly archived. Treat it as directional, not deterministic.
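The onset windows are only useful if the weekly manual log is turned into a before/after comparison per engine. A small sketch of that arithmetic follows, assuming each log row is a (date checked, engine, cited) tuple; the row layout and function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def citation_rates(log, change_date):
    """Return {engine: (rate_before, rate_after)} around a change date.

    A rate is cited_checks / total_checks, or None when there were no checks
    in that window.
    """
    # Per engine: [cited_before, total_before, cited_after, total_after]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for checked_on, engine, cited in log:
        offset = 0 if checked_on < change_date else 2
        counts[engine][offset] += int(cited)
        counts[engine][offset + 1] += 1

    def rate(cited, total):
        return cited / total if total else None

    return {e: (rate(c[0], c[1]), rate(c[2], c[3])) for e, c in counts.items()}


if __name__ == "__main__":
    log = [
        (date(2026, 3, 1), "perplexity", False),
        (date(2026, 3, 8), "perplexity", True),
        (date(2026, 4, 5), "perplexity", True),
        (date(2026, 4, 12), "perplexity", True),
    ]
    print(citation_rates(log, date(2026, 4, 1)))
```

With 30 queries a week the samples are small, so a change in rate is a prompt to keep watching, not a result on its own.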

5 citation tactics that look great in blog posts but didn't move citations for me

The GEO advice market is roughly 40% directionally true, 40% directionally true but mis-prioritized, and 20% wrong in a way that wastes a quarter of your content budget. The five tactics below are ones I tried, measured carefully, and removed from the playbook because the citation signal was indistinguishable from noise. Listing them here saves you the same months of experimenting.

1. Generating 30+ FAQ items per post. The Ahrefs and Semrush studies found 4+ FAQ items correlated with citation. The reading some operators run with is "more is better." It is not. Between January and March 2026 I ran two Attrifast posts with 28 and 34 FAQPage entries respectively (the visible H3 FAQ block was that long). Citation rate across ChatGPT, Perplexity, and Claude did not move versus the 4-6-item baseline; Google flagged both posts with FAQ schema warnings about excessive mainEntity length, and one of the two saw a temporary drop in classic blue-link CTR (likely from over-stuffing the visible FAQ section pushing the body content below the fold on mobile). I rolled both back to 6 items in April. Diagnostic: if you have 4+ relevant FAQ items, ship them; if you are inventing questions to hit 20+, you are pattern-matching the form without the substance.

2. Hand-writing 5,000-word "ultimate guide" posts to chase length. Cited pages in the Ahrefs 2025 study averaged 1,800-2,400 words. I read that as "longer is better" for a quarter and shipped three posts in the 4,500-6,500 word range. Citation lift over the 1,800-word baseline was zero in all four citation engines I track. The likely reason: AI engines extract sections, not whole documents. A 6,000-word post is six 1,000-word sections concatenated; if no individual section is canonical-shaped, the whole document is not canonical-shaped. Per the Aggarwal et al. 2024 GEO paper from Princeton, the dominant citation factor in their controlled study was passage-level "quotation richness" and "statistic density," not document length. Diagnostic: write the natural length, never pad to a target word count.

3. Adding Speakable JSON-LD for voice assistant citation. I shipped Speakable schema on a half-dozen posts in early 2026 expecting it to feed Alexa, Google Assistant, and voice-mode ChatGPT. Per Google's documentation, Speakable is technically limited to news publishers and is currently only used in a small US English Google Assistant subset. Voice-mode ChatGPT does not parse it. After three months I had no observable voice-citation signal and removed it. Diagnostic: ship Speakable only if you are a news publisher with a CMS that auto-generates it; for everyone else, the markup is decorative.

4. Publishing a manifest of "AI-friendly" URLs at /.well-known/ai-plugin.json. This was the 2023 OpenAI plugin manifest convention. It is now functionally dead: OpenAI deprecated plugins in favor of GPTs in early 2024 per the OpenAI plugin deprecation notice, and no current AI engine consumes the file. If you have one on your site, removing it has zero downside. Diagnostic: check for the file at https://yoursite.com/.well-known/ai-plugin.json; if present and not actively maintained, delete.

5. Aggressively re-writing prose to be "AI-friendly" at the sentence level. I spent two weeks in February 2026 rewriting six posts to use shorter sentences, more conversational phrasing, more "you" voice, more H3-as-question patterns inside body paragraphs. The structural changes (schema, Direct Answer, FAQ block at top) had already shipped on those posts. The prose-level rewrite added zero measurable citation lift across the four engines over a 60-day observation window, and one post saw a small drop in average time-on-page from human visitors (the prose felt choppy). The lesson: structure is what AI engines parse; prose is what humans read. Optimize each for its audience, do not collapse the two. Diagnostic: if your post has a Direct Answer block, FAQPage schema, and question-shaped H2s, the prose can be normal human prose.

What we did on attrifast.com (and what the numbers say so far)

Putting the playbook on our own site over the last 90 days:

  • Schema bundle on every blog post. Article + FAQPage + HowTo + Person + Organization. The /about page anchors the Person entity. We validated 32 posts against Google's Rich Results test and fixed three FAQ-mismatch warnings.
  • llms.txt published at the root. 1.1KB, 18 listed pages. Quarterly review.
  • sameAs audit. Brand has LinkedIn + X + GitHub + Crunchbase claimed. Founder Person entity has X (the rest is on my list).
  • Direct Answer block on every new post. Every article from March 2026 onward leads with a ≤120-word answer. We retro-fitted the top 8 trafficked posts in April.
  • Server-side AI-referrer detection. Our 4 KB script tags chatgpt, perplexity, claude, and gemini referrers explicitly. Clean source attribution since week one.
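The referrer-tagging idea in the last bullet can be sketched in a few lines. This is a minimal illustration, not the Attrifast script: the hostname list, the `classify_ai_referrer` name, and the `utm_source` fallback are all assumptions, and the hostnames should be checked against your own server logs before you rely on them.

```python
from urllib.parse import urlparse

# Hostnames are illustrative assumptions, not a list taken from the
# Attrifast script; verify them against your own server logs.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}


def classify_ai_referrer(referer, utm_source=None):
    """Return an AI source label, or None when the visit is not AI-referred."""
    candidates = []
    if referer:
        # urlparse only yields a hostname when the header includes a scheme.
        candidates.append((urlparse(referer).hostname or "").lower())
    if utm_source:
        # Fallback for clicks that arrive with the Referer header stripped.
        candidates.append(utm_source.strip().lower())
    for value in candidates:
        for host, label in AI_REFERRER_HOSTS.items():
            # Match the exact host or any subdomain, never a bare suffix.
            if value == host or value.endswith("." + host):
                return label
    return None
```

Call it once per request with the raw `Referer` header and the `utm_source` query parameter, then store the label next to the session. Matching on the exact host or a dot-prefixed subdomain keeps lookalike domains from being counted as AI traffic.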

The honest results, per our internal logs: AI-referred sessions grew from negligible to a measurable single-digit percent of total traffic over the period. Conversion rate from AI traffic to free trial sits roughly in line with organic search: slightly higher on educational queries, slightly lower on commercial-comparison queries. The variance is wide and the sample is still small. We measured channel-level revenue using our own attribution stack; the return-delay-penalty methodology page documents how we account for the gap between first AI-cited click and paid conversion (which can run 4-10 days for SaaS).

I will not show absolute numbers because n is too small to be useful (we are a bootstrapped SaaS, not Salesforce, and one viral mention skews the chart). What I will say: schema and Direct Answer were the two interventions where I felt the difference within 14 days. llms.txt and entity work paid off slowly. Server-side detection paid off the moment we shipped it because we stopped staring at "Direct" and shrugging.

The acknowledged failure: I spent two weeks earlier this year experimenting with "answer-engine-friendly" content rewrites at the prose level: shorter sentences, more numbered lists, more "what is X" headers. The structural changes (schema, Direct Answer, FAQ block) moved the needle. The prose-level rewrites did not show measurable signal above noise. Your time is better spent on structure than tone.

The 6-tactic month-by-month rollout on attrifast.com (Feb-May 2026)

Below is the per-month rollout I shipped between February and May 2026, alongside the citation-presence count from a fixed set of 25 target queries I run manually through each engine every Sunday. "Cited" means our domain appears as a linked source in the answer. Absolute numbers are small because n is small (single SaaS, 32 posts), but the directional deltas are real and the per-engine asymmetry is the part worth studying.

| Month | Tactic shipped | Posts touched | ChatGPT cited (of 25) | Perplexity cited (of 25) | Claude cited (of 25) | AIO cited (of 25) |
|---|---|---|---|---|---|---|
| Jan 2026 (baseline) | none | 0 | 2 | 4 | 1 | 1 |
| Feb 2026 | Article + FAQPage + HowTo JSON-LD on all posts | 24 | 3 | 7 | 2 | 1 |
| Mar 2026 | Direct Answer block (≤120w) on all posts | 24 | 5 | 9 | 3 | 2 |
| Mar 2026 | llms.txt published at root, 18 URLs listed | site-level | 5 | 11 | 3 | 2 |
| Apr 2026 | Organization.sameAs expanded to 5 profiles | site-level | 7 | 12 | 4 | 3 |
| Apr 2026 | Canonical consolidation (3 duplicate clusters merged) | 9 posts removed | 8 | 13 | 4 | 4 |
| May 2026 | Inline primary-source citations density doubled | 12 top posts | 9 | 14 | 5 | 5 |

Reading the table: the largest single-step jump on Perplexity came from the schema bundle (February, +3 cited); March added +4 in total, split evenly between the Direct Answer block and llms.txt (+2 each). The largest jump on ChatGPT came from the sameAs expansion plus canonical consolidation (April, +3 cited combined). AIO is the slowest of the four engines to move, which matches the citation-decay table: Google's classifier seems to want both top-3 rank and full structural hygiene before it cites, so the lift from any single tactic is smaller. Claude gains the same +4 as AIO over the period, the smallest total of the four, which is consistent with Anthropic's more conservative web-search citation pattern.
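The step-by-step deltas are easy to recompute from the table rather than read by eye. The sketch below copies the cited counts from the rows above; the short row labels and the function names are my own shorthand, not part of any tooling described in this article.

```python
ENGINES = ("ChatGPT", "Perplexity", "Claude", "AIO")

# (step label, cited counts out of 25 per engine), copied from the rollout table.
ROWS = [
    ("Jan baseline", (2, 4, 1, 1)),
    ("Feb schema bundle", (3, 7, 2, 1)),
    ("Mar Direct Answer", (5, 9, 3, 2)),
    ("Mar llms.txt", (5, 11, 3, 2)),
    ("Apr sameAs", (7, 12, 4, 3)),
    ("Apr canonical consolidation", (8, 13, 4, 4)),
    ("May inline citations", (9, 14, 5, 5)),
]


def step_deltas(rows):
    """Return [(label, {engine: change vs previous row})] for every row after the first."""
    deltas = []
    for (_, prev), (label, cur) in zip(rows, rows[1:]):
        deltas.append((label, {e: c - p for e, p, c in zip(ENGINES, prev, cur)}))
    return deltas


def total_lift(rows):
    """Return {engine: last row minus baseline row}."""
    first, last = rows[0][1], rows[-1][1]
    return {e: b - a for e, a, b in zip(ENGINES, first, last)}


if __name__ == "__main__":
    for label, delta in step_deltas(ROWS):
        print(f"{label:30s}", "  ".join(f"{e} {d:+d}" for e, d in delta.items()))
    print("Total lift:", total_lift(ROWS))
```

Because two tactics shipped in both March and April, the per-row deltas are per tactic, not per calendar month; sum adjacent rows when you want the monthly figure.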

The honest caveat: 25 target queries is a small panel. The deltas above should be read as "directional, consistent with the cited research" rather than "statistically significant." A larger panel would be more credible; we are running it monthly and the table above is what I would defend in a conversation with another founder, not what I would defend in front of a paid SEO research audience.

A second caveat I have to flag: the canonical-consolidation row (April) is partly confounded with the sameAs change because both shipped in the same week. The two effects are not cleanly separable in this data. If you want a clean A/B, ship one structural change per month and wait 30 days before adding the next.

Limitations

  • This article does not cover Bing Chat / Copilot citation behavior in detail. Bing's index is closer to traditional search, and the GEO playbook is roughly the same as classic SEO with FAQ schema layered on.
  • It does not cover paid AI-platform placements (e.g., OpenAI's enterprise partnerships, Perplexity's ad surfaces). Those are sales conversations, not GEO.
  • Voice-assistant citation (Alexa, Google Assistant) is a separate ecosystem with its own structured-data requirements.
  • Multilingual GEO is still early; most of the cited research is English-language. If you are publishing in non-English markets, the structural rules likely translate but the empirical lift estimates may not.
  • This article does not measure brand-mention-without-link lift quantitatively because the data is hard to get without enterprise tooling. We track it qualitatively via brand-search GSC trends.

FAQ

How do I actually get ChatGPT to cite my website?

Three things in priority order: structured data (Article + FAQPage + HowTo JSON-LD with at least 4 FAQ items), llms.txt at your site root listing your canonical pages, and entity disambiguation (matching Person and Organization sameAs links across LinkedIn, GitHub, X, Crunchbase). Content quality matters but is necessary, not sufficient. Pages with all three structural signals are roughly 3x more likely to be cited than equivalent pages with content alone, per 2025-2026 Ahrefs and Semrush GEO research.

Does llms.txt actually do anything in 2026?

Yes, but not what most operators think. llms.txt is not a ranking signal the way robots.txt is a crawl directive. It's a curated index of your most LLM-relevant pages, and ChatGPT's and Perplexity's crawlers do read it when present. Adoption sits near 7% of public SaaS sites as of Q1 2026. The cost is roughly 30 minutes to write, the lift is meaningful for sites where your most useful pages are not your most-linked pages, and it costs nothing to leave running.
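For a sense of what the file looks like, here is a minimal generator sketch. It assumes the layout from the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of markdown links); the function name and the example.com URLs in the usage note are hypothetical.

```python
def render_llms_txt(site_name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections.

    `sections` maps a section heading to a list of (title, url, note) tuples;
    an empty note is simply omitted from the entry.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in pages:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    # Exactly one trailing newline, whatever the section count.
    return "\n".join(lines).rstrip() + "\n"
```

A call such as `render_llms_txt("Example", "What the site covers.", {"Guides": [("GEO guide", "https://example.com/geo", "start here")]})` returns text you can write to `/llms.txt` at the site root. Keeping the page list in code makes the quarterly review a one-line diff.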

Why doesn't GA4 show me ChatGPT and Perplexity traffic?

Two reasons. First, AI engines often strip the Referer header on outbound clicks, so the visit lands as 'Direct/(none)' in any tool that relies on referrer parsing. Second, GA4 has no built-in pattern match for chatgpt.com (formerly chat.openai.com), perplexity.ai, or claude.ai (and many AI clients open links in a new tab without referrer at all). The result is roughly 100% misattribution of AI-referred sessions in default GA4. Server-side first-party tracking with explicit AI-source detection recovers most of it.

What single change moves the needle most for AI citations?

Adding FAQPage schema with at least 4 question-answer pairs that exactly match the visible H2 FAQ block on the page. Ahrefs and Semrush GEO studies through 2025 consistently found AI-cited pages averaged 4 or more FAQ schema items versus 1-2 on uncited pages. The reason is mechanical: FAQ schema gives the LLM training pipeline pre-extracted question-answer pairs that match how users actually phrase queries to ChatGPT and Perplexity.
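A minimal sketch of that rule, assuming you already have the visible FAQ as a list of question-answer pairs. The schema.org types (`FAQPage`, `Question`, `Answer`) are real; the function names and the `min_items` default are illustrative choices, not a required API.

```python
import json


def build_faq_jsonld(pairs, min_items=4):
    """Build FAQPage JSON-LD from (question, answer) pairs taken from the visible FAQ."""
    if len(pairs) < min_items:
        raise ValueError(f"need at least {min_items} FAQ pairs, got {len(pairs)}")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question.strip(),
                "acceptedAnswer": {"@type": "Answer", "text": answer.strip()},
            }
            for question, answer in pairs
        ],
    }


def questions_match_visible(jsonld, visible_questions):
    """True when the schema questions equal the visible FAQ headings, in order."""
    schema_questions = [item["name"] for item in jsonld["mainEntity"]]
    return schema_questions == [q.strip() for q in visible_questions]


def to_script_body(jsonld):
    """Serialize for embedding inside a <script type="application/ld+json"> tag."""
    return json.dumps(jsonld, indent=2, ensure_ascii=False)
```

Generating the markup from the same list that renders the visible FAQ is the simplest way to keep the two from drifting apart, and `questions_match_visible` can run in CI as a guard against mismatch warnings.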

Can I measure whether GEO is working without a paid tool?

Partially. You can manually query ChatGPT, Perplexity, and Claude for your target topics weekly and log whether your domain is cited. That captures presence but not traffic. For traffic, you need either server-side analytics that detects AI source headers and known AI client IPs, or a first-party tracker that fingerprints AI-referrer patterns. GA4 alone will not tell you. Most operators learn this the hard way after publishing 6-12 GEO-optimized pages.
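The manual log can live in a plain CSV. The sketch below tallies cited queries per week and engine; the column layout (`week`, `engine`, `query`, `cited`) is an assumed format for illustration, not one prescribed anywhere in this article.

```python
import csv
import io
from collections import defaultdict


def tally_citations(csv_text):
    """Return {(week, engine): (cited_count, queries_run)} from a manual log.

    Assumed columns: week, engine, query, cited (yes/no).
    """
    cited = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["week"], row["engine"].strip().lower())
        total[key] += 1
        if row["cited"].strip().lower() in {"yes", "y", "1", "true"}:
            cited[key] += 1
    return {key: (cited[key], total[key]) for key in total}
```

Feeding it the full log each week gives a "cited of N" count per engine that you can compare run over run. It measures presence only; traffic still needs the server-side detection described above.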

Related reading from the Attrifast research stack

For the measurement layer Attrifast actually ships, see first-party UTM-to-revenue tracking and the sibling deep-dive on AI traffic revenue attribution. Attrifast does not do GEO, generate schema, or run a citation tracker; citation logging stays a manual weekly task, as covered above.
