GEO Strategy
How to Get Cited by ChatGPT, Perplexity & Claude (2026)
The 7-step playbook for getting cited by AI engines: schema, llms.txt, entity disambiguation, FAQ density. Plus the GEO measurement gap nobody talks about.
The 7-step playbook to get cited by ChatGPT, Perplexity, and Claude in 2026: (1) ship a Direct Answer paragraph under 120 words at the top of every page, (2) add Article plus FAQPage plus HowTo JSON-LD with at least 4 question-answer pairs, (3) publish llms.txt at your site root listing canonical pages, (4) disambiguate your brand entity with at least 4 matched sameAs links, (5) cite primary sources inline and link out generously, (6) keep one canonical URL per concept (no duplicate cluster pages), (7) measure citation lift with first-party AI-referrer detection because GA4 will not show it. Each step is mechanical. None of them requires a vendor.
| Spec | Value |
|---|---|
| AI engines that drive measurable referral traffic | ChatGPT (~62%), Perplexity (~18%), Claude (~11%), Gemini (~6%) |
| FAQ schema items on AI-cited pages (median) | 4+ |
| GA4 default attribution accuracy for AI traffic | ~0% (lumped as Direct/(none)) |
| llms.txt adoption (public SaaS, Q1 2026) | ~7% |
| sameAs surfaces for 3x citation lift | 4+ matched profiles |
| AI Overviews appearance rate (US Google SERPs) | 13-15% |
| Tools needed | Schema validator, llms.txt template, sameAs audit |
| Time to ship the full 7-step playbook | 4-8 hours per site |
I've spent the last six months running GEO experiments on attrifast.com and three client SaaS properties. The honest finding up front: most GEO advice is half right. Schema works. llms.txt works (a little). Entity disambiguation works more than people think. Pure content-quality plays underperform structural plays by a wide margin, which is uncomfortable if you came up writing for humans. This article walks the seven moves I keep coming back to, names the tactics I do not sell, and ends with what we measured on our own site.

There are three chat surfaces plus Google's AI Overviews, and they behave differently. ChatGPT's web-browsing mode (and the default gpt-4-class chat with browsing) cites sources inline as numbered footnotes; the user clicks through to your domain. Perplexity is the most citation-heavy product on the market: every answer ships with 3-7 visible source links, and click-through rates are higher than on the others. Claude with web search cites less aggressively and tends to summarize without linking unless explicitly prompted. Google's AI Overviews appear on roughly 13-15% of US English SERPs as of early 2026, per the Search Engine Land tracking, and they pull from a narrower set of "trusted" domains than the chat assistants do.
"Getting cited" therefore splits into two outcomes:
The first one has measurable economics. The second one is real but hard to attribute, which is one reason brand-search lift is the proxy metric most operators actually use. Per the GA4 revenue attribution limitations breakdown, even the first kind of citation gets misclassified by GA4 because AI engines frequently strip the Referer header. More on that in section 6.
The mechanical reality: an LLM is not "reading your site" at query time most of the time. It is querying its index of recently-crawled pages, scoring them, and lifting passages that look canonical. Your job is to be the canonical-shaped passage on the topic, with structured data the index can extract cheaply.
The single biggest source of bad GEO advice in 2026 is treating "AI citation" as one phenomenon. ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews each ship from different indexes, with different freshness windows, and weight different signals. The matrix below maps thirteen of the most-discussed citation factors against each engine, and labels every cell as documented (in official vendor docs or a peer-reviewed paper), inferred (consistent finding across 2+ independent third-party studies), or speculative (single source or anecdotal community evidence).
| Citation factor | ChatGPT (search) | Perplexity | Claude (web) | Gemini | Google AIO |
|---|---|---|---|---|---|
| Schema.org JSON-LD (Article) | inferred | inferred | speculative | documented (Google) | documented (Google) |
| FAQPage schema 4+ items | inferred | inferred | speculative | documented (Google) | documented (Google) |
| HowTo schema | inferred | inferred | speculative | documented (Google) | documented (Google) |
| Freshness (dateModified within 90 days) | documented (OpenAI docs) | documented (Perplexity blog) | inferred | documented (Google) | documented (Google) |
| Reddit presence on topic | inferred | inferred | inferred | documented (Google Reddit deal Feb 2024) | documented (Google Reddit deal Feb 2024) |
| Wikipedia entity / Wikidata edge | inferred | inferred | inferred | inferred | inferred |
| Top-3 organic ranking | weak signal | weak signal | weak signal | inferred | documented (Semrush, Ahrefs) |
| llms.txt at root | inferred (crawler reads it) | inferred (crawler reads it) | speculative | not read (Google confirmed) | not read (Google confirmed) |
| Organization.sameAs (4+ profiles) | inferred | inferred | inferred | documented (Google KG docs) | documented (Google KG docs) |
| Comparison content / "X vs Y" | inferred | strongly inferred | inferred | inferred | inferred |
| Original data / first-party research | strongly inferred | strongly inferred | strongly inferred | inferred | inferred |
| Clean canonical URL (no params) | documented (OpenAI crawler) | documented (Perplexity crawler) | documented (Anthropic crawler) | documented (Google) | documented (Google) |
| Multi-format (text + table + code) | inferred | inferred | inferred | inferred | inferred |
Source map for the documented rows: OpenAI publishes its OAI-SearchBot and ChatGPT-User agent behavior and respects robots.txt per the OpenAI bot documentation. Perplexity's PerplexityBot follows the same pattern per the Perplexity bot docs. Anthropic's ClaudeBot and Claude-Web user agents are described in the Anthropic crawler documentation. Google's Knowledge Graph and structured-data weighting is documented across Google's structured data guidelines. The Reddit data-licensing deal that gave Google ranked Reddit threads is from the Reuters Feb 2024 announcement.
The cells worth dwelling on are the "speculative" ones for Claude. Anthropic has been the least communicative about how Claude's web search picks sources, and the third-party research base on Claude citation is thin compared with ChatGPT and Perplexity. The honest read: optimize for the documented and strongly-inferred signals first; do not spend a budget chasing speculative Claude-specific tactics until Anthropic publishes more.
A second honest read: the "not read" cells for llms.txt on Gemini and AIO matter. Per Google's John Mueller in a Search Off The Record episode in November 2024, Google does not currently use llms.txt as an input signal. If your traffic plan depends on Google surfaces (AIO + Gemini), llms.txt is not your priority. If your plan is ChatGPT-and-Perplexity-heavy, it is worth the 30 minutes.
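Since the documented rows depend on the vendors' crawlers being allowed in at all, it can help to confirm that robots.txt does not block them. The sketch below uses Python's standard `urllib.robotparser`; the user-agent list is taken from the source map above, while the sample robots.txt and the `blocked_ai_crawlers` helper are hypothetical and only illustrate the check.

```python
from urllib.robotparser import RobotFileParser

# User agents named in the source map above; extend if vendors document more.
AI_CRAWLERS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot"]

def blocked_ai_crawlers(robots_txt: str, url: str) -> list[str]:
    """Return the AI crawlers that this robots.txt would block from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]

# Hypothetical robots.txt that blocks one AI crawler site-wide.
SAMPLE_ROBOTS = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

blocked = blocked_ai_crawlers(SAMPLE_ROBOTS, "https://yoursite.com/blog/your-slug")
print(blocked)  # ['PerplexityBot']
```

An empty result means none of the listed crawlers is blocked for that URL; it says nothing about whether the page is actually crawled or cited.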

Pattern data from Ahrefs's 2025 GEO study (n=10,000 pages) and Semrush's parallel research lines up tightly. Pages that get cited share these traits:
The flow above is the simplified path. Reality involves embedding similarity, recency weights, and per-engine trust lists none of us see. But the structural cues (direct answer, FAQ schema, author entity) are observable and within your control.
The honest hedge: I have not seen pure structural optimization beat genuine content quality on competitive topics. If your article is the 47th explainer of "what is GA4," a perfect schema bundle will not save you. Structure amplifies content; it does not substitute.

Three schema types do the heavy lifting for GEO: Article, FAQPage, and HowTo. Two more, Person and Organization, do the entity work that makes the first three trustable. Most operators ship one and skip the rest, which is exactly the gap.
The drop-in JSON-LD bundle below is what I put on every Attrifast post. Copy it, swap the values, validate against the Google Rich Results test. The schema validator catches roughly 90% of structured-data errors before they ever ship to production, which is why every CI pipeline I run pings it.
```html
<!-- Drop into <head>, one block per page. Rendered as <script type="application/ld+json">.
     JSON does not allow comments, so the reminder lives here:
     repeat the Question object until mainEntity has 4+ items in total. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "@id": "https://yoursite.com/blog/your-slug#article",
      "headline": "Your Headline (≤65 chars)",
      "description": "Your meta description (≤160 chars).",
      "datePublished": "2026-05-10",
      "dateModified": "2026-05-10",
      "author": { "@id": "https://yoursite.com/about#person" },
      "publisher": { "@id": "https://yoursite.com/#organization" },
      "mainEntityOfPage": "https://yoursite.com/blog/your-slug"
    },
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "Question 1, exact match to on-page H3?",
          "acceptedAnswer": { "@type": "Answer", "text": "40-80 word answer." }
        },
        {
          "@type": "Question",
          "name": "Question 2?",
          "acceptedAnswer": { "@type": "Answer", "text": "..." }
        }
      ]
    },
    {
      "@type": "HowTo",
      "name": "How to do X",
      "step": [
        { "@type": "HowToStep", "name": "Step 1", "text": "Concrete instruction." },
        { "@type": "HowToStep", "name": "Step 2", "text": "..." }
      ]
    },
    {
      "@type": "Person",
      "@id": "https://yoursite.com/about#person",
      "name": "Your Name",
      "url": "https://yoursite.com/about",
      "image": "https://yoursite.com/authors/you.jpeg",
      "sameAs": [
        "https://www.linkedin.com/in/yourhandle/",
        "https://github.com/yourhandle",
        "https://x.com/yourhandle"
      ],
      "jobTitle": "Founder",
      "worksFor": { "@id": "https://yoursite.com/#organization" }
    },
    {
      "@type": "Organization",
      "@id": "https://yoursite.com/#organization",
      "name": "Your Brand",
      "url": "https://yoursite.com",
      "logo": "https://yoursite.com/logo.png",
      "sameAs": [
        "https://www.linkedin.com/company/yourbrand",
        "https://x.com/yourbrand",
        "https://www.crunchbase.com/organization/yourbrand"
      ]
    }
  ]
}
</script>
```
A few enforcement notes that matter more than they look. The FAQPage name field must match the visible H3 (or H2) exactly, otherwise Google flags inconsistency and the rich result drops. The Person.sameAs array is the entity bridge, more on that in section 5. Use @id to cross-reference between graph nodes; LLM extraction pipelines follow @id links the same way RDF does, and the Person-to-Organization edge gets you a cleaner entity graph than two disconnected blobs.
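The two checks in these notes (question names matching visible headings, and `@id` references resolving to a node in the graph) are easy to automate before a post ships. What follows is a minimal sketch using only the Python standard library; `PageScan`, `iter_nodes`, `audit`, and the sample page are hypothetical names for illustration, not part of any validator, and the sketch assumes the JSON-LD on the page is valid JSON.

```python
import json
from html.parser import HTMLParser

class PageScan(HTMLParser):
    """Collect parsed JSON-LD script bodies and visible H2/H3 heading text."""
    def __init__(self):
        super().__init__()
        self.jsonld, self.headings = [], []
        self._mode, self._buf = None, []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._mode, self._buf = "jsonld", []
        elif tag in ("h2", "h3"):
            self._mode, self._buf = "heading", []

    def handle_data(self, data):
        if self._mode:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._mode == "jsonld" and tag == "script":
            self.jsonld.append(json.loads("".join(self._buf)))
            self._mode = None
        elif self._mode == "heading" and tag in ("h2", "h3"):
            self.headings.append("".join(self._buf).strip())
            self._mode = None

def iter_nodes(value):
    """Yield every dict nested anywhere inside a parsed JSON-LD value."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_nodes(child)

def audit(html: str) -> list[str]:
    scan = PageScan()
    scan.feed(html)
    nodes = [n for doc in scan.jsonld for n in iter_nodes(doc)]
    # A node with "@type" defines an entity; a node with only "@id" references one.
    defined = {n["@id"] for n in nodes if "@id" in n and "@type" in n}
    problems = []
    for n in nodes:
        if set(n) == {"@id"} and n["@id"] not in defined:
            problems.append(f"dangling @id reference: {n['@id']}")
        if n.get("@type") == "Question" and n.get("name") not in scan.headings:
            problems.append(f"no heading matches question: {n.get('name')}")
    return problems

# Hypothetical page: the H3 is missing its question mark and the Person node is absent.
SAMPLE = """
<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Article", "@id": "https://yoursite.com/blog/your-slug#article",
   "author": {"@id": "https://yoursite.com/about#person"}},
  {"@type": "FAQPage", "mainEntity": [
    {"@type": "Question", "name": "Does llms.txt work for Google AI Overviews?",
     "acceptedAnswer": {"@type": "Answer", "text": "No."}}]}
]}
</script></head>
<body><h3>Does llms.txt work for Google AI Overviews</h3></body></html>
"""

for problem in audit(SAMPLE):
    print(problem)
```

The comparison is deliberately strict (exact string equality after trimming whitespace), which mirrors the character-for-character rule rather than any engine's actual matching logic.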
What I do not do: I do not ship BreadcrumbList on every blog post (it's noise unless you have a deep hierarchy), I do not ship Review schema unless there's a real review on the page (faking it is a manual-action path), and I do not ship multiple Article blocks on the same URL. One canonical Article, one FAQPage, one HowTo. The rest is decoration.
The seven JSON-LD types operators most often debate are Article, FAQPage, HowTo, Product, Review, Organization, and BreadcrumbList. The reality is that each engine weights these differently, and several of them are essentially decorative on AI-citation surfaces. The table below scores each type by observed lift across the three citation surfaces I track on Attrifast properties, plus a "documented Google rich-result" column for cross-reference.
| Schema type | ChatGPT lift | Perplexity lift | AIO lift | Google rich-result eligible | Notes |
|---|---|---|---|---|---|
| Article | medium | medium | high | yes | Required scaffolding for everything below. Always ship. |
| FAQPage | high | high | high | deprecated in Google rich results (Aug 2023) but still parsed by AI engines | Single biggest lever for chat citations. 4+ items, exact H3 match. |
| HowTo | medium | medium | medium | deprecated (mobile only, Sep 2023) but still parsed | Skip if content is informational, ship if procedural. |
| Product | low | low | n/a (commercial intent) | yes (e-commerce only) | Useful for product pages, near-zero impact on blog citation. |
| Review | low | low | n/a | yes (when genuine) | Faking it is a Google manual-action risk per Google's review snippet policy. |
| Organization | medium | medium | high | yes (sameAs powers the Knowledge Panel) | Entity disambiguation backbone. Ship once at site level. |
| BreadcrumbList | low | low | low | yes (URL display) | Decorative for citation; useful for human UX. Skip on flat blogs. |
The two surprising rows: FAQPage and HowTo are both deprecated as Google rich-results triggers (per the Google August 2023 FAQ rich result reduction and the September 2023 HowTo phase-out), but they remain the most useful schema types for AI engine citation. The deprecation is a SERP-visual change, not a structured-data parsing change. ChatGPT's and Perplexity's crawlers still extract Q-A pairs from FAQPage blobs, which is exactly the alignment with conversational query patterns that drives citation.
A worked code example for the FAQPage block that has moved citations on Attrifast posts. The name on each question must match a visible H3 character-for-character, including the question mark.
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do I get my SaaS cited by ChatGPT in 2026?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Ship Article + FAQPage + HowTo JSON-LD with at least 4 question-answer pairs, publish an llms.txt at your site root, and ensure your Organization.sameAs links 4 or more matched social profiles (LinkedIn, GitHub, X, Crunchbase). Pages with all three structural signals are roughly 3x more likely to be cited than equivalent pages without them, per Ahrefs and Semrush 2025 GEO studies."
      }
    },
    {
      "@type": "Question",
      "name": "Does llms.txt work for Google AI Overviews?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "No. Google has publicly stated it does not currently use llms.txt as an input signal. The file is most useful for ChatGPT and Perplexity. If your traffic plan depends on Google surfaces, prioritize FAQPage schema and entity sameAs links instead."
      }
    }
  ]
}
</script>
```
And the matching HowTo block for procedural posts. The step array should have 3-8 entries; below 3 it parses as a list, above 8 the citation engines tend to truncate.
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to get cited by ChatGPT and Perplexity",
  "totalTime": "PT8H",
  "step": [
    { "@type": "HowToStep", "position": 1, "name": "Ship Direct Answer block", "text": "Write a 120-word self-contained answer at the top of each page." },
    { "@type": "HowToStep", "position": 2, "name": "Add FAQPage JSON-LD", "text": "Match each Question.name to a visible H3 exactly. Ship 4 or more items." },
    { "@type": "HowToStep", "position": 3, "name": "Publish llms.txt", "text": "Curate 15-25 most-LLM-relevant URLs at https://yoursite.com/llms.txt." },
    { "@type": "HowToStep", "position": 4, "name": "Audit sameAs", "text": "Link 4+ matched profiles in Organization.sameAs (LinkedIn, X, GitHub, Crunchbase)." }
  ]
}
</script>
```
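One way to keep the JSON-LD and the visible headings from drifting apart is to generate both from the same list of question-answer pairs. The sketch below is one possible shape for that, assuming a Python build step; `faq_jsonld`, `faq_html`, `howto_jsonld`, and the sample data are hypothetical names, and the 4+ item and 3-8 step limits are simply the rules of thumb stated in this article, enforced as errors.

```python
import json
from html import escape

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as a FAQPage JSON-LD string."""
    if len(pairs) < 4:
        raise ValueError("ship at least 4 question-answer pairs")
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": question,
             "acceptedAnswer": {"@type": "Answer", "text": answer}}
            for question, answer in pairs
        ],
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)

def faq_html(pairs: list[tuple[str, str]]) -> str:
    """Render the same pairs as visible H3 + paragraph markup."""
    return "\n".join(
        f"<h3>{escape(question)}</h3>\n<p>{escape(answer)}</p>"
        for question, answer in pairs
    )

def howto_jsonld(name: str, steps: list[tuple[str, str]]) -> str:
    """Serialize (step name, step text) pairs as a HowTo JSON-LD string."""
    if not 3 <= len(steps) <= 8:
        raise ValueError("keep the step array between 3 and 8 entries")
    doc = {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "name": step_name, "text": text}
            for i, (step_name, text) in enumerate(steps, start=1)
        ],
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)

# Placeholder content; the last two questions are invented for illustration.
PAIRS = [
    ("How do I get my SaaS cited by ChatGPT in 2026?", "Short answer 1."),
    ("Does llms.txt work for Google AI Overviews?", "Short answer 2."),
    ("Placeholder question 3?", "Short answer 3."),
    ("Placeholder question 4?", "Short answer 4."),
]

print(faq_jsonld(PAIRS))
print(faq_html(PAIRS))
```

Because the H3 text and `Question.name` come from the same string, the exact-match rule holds by construction. The output still needs to be wrapped in a `<script type="application/ld+json">` tag by whatever templating the site uses.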
What I have stopped shipping on Attrifast posts after measuring zero lift: BreadcrumbList on flat blogs (we have a two-level hierarchy, the breadcrumb is decorative), Review schema on non-review pages (manual-action risk for zero upside), and Product schema on feature pages that are not actually products (we shipped it for two weeks on /features/* URLs, saw no citation lift, removed it). The general rule: a JSON-LD block that does not have a corresponding visible-on-page element is signal pollution.
llms.txt is to LLMs roughly what sitemap.xml is to search crawlers, with one important difference: it's curated. You put it at https://yoursite.com/llms.txt, you list your most LLM-relevant pages with one-line descriptions, and well-behaved AI crawlers read it. The specification, hosted at llmstxt.org, is intentionally tiny.
A working llms.txt for a SaaS looks like this:
```markdown
# Attrifast
> Attrifast is a Stripe-native revenue attribution tool for bootstrapped SaaS founders. The 4kb script captures first-party UTMs and joins them to Stripe webhook events server-side.
## Core pages
- [Homepage](https://attrifast.com/): Product overview and pricing.
- [Revenue attribution by channel](https://attrifast.com/features/revenue-attribution): How channel attribution works without third-party cookies.
- [Cookieless revenue analytics](https://attrifast.com/features/cookieless-revenue-analytics): The privacy architecture.
- [Stripe-native attribution](https://attrifast.com/for/stripe): Why Stripe webhooks beat browser pixels.
## Methodology
- [Return-delay penalty methodology](https://attrifast.com/methodology/return-delay-penalty): Sample size, SQL, retention table.
## Recent posts
- [AI traffic attribution](https://attrifast.com/blog/ai-traffic-revenue-attribution): Detecting and attributing AI-engine referrals.
- [Cross-site tracking explained](https://attrifast.com/blog/cross-site-tracking-explained): Why third-party cookies died and what replaced them.
```
Adoption is low (~7% of public SaaS sites I sampled in Q1 2026), which is exactly why it works: an AI crawler that reads llms.txt has few competing files to weigh yours against.
The honest limitations: not every AI engine reads it (Google's Gemini does not, as of this writing), the spec is informal, and there's no public verification you've been "indexed." It is genuinely a low-cost speculative bet: 30 minutes of work for an unknown but plausibly nonzero lift. I run it on every property because the downside is zero.
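Because the spec is informal, a quick structural check on a hand-written file catches the most common slips (missing title, missing summary, malformed link lines). The sketch below assumes the layout used in the example above, meaning an H1, a `>` summary, `##` sections, and `- [title](url): description` lines. `lint_llms_txt` is a hypothetical helper, not part of the llms.txt specification.

```python
import re

# One link line: "- [Title](https://example.com/path): optional description"
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>https?://[^)\s]+)\)(?::\s*(?P<desc>.+))?$"
)

def lint_llms_txt(text: str) -> dict:
    """Return the links found per section plus a list of structural problems."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 with the site name")
    if not any(ln.startswith("> ") for ln in lines):
        problems.append("missing '> ' summary line")
    links: dict[str, list[str]] = {}
    section = "(top)"
    for ln in lines:
        if ln.startswith("## "):
            section = ln[3:]
            links.setdefault(section, [])
        elif ln.startswith("- "):
            match = LINK_RE.match(ln)
            if match is None:
                problems.append(f"malformed link line: {ln}")
            else:
                links.setdefault(section, []).append(match["url"])
    return {"links": links, "problems": problems}

SAMPLE = """\
# Attrifast
> Attrifast is a Stripe-native revenue attribution tool for bootstrapped SaaS founders.
## Core pages
- [Homepage](https://attrifast.com/): Product overview and pricing.
- [Stripe-native attribution](https://attrifast.com/for/stripe): Why Stripe webhooks beat browser pixels.
## Methodology
- [Return-delay penalty methodology](https://attrifast.com/methodology/return-delay-penalty): Sample size, SQL, retention table.
"""

report = lint_llms_txt(SAMPLE)
print(report["problems"])
print({section: len(urls) for section, urls in report["links"].items()})
```

This only checks shape. It cannot tell you whether any crawler has read the file, which is the limitation described above.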
The thing I do not do (and Attrifast does not sell): I do not pay $99/mo for a "llms.txt automation" SaaS. The file is 30 lines of markdown. Hand-write it.
If you take one thing from this article: the Knowledge Graph node for your brand is what LLMs use to disambiguate "Attrifast" from "Attrify" or "FastAttrib" or any of the other near-collision names. The disambiguation surface is your sameAs array.
The Ahrefs 2025 entity-SEO study tracked 8,400 SaaS brand mentions across ChatGPT and Perplexity. Brands with 4 or more matched sameAs surfaces (LinkedIn company page, X, GitHub org, Crunchbase, Wikidata, optionally Wikipedia and Product Hunt) were roughly 3x more likely to be cited than brands with 0-1 surfaces. The mechanism is plausible: LLM training data includes wikidata-derived entity links, and the disambiguation walks back through sameAs.
The minimum viable matched set for a SaaS:
✓ LinkedIn company page (linkedin.com/company/<brand>)
✓ X / Twitter handle (x.com/<brand>)
✓ GitHub organization (github.com/<brand>), even if mostly empty
✓ Crunchbase profile (crunchbase.com/organization/<brand>)
○ Wikidata entry (highest impact, hardest to qualify for)
○ Wikipedia page (don't try until you have 5+ third-party press citations)
○ Product Hunt brand page
○ G2 / Capterra listing
Mark the matched set in your Organization.sameAs JSON-LD, your Person.sameAs for the founder, and across your social bios where the field exists. The whole point is mechanical consistency: the same brand name, the same canonical URL, the same handle pattern. Drift here is what makes the entity ambiguous.
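The consistency check described here is mechanical enough to script. A minimal sketch, assuming each profile URL ends in the brand handle as in the list above: `audit_same_as` is a hypothetical helper, and ID-based surfaces such as Wikidata would need an explicit exception because their URLs do not contain the handle.

```python
from urllib.parse import urlparse

def audit_same_as(same_as: list[str], handle: str, minimum: int = 4) -> list[str]:
    """Flag handle drift, non-https links, and too few distinct profile hosts."""
    problems = []
    hosts = set()
    for url in same_as:
        parts = urlparse(url)
        host = parts.netloc.lower().removeprefix("www.")
        slug = parts.path.rstrip("/").rsplit("/", 1)[-1]  # last path segment
        if parts.scheme != "https":
            problems.append(f"not https: {url}")
        if slug.lower() != handle.lower():
            problems.append(f"handle drift on {host}: {slug!r} != {handle!r}")
        hosts.add(host)
    if len(hosts) < minimum:
        problems.append(f"only {len(hosts)} distinct surfaces, want {minimum}+")
    return problems

# Placeholder profile URLs; the GitHub handle drifts on purpose.
SAME_AS = [
    "https://www.linkedin.com/company/yourbrand",
    "https://x.com/yourbrand",
    "https://www.crunchbase.com/organization/yourbrand",
    "https://github.com/your-brand",
]

for problem in audit_same_as(SAME_AS, "yourbrand"):
    print(problem)
```

The same function can be pointed at the founder's `Person.sameAs` array with the personal handle instead of the brand handle.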
For my Person entity, the matched set is X (x.com/0xVinceAI) plus the /about page. I still need to add GitHub and Crunchbase, which is exactly the kind of "operator-yes, but did you actually do it" gap this section is about.
The thing I do not do: I do not buy into "AI-powered entity audit" SaaS at $200+/month. The audit is "search Google for your brand name, look at the top 10 results, claim every official profile you do not yet control." Two hours of work, once.
This is the section most GEO articles skip, and it's why I'm writing it.
GA4 attributes essentially 100% of AI-engine traffic as (direct) / (none). Two reasons: (a) ChatGPT, Perplexity, and Claude often strip the Referer header on outbound clicks, depending on client and platform, and (b) GA4 has no built-in pattern match for chat.openai.com, perplexity.ai, claude.ai, or the dozens of AI client subdomains. The result lands in your "Direct" bucket alongside email clicks, app referrals, and genuinely-typed URLs.
Layer in the consent banner problem: GDPR-compliant banners on EU traffic refuse 30-60% of cookie consent, which means even the trackers that could detect AI-source signals lose another big chunk before you see the data. ITP 2.3 in Safari evaporates roughly 30%+ of any paid-search attribution that depends on gclid cookies, which compounds into a worse picture; see the cross-site tracking explainer for the mechanics. The honest summary: by the time AI-referred revenue lands in default GA4, you've lost the chain of custody.
What works:
checkout.session.completed, the webhook joins the stored source to the payment. No third-party cookies needed.
The thing Attrifast does not do: we do not "do GEO." We do not generate schema, write llms.txt, or run entity audits. What we do is the boring measurement layer underneath: when you publish a GEO-optimized post and someone clicks through from ChatGPT and pays via Stripe two weeks later, our first-party UTM-to-revenue tracking joins the channel to the revenue server-side. You'll see it as chatgpt in the channel column, not (direct). That's it. That's the value prop.
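The capture-and-join idea in this section can be sketched in a few lines. This is not Attrifast's implementation; it is a hedged illustration with several assumptions: the hostname map includes `chatgpt.com` and `gemini.google.com` in addition to the hosts named above, the visitor ID is passed to Stripe as `client_reference_id`, an in-memory dict stands in for a database, and webhook signature verification is omitted.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostname -> channel label. Verify this list against your own logs.
AI_HOSTS = {
    "chat.openai.com": "chatgpt",
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}

def classify_source(referrer: Optional[str], landing_url: str) -> str:
    """Label a session, checking utm_source first and the Referer host second."""
    utm = parse_qs(urlparse(landing_url).query).get("utm_source", [""])[0]
    for value in (utm, urlparse(referrer or "").netloc):
        host = value.lower().removeprefix("www.")
        for known, channel in AI_HOSTS.items():
            if host == known or host.endswith("." + known):
                return channel
    return "direct" if not referrer and not utm else "other"

# visitor id -> first-touch channel (stand-in for a real datastore)
FIRST_TOUCH: dict[str, str] = {}

def record_first_touch(visitor_id: str, referrer: Optional[str], landing_url: str) -> None:
    """Store the channel once, on the first request we see from this visitor."""
    FIRST_TOUCH.setdefault(visitor_id, classify_source(referrer, landing_url))

def on_stripe_event(event: dict) -> Optional[dict]:
    """Join a completed checkout to the stored first-touch channel."""
    if event.get("type") != "checkout.session.completed":
        return None
    session = event["data"]["object"]
    visitor_id = session.get("client_reference_id")
    return {
        "channel": FIRST_TOUCH.get(visitor_id, "direct"),
        "amount_total": session.get("amount_total"),
        "currency": session.get("currency"),
    }

# Referer stripped, but the landing URL carries a utm_source (hypothetical values).
record_first_touch("visitor-1", None, "https://yoursite.com/pricing?utm_source=chatgpt.com")
sample_event = {
    "type": "checkout.session.completed",
    "data": {"object": {"client_reference_id": "visitor-1",
                        "amount_total": 2900, "currency": "usd"}},
}
print(on_stripe_event(sample_event))
```

The design choice worth noting is that the source is captured on first touch and stored server-side, so the join still works when the payment happens days later in a session with no referrer at all.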
For the broader picture on AI-engine attribution mechanics, the AI traffic revenue attribution post is the sibling deep-dive.
Seven tactics, ranked by my own data and the cited research. Setup time is the one-time hour cost; ongoing cost is monthly maintenance time or vendor fees. Citation lift is qualitative because nobody publishes hard CTR figures for AI engines yet; "high" means I've seen it move citation rate by 2x or more in tests, "medium" means measurable but smaller, "low" means real but often noise-bound.
| Tactic | Setup time | Ongoing cost | Typical citation lift | Measurement signal | Best for |
|---|---|---|---|---|---|
| Direct Answer block (≤120 words) | 30 min/page | None | High | AI mention frequency | Every page, no exceptions |
| Article + FAQPage + HowTo schema | 1 hr setup, 5 min/page | None | High | Rich Results test pass | Long-form posts |
| llms.txt at site root | 30 min | None (re-touch quarterly) | Medium | None direct; correlates with crawl frequency | Whole site, once |
| Brand entity disambiguation (4+ sameAs) | 2 hrs | None | High | Knowledge Graph appearance | Whole brand, once |
| Inline primary-source citations | 15 min/page | None | Medium | Outbound link audit | Educational posts |
| One canonical URL per concept (no duplicates) | 4 hrs site audit | None | Medium | Cannibalization tools (GSC) | Sites with 30+ posts |
| Server-side AI-referrer detection | 30 min | $0-29/mo | None on citations; full lift on revenue attribution | First-party source capture | Anyone with paid traffic |
Two readings of this table. First, the three high-lift moves (Direct Answer, schema, entity disambiguation) are all free, and so is the medium-lift canonical-per-concept cleanup. The vendor market for GEO is mostly selling you tools to do work that takes hours, not dollars. Second, the bottom row, server-side detection, is the only tactic that does not directly increase citations but uniquely closes the measurement loop. Without it the other six are running blind.
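For the canonical-URL row in the table, the mechanical part of the audit is finding URL variants that resolve to the same page (tracking parameters, trailing slashes, `www`, fragments). A minimal sketch follows. `canonical_key` and `duplicate_groups` are hypothetical helpers, and the normalization rules (force https, drop the whole query string) are assumptions that will not suit sites where query parameters select content. Deciding whether two different slugs cover the same concept still takes a human.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

def canonical_key(url: str) -> str:
    """Normalize a URL: https, no www, no query string, no fragment, no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, "", ""))

def duplicate_groups(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by normalized key and keep only keys with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}

# Placeholder crawl output.
CRAWLED = [
    "https://yoursite.com/blog/your-slug",
    "https://yoursite.com/blog/your-slug/?utm_source=newsletter",
    "http://www.yoursite.com/blog/your-slug#faq",
    "https://yoursite.com/pricing",
]

for key, variants in duplicate_groups(CRAWLED).items():
    print(key, len(variants))
```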
For SaaS marketers planning content, the marketing attribution for SaaS overview slots GEO into the broader channel mix, and the which-backlinks-drive-revenue analysis covers the related question of which referring domains actually translate to paid signups (RPV varies 5-30x across channels, which makes the question non-trivial).
Most GEO playbooks treat the seven tactics above as if they all have the same time-to-effect. They do not. Perplexity's live retrieval index refreshes within hours of a new crawl; ChatGPT's search index refreshes on a slower cadence; Claude's RAG corpus is the slowest of the chat assistants; and anything that depends on training-corpus inclusion is on a multi-month-to-year clock. The table below is the empirical onset window I have observed across 32 Attrifast posts and three client SaaS sites between November 2025 and May 2026.
| Tactic | Perplexity onset | ChatGPT search onset | Claude web onset | AIO onset | Training-corpus effect |
|---|---|---|---|---|---|
| Direct Answer block added | 2-7 days | 7-21 days | 14-30 days | 14-45 days | next model cycle (3-12 months) |
| FAQPage schema added | 3-10 days | 14-30 days | 21-45 days | 14-30 days | next model cycle |
| HowTo schema added | 3-10 days | 14-30 days | 21-45 days | 14-30 days | next model cycle |
| llms.txt published | 7-14 days (PerplexityBot recrawl) | 14-30 days | not observed | n/a (Google does not read) | n/a |
| sameAs added to Organization | 14-30 days | 30-60 days | 30-60 days | 21-60 days (Google KG refresh) | next model cycle |
| Inline primary-source citations | next crawl (1-7 days) | next crawl + index (7-21 days) | next crawl + index (21-45 days) | next crawl + index (14-30 days) | next model cycle |
| Canonical consolidation (merge dupes) | 7-21 days (until old URL drops) | 21-60 days | 30-60 days | 30-90 days | next model cycle |
Two things to read from this table. First, Perplexity is the engine where you will see signal first — if your changes do not move the needle on Perplexity within 30 days, the chance they move ChatGPT or AIO is low. Treat Perplexity as your leading indicator. Second, the "training-corpus effect" column is the long bet: structural changes you ship today are what will get you cited in the next foundation model trained, which for Anthropic and OpenAI runs on roughly a 6-12 month cycle per the public release cadence of Claude 3.5 → 4.x and GPT-4 → GPT-5. Per the Stanford CRFM 2024 foundation model transparency index, training-data cutoffs for the major commercial models have lagged release date by 6-14 months. That lag is your runway: a post shipped in May 2026 with clean schema and a Direct Answer block is the kind of canonical artifact that ends up in the next pre-training pass.
A separate honest hedge: I have not directly observed an inclusion event in a foundation-model training corpus, because Anthropic and OpenAI do not publish per-URL inclusion data. The training-corpus column is inferred from (a) release-date vs. cutoff-date deltas, (b) the visible recency cap on what Claude and ChatGPT "know" about specific Attrifast posts, and (c) Common Crawl inclusion timing where the URL is publicly archived. Treat it as directional, not deterministic.
The GEO advice market is roughly 40% directionally true, 40% directionally true but mis-prioritized, and 20% wrong in a way that wastes a quarter of your content budget. The five tactics below are ones I tried, measured carefully, and removed from the playbook because the citation signal was indistinguishable from noise. Listing them here saves you the same months of experimenting.
1. Generating 30+ FAQ items per post. The Ahrefs and Semrush studies found 4+ FAQ items correlated with citation. The reading some operators run with is "more is better." It is not. Between January and March 2026 I ran two Attrifast posts with 28 and 34 FAQPage entries respectively (the visible H3 FAQ block was that long). Citation rate across ChatGPT, Perplexity, and Claude did not move versus the 4-6-item baseline; Google flagged both posts with FAQ schema warnings about excessive mainEntity length, and one of the two saw a temporary drop in classic blue-link CTR (likely from over-stuffing the visible FAQ section pushing the body content below the fold on mobile). I rolled both back to 6 items in April. Diagnostic: if you have 4+ relevant FAQ items, ship them; if you are inventing questions to hit 20+, you are pattern-matching the form without the substance.
2. Hand-writing 5,000-word "ultimate guide" posts to chase length. Cited pages in the Ahrefs 2025 study averaged 1,800-2,400 words. I read that as "longer is better" for a quarter and shipped three posts in the 4,500-6,500 word range. Per-citation lift over the 1,800-word baseline was zero in all four citation engines I track. The likely reason: AI engines extract sections, not whole documents. A 6,000-word post is six 1,000-word sections concatenated; if no individual section is canonical-shaped, the whole document is not canonical-shaped. Per the Aggarwal et al. 2024 GEO paper from Princeton, the dominant citation factor in their controlled study was passage-level "quotation richness" and "statistic density," not document length. Diagnostic: write the natural length, never pad to a target word count.
3. Adding Speakable JSON-LD for voice assistant citation. I shipped Speakable schema on a half-dozen posts in early 2026 expecting it to feed Alexa, Google Assistant, and voice-mode ChatGPT. Per Google's documentation, Speakable is technically limited to news publishers and is currently only used in a small US English Google Assistant subset. Voice-mode ChatGPT does not parse it. After three months I had no observable voice-citation signal and removed it. Diagnostic: ship Speakable only if you are a news publisher with a CMS that auto-generates it; for everyone else, the markup is decorative.
4. Publishing a manifest of "AI-friendly" URLs at /.well-known/ai-plugin.json. This was the 2023 OpenAI plugin manifest convention. It is now functionally dead: OpenAI deprecated plugins in favor of GPTs in early 2024 per the OpenAI plugin deprecation notice, and no current AI engine consumes the file. If you have one on your site, removing it has zero downside. Diagnostic: check for the file at https://yoursite.com/.well-known/ai-plugin.json; if present and not actively maintained, delete.
5. Aggressively re-writing prose to be "AI-friendly" at the sentence level. I spent two weeks in February 2026 rewriting six posts to use shorter sentences, more conversational phrasing, more "you" voice, more H3-as-question patterns inside body paragraphs. The structural changes (schema, Direct Answer, FAQ block at top) had already shipped on those posts. The prose-level rewrite added zero measurable citation lift across the four engines over a 60-day observation window, and one post saw a small drop in average time-on-page from human visitors (the prose felt choppy). The lesson: structure is what AI engines parse; prose is what humans read. Optimize each for its audience, do not collapse the two. Diagnostic: if your post has a Direct Answer block, FAQPage schema, and question-shaped H2s, the prose can be normal human prose.
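The closing diagnostic in item 5 reduces to three yes/no checks, which can be expressed as a tiny function. This is only a sketch: `structure_diagnostic` is a hypothetical helper, it expects the Direct Answer text, the H2 headings, and the JSON-LD types to be extracted beforehand, and treating "at least one H2 ends in a question mark" as "question-shaped H2s" is an assumption rather than a threshold from the studies cited.

```python
def structure_diagnostic(
    direct_answer: str,
    h2_headings: list[str],
    jsonld_types: set[str],
) -> dict[str, bool]:
    """Report which of the three structural cues a post already has."""
    word_count = len(direct_answer.split())
    return {
        "direct_answer_within_120_words": 0 < word_count <= 120,
        "faqpage_schema_present": "FAQPage" in jsonld_types,
        "question_shaped_h2s": any(h.strip().endswith("?") for h in h2_headings),
    }

# Placeholder inputs for one post.
result = structure_diagnostic(
    direct_answer="Ship a Direct Answer paragraph, add FAQPage schema, and publish llms.txt.",
    h2_headings=["What does getting cited actually mean?", "The schema bundle"],
    jsonld_types={"Article", "FAQPage", "HowTo"},
)
print(result)
print("prose can stay human" if all(result.values()) else "fix structure first")
```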
Putting the playbook on our own site over the last 90 days:
/about page anchors the Person entity. We validated 32 posts against Google's Rich Results test and fixed three FAQ-mismatch warnings.
chatgpt, perplexity, claude, and gemini referrers explicitly. Clean source attribution since week one.
The honest results, per our internal logs: AI-referred sessions grew from negligible to a measurable single-digit percent of total traffic over the period. Conversion rate from AI traffic to free trial sits roughly in line with organic search: slightly higher on educational queries, slightly lower on commercial-comparison queries. The variance is wide and the sample is still small. We measured channel-level revenue using our own attribution stack; the return-delay-penalty methodology page documents how we account for the gap between first AI-cited click and paid conversion (which can run 4-10 days for SaaS).
I will not show absolute numbers because n is too small to be useful (we are a bootstrapped SaaS, not Salesforce, and one viral mention skews the chart). What I will say: schema and Direct Answer were the two interventions where I felt the difference within 14 days. llms.txt and entity work paid off slowly. Server-side detection paid off the moment we shipped it because we stopped staring at "Direct" and shrugging.
The acknowledged failure: I spent two weeks earlier this year experimenting with "answer-engine-friendly" content rewrites at the prose level: shorter sentences, more numbered lists, more "what is X" headers. The structural changes (schema, Direct Answer, FAQ block) moved the needle. The prose-level rewrites did not show measurable signal above noise. Your time is better spent on structure than tone.
Below is the per-month rollout I shipped between February and May 2026, alongside the citation-presence count from a fixed set of 25 target queries I run manually through each engine every Sunday. "Cited" means our domain appears as a linked source in the answer. Absolute numbers are small because n is small (single SaaS, 32 posts), but the directional deltas are real and the per-engine asymmetry is the part worth studying.
| Month | Tactic shipped | Posts touched | ChatGPT cited (of 25) | Perplexity cited (of 25) | Claude cited (of 25) | AIO cited (of 25) |
|---|---|---|---|---|---|---|
| Jan 2026 (baseline) | none | 0 | 2 | 4 | 1 | 1 |
| Feb 2026 | Article + FAQPage + HowTo JSON-LD on all posts | 24 | 3 | 7 | 2 | 1 |
| Mar 2026 | Direct Answer block (≤120w) on all posts | 24 | 5 | 9 | 3 | 2 |
| Mar 2026 | llms.txt published at root, 18 URLs listed | site-level | 5 | 11 | 3 | 2 |
| Apr 2026 | Organization.sameAs expanded to 5 profiles | site-level | 7 | 12 | 4 | 3 |
| Apr 2026 | Canonical consolidation (3 duplicate clusters merged) | 9 posts removed | 8 | 13 | 4 | 4 |
| May 2026 | Inline primary-source citations density doubled | 12 top posts | 9 | 14 | 5 | 5 |
Reading the table: Perplexity's largest single-month jump came in March (+4 cited), split evenly between the Direct Answer block (+2) and llms.txt (+2); its largest single-row jump was the February schema rollout (+3). The largest single-month jump on ChatGPT came from the sameAs expansion plus canonical consolidation (April, +3 cited combined). AIO is the slowest of the four engines to respond, which matches the citation-decay table: Google's classifier seems to want both top-3 rank and full structural hygiene before it cites, so the lift from any single tactic is smaller. Claude and AIO move the least overall (+4 each over the period), which for Claude is consistent with Anthropic's more conservative web-search citation pattern.
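If you want to check the reading yourself, the row-to-row deltas can be recomputed from the table in a few lines. This is a minimal sketch: the counts are copied from the table above, while the short row labels and the function names are mine, chosen for illustration.

```python
# Citation counts (out of 25 target queries) copied from the rollout table.
# Row labels are shortened for readability; they are not official names.
ROWS = [
    ("Jan baseline",         {"ChatGPT": 2, "Perplexity": 4,  "Claude": 1, "AIO": 1}),
    ("Feb JSON-LD",          {"ChatGPT": 3, "Perplexity": 7,  "Claude": 2, "AIO": 1}),
    ("Mar Direct Answer",    {"ChatGPT": 5, "Perplexity": 9,  "Claude": 3, "AIO": 2}),
    ("Mar llms.txt",         {"ChatGPT": 5, "Perplexity": 11, "Claude": 3, "AIO": 2}),
    ("Apr sameAs",           {"ChatGPT": 7, "Perplexity": 12, "Claude": 4, "AIO": 3}),
    ("Apr canonical",        {"ChatGPT": 8, "Perplexity": 13, "Claude": 4, "AIO": 4}),
    ("May inline citations", {"ChatGPT": 9, "Perplexity": 14, "Claude": 5, "AIO": 5}),
]


def row_deltas(rows):
    """Return [(tactic, {engine: change vs. the previous row})], skipping the baseline."""
    out = []
    for (_, prev), (tactic, cur) in zip(rows, rows[1:]):
        out.append((tactic, {engine: cur[engine] - prev[engine] for engine in cur}))
    return out


def total_lift(rows):
    """Return {engine: last row minus baseline row}."""
    first, last = rows[0][1], rows[-1][1]
    return {engine: last[engine] - first[engine] for engine in last}


if __name__ == "__main__":
    for tactic, delta in row_deltas(ROWS):
        print(f"{tactic:<22} {delta}")
    print("total lift:", total_lift(ROWS))
```

Because two tactics shipped in both March and April, per-row deltas and per-month deltas are not the same thing; sum the rows for a month when you want the monthly figure.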
The honest caveat: 25 target queries is a small panel. The deltas above should be read as "directional, consistent with the cited research" rather than "statistically significant." A larger panel would be more credible; we are running it monthly and the table above is what I would defend in a conversation with another founder, not what I would defend in front of a paid SEO research audience.
A second caveat I have to flag: the canonical-consolidation row (April) is partly confounded with the sameAs change because both shipped in the same week. The two effects are not cleanly separable in this data. If you want a clean A/B, ship one structural change per month and wait 30 days before adding the next.
Three things in priority order: structured data (Article + FAQPage + HowTo JSON-LD with at least 4 FAQ items), llms.txt at your site root listing your canonical pages, and entity disambiguation (matching Person and Organization sameAs links across LinkedIn, GitHub, X, Crunchbase). Content quality matters but is necessary, not sufficient. Pages with all three structural signals are roughly 3x more likely to be cited than equivalent pages with content alone, per 2025-2026 Ahrefs and Semrush GEO research.
Yes, but not what most operators think. llms.txt is not a ranking signal the way robots.txt is a crawl directive. It's a curated index of your most LLM-relevant pages, and ChatGPT's and Perplexity's crawlers do read it when present. Adoption sits near 7% of public SaaS sites as of Q1 2026. The cost is roughly 30 minutes to write, the lift is meaningful for sites where your most useful pages are not your most-linked pages, and it costs nothing to leave running.
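For a sense of what the file looks like, here is a minimal sketch that renders an llms.txt body. It assumes the markdown layout from the public llms.txt proposal as I understand it (an H1 title, a blockquote summary, then H2 sections of links); the site name, URLs, and the `build_llms_txt` helper are placeholders, not anything Attrifast ships.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt body.

    sections maps a section heading to a list of (title, url, note) tuples.
    Layout assumed here: H1 title, blockquote summary, H2 sections of links.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        for title, url, note in pages:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Normalise to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder content; list only your canonical pages here.
    print(build_llms_txt(
        "Example SaaS",
        "One-sentence description of what the site covers.",
        {
            "Guides": [
                ("GEO playbook", "https://example.com/geo-playbook",
                 "Structural steps for AI citations"),
            ],
        },
    ))
```

Save the output as `llms.txt` at the site root and keep the list short: one canonical URL per concept.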
Two reasons. First, AI engines often strip the Referer header on outbound clicks, so the visit lands as 'Direct/(none)' in any tool that relies on referrer parsing. Second, GA4 has no built-in pattern match for chat.openai.com, perplexity.ai, or claude.ai (and many AI clients open links in a new tab without referrer at all). The result is roughly 100% misattribution of AI-referred sessions in default GA4. Server-side first-party tracking with explicit AI-source detection recovers most of it.
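Explicit AI-source detection on the server can be sketched as follows. This is an illustration under assumptions, not Attrifast's implementation: the host list combines the three domains named above with `chatgpt.com` and `gemini.google.com`, which I am assuming are also in use, and the `utm_source` fallback assumes some AI clients tag outbound links that way. Verify both against your own logs.

```python
from urllib.parse import parse_qs, urlparse

# Known AI referrer hosts mapped to a channel label.
# chatgpt.com and gemini.google.com are assumptions; confirm in your logs.
AI_HOSTS = {
    "chat.openai.com": "chatgpt",
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}


def _match_host(host):
    """Return the engine label for a host or one of its subdomains, else None."""
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]
    for known, engine in AI_HOSTS.items():
        if host == known or host.endswith("." + known):
            return engine
    return None


def classify_ai_source(referer, landing_url):
    """Classify a request as AI-referred using the Referer header first,
    then a utm_source query parameter as a fallback for stripped referrers."""
    if referer:
        engine = _match_host(urlparse(referer).hostname or "")
        if engine:
            return engine
    query = parse_qs(urlparse(landing_url).query)
    for value in query.get("utm_source", []):
        engine = _match_host(value)
        if engine:
            return engine
    return None
```

Sessions where both signals are missing still land as Direct, which is why this recovers most AI traffic rather than all of it.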
Adding FAQPage schema with at least 4 question-answer pairs that exactly match the visible H2 FAQ block on the page. Ahrefs and Semrush GEO studies through 2025 consistently found AI-cited pages averaged 4 or more FAQ schema items versus 1-2 on uncited pages. The reason is mechanical: FAQ schema gives the LLM training pipeline pre-extracted question-answer pairs that match how users actually phrase queries to ChatGPT and Perplexity.
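As a rough sketch of that step, the snippet below builds FAQPage JSON-LD from question-answer pairs and flags schema questions that have no exactly matching visible heading. The schema.org property names (`mainEntity`, `acceptedAnswer`) are standard; the helper names and the hard minimum of 4 are illustrative choices taken from the rule of thumb above.

```python
import json


def build_faq_jsonld(pairs, min_items=4):
    """Build a schema.org FAQPage object from (question, answer) pairs.

    Raises ValueError below min_items, mirroring the 4-pair rule of thumb.
    """
    if len(pairs) < min_items:
        raise ValueError(f"need at least {min_items} FAQ pairs, got {len(pairs)}")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def unmatched_questions(jsonld, visible_headings):
    """Return schema questions with no exactly matching visible heading."""
    visible = {heading.strip() for heading in visible_headings}
    return [
        item["name"]
        for item in jsonld["mainEntity"]
        if item["name"].strip() not in visible
    ]


def to_script_tag(jsonld):
    """Serialise the object into the script tag that goes in the page head."""
    return '<script type="application/ld+json">' + json.dumps(jsonld) + "</script>"
```

Run `unmatched_questions` against the H2 text scraped from the rendered page before deploying; a non-empty result is the kind of FAQ mismatch the Rich Results test warns about.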
Partially. You can manually query ChatGPT, Perplexity, and Claude for your target topics weekly and log whether your domain is cited. That captures presence but not traffic. For traffic, you need either server-side analytics that detects AI source headers and known AI client IPs, or a first-party tracker that fingerprints AI-referrer patterns. GA4 alone will not tell you. Most operators learn this the hard way after publishing 6-12 GEO-optimized pages.
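The manual presence log is easy to tally once you paste each answer's source links into a list. A minimal sketch, assuming a hypothetical log shape of `(engine, query, source_urls)` per manual check; "cited" here means the domain, or a subdomain of it, appears among the linked sources.

```python
from urllib.parse import urlparse


def is_cited(source_urls, domain):
    """True if any source URL is hosted on the domain or one of its subdomains."""
    domain = domain.lower()
    for url in source_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def tally(log, domain):
    """Count cited queries per engine for one weekly run.

    log is an iterable of (engine, query, source_urls) tuples,
    one tuple per query checked by hand.
    """
    counts = {}
    for engine, _query, sources in log:
        counts.setdefault(engine, 0)
        if is_cited(sources, domain):
            counts[engine] += 1
    return counts


if __name__ == "__main__":
    # Placeholder data for illustration only.
    week = [
        ("perplexity", "what is geo", ["https://www.example.com/geo", "https://other.org/a"]),
        ("chatgpt", "what is geo", ["https://other.org/a"]),
    ]
    print(tally(week, "example.com"))
```

This captures presence only; it says nothing about whether the citation sent a visit.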
For the measurement layer Attrifast actually ships, see first-party UTM-to-revenue tracking and the sibling deep-dive on AI traffic revenue attribution. Attrifast does not do GEO, generate schema, or run a citation tracker; citation logging stays a manual weekly task, as covered above.