GEO Strategy
How to Get Cited by ChatGPT, Perplexity & Claude (2026)
The 7-step playbook for getting cited by AI engines: schema, llms.txt, entity disambiguation, FAQ density. Plus the GEO measurement gap nobody talks about.
The 7-step playbook to get cited by ChatGPT, Perplexity, and Claude in 2026: (1) ship a Direct Answer paragraph under 120 words at the top of every page, (2) add Article plus FAQPage plus HowTo JSON-LD with at least 4 question-answer pairs, (3) publish llms.txt at your site root listing canonical pages, (4) disambiguate your brand entity with at least 4 matched sameAs links, (5) cite primary sources inline and link out generously, (6) keep one canonical URL per concept (no duplicate cluster pages), (7) measure citation lift with first-party AI-referrer detection because GA4 will not show it. Each step is mechanical. None of them requires a vendor.
| Spec | Value |
|---|---|
| AI engines that drive measurable referral traffic | ChatGPT (~62%), Perplexity (~18%), Claude (~11%), Gemini (~6%) |
| FAQ schema items on AI-cited pages (median) | 4+ |
| GA4 default attribution accuracy for AI traffic | ~0% (lumped as Direct/(none)) |
| llms.txt adoption (public SaaS, Q1 2026) | ~7% |
| sameAs surfaces for 3x citation lift | 4+ matched profiles |
| AI Overviews appearance rate (US Google SERPs) | 13-15% |
| Tools needed | Schema validator, llms.txt template, sameAs audit |
| Time to ship the full 7-step playbook | 4-8 hours per site |
I've spent the last six months running GEO experiments on attrifast.com and three client SaaS properties. The honest finding up front: most GEO advice is half right. Schema works. llms.txt works (a little). Entity disambiguation works more than people think. Pure content-quality plays underperform structural plays by a wide margin, which is uncomfortable if you came up writing for humans. This article walks through the seven moves I keep coming back to, names the tactics I do not sell, and ends with what we measured on our own site.

There are four distinct surfaces, and they behave differently. ChatGPT's web-browsing mode (and the default gpt-4-class chat with browsing) cites sources inline as numbered footnotes; the user clicks through to your domain. Perplexity is the most citation-heavy product on the market: every answer ships with 3-7 visible source links, and click-through rates run higher than the other engines'. Claude with web search cites less aggressively and tends to summarize without linking unless explicitly prompted. Google's AI Overviews appear on roughly 13-15% of US English SERPs as of early 2026, per the Search Engine Land tracking, and they pull from a narrower set of "trusted" domains than the chat assistants do.
"Getting cited" therefore splits into two outcomes:
The first one has measurable economics. The second one is real but hard to attribute, which is one reason brand-search lift is the proxy metric most operators actually use. Per the GA4 revenue attribution limitations breakdown, even the first kind of citation gets misclassified by GA4 because AI engines frequently strip the Referer header. More on that in section 6.
The mechanical reality: an LLM is not "reading your site" at query time most of the time. It is querying its index of recently-crawled pages, scoring them, and lifting passages that look canonical. Your job is to be the canonical-shaped passage on the topic, with structured data the index can extract cheaply.

Pattern data from Ahrefs's 2025 GEO study (n=10,000 pages) and Semrush's parallel research lines up tightly. Pages that get cited share three traits: a direct answer near the top, 4+ question-answer pairs marked up as FAQPage, and a disambiguated author entity.
The flow above is the simplified path. Reality involves embedding similarity, recency weights, and per-engine trust lists none of us see. But the structural cues, direct answer, FAQ schema, author entity, are observable and within your control.
The honest hedge: I have not seen pure structural optimization beat genuine content quality on competitive topics. If your article is the 47th explainer of "what is GA4," a perfect schema bundle will not save you. Structure amplifies content; it does not substitute.

Three schema types do the heavy lifting for GEO: Article, FAQPage, and HowTo. Two more, Person and Organization, do the entity work that makes the first three trustable. Most operators ship one and skip the rest, which is exactly the gap.
The drop-in JSON-LD bundle below is what I put on every Attrifast post. Copy it, swap the values, validate against the Google Rich Results test. The schema validator catches roughly 90% of structured-data errors before they ever ship to production, which is why every CI pipeline I run pings it.
```html
<!-- Drop into <head>, one block per page, as <script type="application/ld+json">.
     Repeat the Question blocks until the FAQPage has 4+ items. -->
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@graph": [
{
"@type": "Article",
"@id": "https://yoursite.com/blog/your-slug#article",
"headline": "Your Headline (≤65 chars)",
"description": "Your meta description (≤160 chars).",
"datePublished": "2026-05-10",
"dateModified": "2026-05-10",
"author": { "@id": "https://yoursite.com/about#person" },
"publisher": { "@id": "https://yoursite.com/#organization" },
"mainEntityOfPage": "https://yoursite.com/blog/your-slug"
},
{
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Question 1, exact match to on-page H3?",
"acceptedAnswer": { "@type": "Answer", "text": "40-80 word answer." }
},
{
"@type": "Question",
"name": "Question 2?",
"acceptedAnswer": { "@type": "Answer", "text": "..." }
}
/* repeat for 4+ items total */
]
},
{
"@type": "HowTo",
"name": "How to do X",
"step": [
{ "@type": "HowToStep", "name": "Step 1", "text": "Concrete instruction." },
{ "@type": "HowToStep", "name": "Step 2", "text": "..." }
]
},
{
"@type": "Person",
"@id": "https://yoursite.com/about#person",
"name": "Your Name",
"url": "https://yoursite.com/about",
"image": "https://yoursite.com/authors/you.jpeg",
"sameAs": [
"https://www.linkedin.com/in/yourhandle/",
"https://github.com/yourhandle",
"https://x.com/yourhandle"
],
"jobTitle": "Founder",
"worksFor": { "@id": "https://yoursite.com/#organization" }
},
{
"@type": "Organization",
"@id": "https://yoursite.com/#organization",
"name": "Your Brand",
"url": "https://yoursite.com",
"logo": "https://yoursite.com/logo.png",
"sameAs": [
"https://www.linkedin.com/company/yourbrand",
"https://x.com/yourbrand",
"https://www.crunchbase.com/organization/yourbrand"
]
}
]
}
</script>
```
A few enforcement notes that matter more than they look. The FAQPage name field must match the visible H3 (or H2) exactly, otherwise Google flags inconsistency and the rich result drops. The Person.sameAs array is the entity bridge, more on that in section 5. Use @id to cross-reference between graph nodes; LLM extraction pipelines follow @id links the same way RDF does, and the Person-to-Organization edge gets you a cleaner entity graph than two disconnected blobs.
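That exact-match rule is cheap to enforce mechanically. Below is a minimal sketch of the kind of CI check described above, assuming Node 18+ (global fetch) and cheerio; the filename and exit convention are mine, not a standard tool.

```ts
// faq-schema-check.ts, a hypothetical CI helper (npm i cheerio).
import * as cheerio from "cheerio";

async function checkFaqHeadings(url: string): Promise<void> {
  const html = await (await fetch(url)).text();
  const $ = cheerio.load(html);

  // Visible H2/H3 text, trimmed, as the set every FAQ name must match.
  const headings = new Set($("h2, h3").map((_, el) => $(el).text().trim()).get());

  // Walk every JSON-LD block, find FAQPage nodes, compare each Question.name.
  $('script[type="application/ld+json"]').each((_, el) => {
    let data: any;
    try { data = JSON.parse($(el).html() ?? ""); } catch { return; }
    for (const node of data["@graph"] ?? [data]) {
      if (node?.["@type"] !== "FAQPage") continue;
      for (const q of node.mainEntity ?? []) {
        if (!headings.has(String(q.name).trim())) {
          console.warn(`FAQ name has no matching H2/H3: "${q.name}"`);
          process.exitCode = 1; // fail the pipeline on mismatch
        }
      }
    }
  });
}

checkFaqHeadings(process.argv[2] ?? "https://yoursite.com/blog/your-slug")
  .catch((e) => { console.error(e); process.exit(1); });
```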
What I do not do: I do not ship BreadcrumbList on every blog post (it's noise unless you have a deep hierarchy), I do not ship Review schema unless there's a real review on the page (faking it is a manual-action path), and I do not ship multiple Article blocks on the same URL. One canonical Article, one FAQPage, one HowTo. The rest is decoration.
llms.txt is to LLMs roughly what sitemap.xml is to search crawlers, with one important difference: it's curated. You put it at https://yoursite.com/llms.txt, you list your most LLM-relevant pages with one-line descriptions, and well-behaved AI crawlers read it. The specification, hosted at llmstxt.org, is intentionally tiny.
A working llms.txt for a SaaS looks like this:
```markdown
# Attrifast
> Attrifast is a Stripe-native revenue attribution tool for bootstrapped SaaS founders. The 4kb script captures first-party UTMs and joins them to Stripe webhook events server-side.
## Core pages
- [Homepage](https://attrifast.com/): Product overview and pricing.
- [Revenue attribution by channel](https://attrifast.com/features/revenue-attribution): How channel attribution works without third-party cookies.
- [Cookieless revenue analytics](https://attrifast.com/features/cookieless-revenue-analytics): The privacy architecture.
- [Stripe-native attribution](https://attrifast.com/for/stripe): Why Stripe webhooks beat browser pixels.
## Methodology
- [Return-delay penalty methodology](https://attrifast.com/methodology/return-delay-penalty): Sample size, SQL, retention table.
## Recent posts
- [AI traffic attribution](https://attrifast.com/blog/ai-traffic-revenue-attribution): Detecting and attributing AI-engine referrals.
- [Cross-site tracking explained](https://attrifast.com/blog/cross-site-tracking-explained): Why third-party cookies died and what replaced them.
```
Adoption is low (~7% of public SaaS sites I sampled in Q1 2026), which is exactly why it works: the marginal AI crawler that reads llms.txt finds few competing files to weigh against yours.
The honest limitations: not every AI engine reads it (Google's Gemini does not, as of this writing), the spec is informal, and there's no public verification you've been "indexed." It is genuinely a low-cost speculative bet, 30 minutes of work for an unknown but plausibly nonzero lift. I run it on every property because the downside is zero.
The thing I do not do (and Attrifast does not sell): I do not pay $99/mo for a "llms.txt automation" SaaS. The file is 30 lines of markdown. Hand-write it.
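One maintenance note: the links in llms.txt rot like any others, so the quarterly re-touch can start with a dead-link sweep. A minimal sketch, assuming Node 18+ for the global fetch; the filename is a placeholder, and some hosts reject HEAD requests, so treat a failure as "verify by hand" rather than "broken."

```ts
// llmstxt-links.ts, a hypothetical quarterly maintenance script.
import { readFile } from "node:fs/promises";

// Matches markdown links: [label](https://...)
const LINK = /\[([^\]]+)\]\((https?:\/\/[^)\s]+)\)/g;

async function sweep(path = "llms.txt"): Promise<void> {
  const text = await readFile(path, "utf8");
  for (const [, label, url] of text.matchAll(LINK)) {
    try {
      const res = await fetch(url, { method: "HEAD", redirect: "follow" });
      if (!res.ok) console.warn(`${res.status} ${url} (${label})`);
    } catch {
      console.warn(`unreachable ${url} (${label})`);
    }
  }
}

sweep(process.argv[2]).catch(console.error);
```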
If you take one thing from this article: the Knowledge Graph node for your brand is what LLMs use to disambiguate "Attrifast" from "Attrify" or "FastAttrib" or any of the other near-collision names. The disambiguation surface is your sameAs array.
The Ahrefs 2025 entity-SEO study tracked 8,400 SaaS brand mentions across ChatGPT and Perplexity. Brands with 4 or more matched sameAs surfaces (LinkedIn company page, X, GitHub org, Crunchbase, Wikidata, optionally Wikipedia and Product Hunt) were roughly 3x more likely to be cited than brands with 0-1 surfaces. The mechanism is plausible: LLM training data includes wikidata-derived entity links, and the disambiguation walks back through sameAs.
The minimum viable matched set for a SaaS:
✓ LinkedIn company page (linkedin.com/company/<brand>)
✓ X / Twitter handle (x.com/<brand>)
✓ GitHub organization (github.com/<brand>), even if mostly empty
✓ Crunchbase profile (crunchbase.com/organization/<brand>)
○ Wikidata entry (highest impact, hardest to qualify for)
○ Wikipedia page (don't try until you have 5+ third-party press citations)
○ Product Hunt brand page
○ G2 / Capterra listing
Mark the matched set in your Organization.sameAs JSON-LD, your Person.sameAs for the founder, and across your social bios where the field exists. The whole point is mechanical consistency, the same brand name, the same canonical URL, the same handle pattern. Drift here is what makes the entity ambiguous.
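The consistency check is scriptable too. A rough sketch using the template handles from the JSON-LD bundle above (placeholders, not real profiles); note that LinkedIn and Crunchbase often rate-limit automated requests, so a non-200 from them means "verify manually," not "broken."

```ts
// sameas-audit.ts, a hypothetical one-off audit script (Node 18+).
const SAME_AS = [
  "https://www.linkedin.com/company/yourbrand",
  "https://x.com/yourbrand",
  "https://github.com/yourbrand",
  "https://www.crunchbase.com/organization/yourbrand",
];

async function audit(urls: string[], handle = "yourbrand"): Promise<void> {
  for (const url of urls) {
    // Handle drift is the cheap check: every surface should embed the same handle.
    if (!url.toLowerCase().includes(handle)) {
      console.warn(`handle drift: ${url}`);
    }
    try {
      const res = await fetch(url, { method: "HEAD", redirect: "follow" });
      if (!res.ok) console.warn(`${res.status} ${url}`);
    } catch {
      console.warn(`unreachable: ${url}`);
    }
  }
}

audit(SAME_AS);
```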
For my Person entity, the matched set is just LinkedIn (linkedin.com/in/vinceruan). I should add GitHub and X. I haven't yet, which is exactly the kind of "operator-yes, but did you actually do it" gap this section is about.
The thing I do not do: I do not buy into "AI-powered entity audit" SaaS at $200+/month. The audit is "search Google for your brand name, look at the top 10 results, claim every official profile that isn't yours." Two hours of work, once.
This is the section most GEO articles skip, and it's why I'm writing it.
GA4 attributes essentially 100% of AI-engine traffic as (direct) / (none). Two reasons: (a) ChatGPT, Perplexity, and Claude often strip the Referer header on outbound clicks, depending on client and platform, and (b) GA4 has no built-in pattern match for chat.openai.com, perplexity.ai, claude.ai, or the dozens of AI client subdomains. The result lands in your "Direct" bucket alongside email clicks, app referrals, and genuinely-typed URLs.
Layer in the consent banner problem: GDPR-compliant banners on EU traffic refuse 30-60% of cookie consent, which means even the trackers that could detect AI-source signals lose another big chunk before you see the data. ITP 2.3 in Safari evaporates roughly 30%+ of any paid-search attribution that depends on gclid cookies, which compounds into a worse picture, see the cross-site tracking explainer for the mechanics. The honest summary: by the time AI-referred revenue lands in default GA4, you've lost the chain of custody.
What works:
- Detect AI referrers server-side and tag chatgpt, perplexity, claude, and gemini explicitly instead of letting them fall into Direct.
- Capture the source first-party on the landing click and store it against the visitor.
- When Stripe fires checkout.session.completed, the webhook joins the stored source to the payment. No third-party cookies needed.

The thing Attrifast does not do: we do not "do GEO." We do not generate schema, write llms.txt, or run entity audits. What we do is the boring measurement layer underneath: when you publish a GEO-optimized post and someone clicks through from ChatGPT and pays via Stripe two weeks later, our first-party UTM-to-revenue tracking joins the channel to the revenue server-side. You'll see it as chatgpt in the channel column, not (direct). That's it. That's the value prop.
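For concreteness, here is the detection-and-capture half of that list as server middleware. A minimal sketch assuming Express on Node 18+; the hostname patterns follow the engines named above, while the cookie name and the Stripe-metadata join convention are illustrative, not Attrifast's actual implementation.

```ts
// ai-referrer.ts, an illustrative Express middleware (not Attrifast's real script).
import express from "express";

const AI_SOURCES: Record<string, RegExp> = {
  chatgpt:    /(^|\.)chatgpt\.com$|(^|\.)chat\.openai\.com$/,
  perplexity: /(^|\.)perplexity\.ai$/,
  claude:     /(^|\.)claude\.ai$/,
  gemini:     /(^|\.)gemini\.google\.com$/,
};

// Referer is the primary signal; utm_source is the fallback for clients that strip it.
function detectAiSource(referer?: string, utmSource?: string): string | null {
  if (utmSource && utmSource in AI_SOURCES) return utmSource;
  if (!referer) return null;
  try {
    const host = new URL(referer).hostname;
    for (const [name, re] of Object.entries(AI_SOURCES)) {
      if (re.test(host)) return name;
    }
  } catch { /* malformed Referer, ignore */ }
  return null;
}

const app = express();
app.use((req, res, next) => {
  const source = detectAiSource(req.get("referer"), String(req.query.utm_source ?? ""));
  // First-touch only: never overwrite a source that is already stored first-party.
  if (source && !(req.headers.cookie ?? "").includes("first_touch=")) {
    // Later, pass this value into Stripe Checkout metadata so the
    // checkout.session.completed webhook can join source to payment.
    res.cookie("first_touch", source, {
      maxAge: 90 * 24 * 60 * 60 * 1000, sameSite: "lax",
    });
  }
  next();
});
app.listen(3000);
```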
For the broader picture on AI-engine attribution mechanics, the AI traffic revenue attribution post is the sibling deep-dive.
Seven tactics, ranked by my own data and the cited research. Setup time is the one-time hour cost; ongoing cost is monthly maintenance time or vendor fees. Citation lift is qualitative because nobody publishes hard CTR figures for AI engines yet; "high" means I've seen it move citation rate by 2x or more in tests, "medium" means measurable but smaller, "low" means real but often noise-bound.
| Tactic | Setup time | Ongoing cost | Typical citation lift | Measurement signal | Best for |
|---|---|---|---|---|---|
| Direct Answer block (≤120 words) | 30 min/page | None | High | AI mention frequency | Every page, no exceptions |
| Article + FAQPage + HowTo schema | 1 hr setup, 5 min/page | None | High | Rich Results test pass | Long-form posts |
| llms.txt at site root | 30 min | None (re-touch quarterly) | Medium | None direct; correlates with crawl frequency | Whole site, once |
| Brand entity disambiguation (4+ sameAs) | 2 hrs | None | High | Knowledge Graph appearance | Whole brand, once |
| Inline primary-source citations | 15 min/page | None | Medium | Outbound link audit | Educational posts |
| One canonical URL per concept (no duplicates) | 4 hrs site audit | None | Medium | Cannibalization tools (GSC) | Sites with 30+ posts |
| Server-side AI-referrer detection | 30 min | $0-29/mo | None on citations; full lift on revenue attribution | First-party source capture | Anyone with paid traffic |
Two readings of this table. First, the three high-lift moves, Direct Answer, schema, and entity disambiguation, are all free, and the medium-lift canonical-per-concept audit costs only time. The vendor market for GEO is mostly selling you tools to do work that takes hours, not dollars. Second, the bottom row, server-side detection, is the only tactic that does not directly increase citations but uniquely closes the measurement loop. Without it the other six are running blind.
For SaaS marketers planning content, the marketing attribution for SaaS overview slots GEO into the broader channel mix, and the which-backlinks-drive-revenue analysis covers the related question of which referring domains actually translate to paid signups (RPV varies 5-30x across channels, which makes the question non-trivial).
Putting the playbook on our own site over the last 90 days:
- Article + FAQPage + HowTo JSON-LD on every post; the /about page anchors the Person entity. We validated 32 posts against Google's Rich Results test and fixed three FAQ-mismatch warnings.
- llms.txt at the site root, listing the pages shown in the example earlier.
- Server-side detection that tags chatgpt, perplexity, claude, and gemini referrers explicitly. Clean source attribution since week one.

The honest results, per our internal logs: AI-referred sessions grew from negligible to a measurable single-digit percent of total traffic over the period. Conversion rate from AI traffic to free trial sits roughly in line with organic search, slightly higher on educational queries, slightly lower on commercial-comparison queries; the variance is wide and the sample is still small. We measured channel-level revenue using our own attribution stack; the return-delay-penalty methodology page documents how we account for the gap between first AI-cited click and paid conversion (which can run 4-10 days for SaaS).
I will not show absolute numbers because n is too small to be useful (we are a bootstrapped SaaS, not Salesforce, and one viral mention skews the chart). What I will say: schema and Direct Answer were the two interventions where I felt the difference within 14 days. llms.txt and entity work paid off slowly. Server-side detection paid off the moment we shipped it because we stopped staring at "Direct" and shrugging.
The acknowledged failure: I spent two weeks earlier this year experimenting with "answer-engine-friendly" content rewrites at the prose level, shorter sentences, more numbered lists, more "what is X" headers. The structural changes (schema, Direct Answer, FAQ block) moved the needle. The prose-level rewrites did not show measurable signal above noise. Your time is better spent on structure than tone.
What matters most for getting cited by AI engines in 2026?
Three things in priority order: structured data (Article + FAQPage + HowTo JSON-LD with at least 4 FAQ items), llms.txt at your site root listing your canonical pages, and entity disambiguation (matching Person and Organization sameAs links across LinkedIn, GitHub, X, Crunchbase). Content quality matters but is necessary, not sufficient. Pages with all three structural signals are roughly 3x more likely to be cited than equivalent pages with content alone, per 2025-2026 Ahrefs and Semrush GEO research.
Does llms.txt actually matter?
Yes, but not in the way most operators think. llms.txt is not a ranking signal the way robots.txt is a crawl directive. It's a curated index of your most LLM-relevant pages, and ChatGPT's and Perplexity's crawlers do read it when present. Adoption sits near 7% of public SaaS sites as of Q1 2026. The cost is roughly 30 minutes to write, the lift is meaningful for sites where your most useful pages are not your most-linked pages, and it costs nothing to leave running.
Why doesn't GA4 show my AI traffic?
Two reasons. First, AI engines often strip the Referer header on outbound clicks, so the visit lands as 'Direct/(none)' in any tool that relies on referrer parsing. Second, GA4 has no built-in pattern match for chat.openai.com, perplexity.ai, or claude.ai (and many AI clients open links in a new tab without a referrer at all). The result is roughly 100% misattribution of AI-referred sessions in default GA4. Server-side first-party tracking with explicit AI-source detection recovers most of it.
What single schema change most improves citation odds?
Adding FAQPage schema with at least 4 question-answer pairs that exactly match the visible H2/H3 FAQ block on the page. Ahrefs and Semrush GEO studies through 2025 consistently found AI-cited pages averaged 4 or more FAQ schema items versus 1-2 on uncited pages. The reason is mechanical: FAQ schema gives the LLM training pipeline pre-extracted question-answer pairs that match how users actually phrase queries to ChatGPT and Perplexity.
Can you measure GEO performance yourself?
Partially. You can manually query ChatGPT, Perplexity, and Claude for your target topics weekly and log whether your domain is cited. That captures presence but not traffic. For traffic, you need either server-side analytics that detects AI source headers and known AI client IPs, or a first-party tracker that fingerprints AI-referrer patterns. GA4 alone will not tell you. Most operators learn this the hard way after publishing 6-12 GEO-optimized pages.
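If you do run the weekly presence check, the log can be as dumb as an append-only CSV. A sketch, assuming you record each manual result by hand; the filename and columns are mine:

```ts
// citation-log.ts, a hypothetical append-only log for manual weekly checks.
import { appendFile } from "node:fs/promises";

async function logCheck(engine: string, query: string, cited: boolean): Promise<void> {
  // JSON.stringify quotes the query so embedded commas don't break the CSV.
  const row = [new Date().toISOString(), engine, JSON.stringify(query), cited].join(",");
  await appendFile("citations.csv", row + "\n", "utf8");
}

// Usage after a manual check: did the engine cite your domain for this query?
logCheck("perplexity", "best stripe-native attribution tool", true);
```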