How to Get Recommended by ChatGPT: A 10-Step Playbook for 2026

32 min read · Updated May 2026

Vincent Ruan, Founder, Attrifast · May 26, 2026

A 10-step operator's checklist for getting ChatGPT to recommend your product. Each step has the specific tools, the time investment, the expected impact, and the measurement that proves it worked.

Part of the generative engine optimization guide and AEO Hub.

TL;DR

Getting recommended by ChatGPT is a 10-step procedure, not a vibe. Step 1: audit your current visibility. Step 2: identify 20-30 buyer prompts that matter. Step 3: map those prompts to your existing pages and find the gaps. Step 4: restructure each target page for AI extractability (the highest-impact step). Step 5: earn citations on third-party sources (Reddit, Wikipedia, G2, listicles). Step 6: publish llms.txt and llms-full.txt. Step 7: allow the AI crawlers in robots.txt. Step 8: build entity disambiguation across Wikidata, Crunchbase, LinkedIn. Step 9: monitor weekly with prompt tracking. Step 10: measure traffic and revenue with server-side attribution.
Realistic timeline: first Perplexity citations land in week 3-5, first ChatGPT search citations in week 5-8, and unprompted browse-off mentions take one to three quarters because they require the next model retrain. Anybody promising overnight results is selling you something.
Step 4 alone accounts for roughly 35-45 percent of the visibility lift. Step 5 is the slow compounder. Steps 6, 7, 8 are cheap insurance. Step 9 keeps you honest. Step 10 is where vanity citations turn into a measurable revenue line.
The whole playbook breaks if you cannot prove it paid. GA4 buckets ChatGPT traffic as Direct, so you need server-side first-party attribution that detects AI sources and joins to Stripe.

[Figure: The 10-step playbook for getting recommended by ChatGPT: audit, prompts, mapping, page restructure, citations, llms.txt, robots, entity, monitoring, revenue]

A founder pinged me in February. He had read every "how to rank in ChatGPT" article and still could not get the model to name his product. He wanted a checklist. Not a framework, not a theory of generative engines, a checklist with steps he could tick off on a Friday.

So I wrote him one. Then I ran it on three of my own properties. Then I ran it with two other operators. The version below is what survived. Ten steps, each with a what, a how, a measurement, a time investment, and an honest expected impact.

This is the operator's companion to "how to rank in ChatGPT," "why ChatGPT might not be recommending your product," and "how AI engines actually choose sources." Those answer "what" and "why." This one answers "what do I do on Monday morning."

The 10-step overview

Here is the entire playbook in one table. Read it once before diving in. The "expected lift" column is conservative and assumes you are starting from near-zero ChatGPT visibility. Mature accounts compound faster.

Step	What you do	Time	Cost	Expected lift	When you see it
1	Audit current ChatGPT visibility	3-6 hours	Free or USD 0-99/mo	Diagnostic	Week 1
2	Identify 20-30 buyer prompts	4-8 hours	Free	Diagnostic	Week 1
3	Map prompts to existing pages	2-4 hours	Free	Diagnostic	Week 1-2
4	Restructure target pages for extractability	3 hours per page	Mostly free	35-45% of total	Week 3-8
5	Earn third-party citations	Ongoing	Free to USD 500-2000	25-35% of total	Week 4-12+
6	Publish llms.txt and llms-full.txt	1-2 hours	Free	3-7% of total	Week 2-6
7	Allow AI crawlers in robots.txt	15 minutes	Free	Unblocker, not a lift	Week 1
8	Build entity disambiguation	6-12 hours	Free	8-15% of total	Week 4-16+
9	Weekly prompt monitoring	1-2 hours/week	Free to USD 99/mo	Keeps it honest	Continuous
10	Revenue measurement (cited-clicked-paid)	1 hour setup	USD 15/mo and up	Closes the loop	Continuous

Two notes on this table before we dive in. First, "free" assumes you do the work yourself. If you outsource step 4 page restructures to a contractor at USD 50-150 per hour, the numbers move. Second, the "expected lift" ranges for steps 4, 5, 6, and 8 sum to roughly 71-102 percent, so they can edge past 100 at the top end. That happens because they overlap: fixing your robots.txt unlocks everything else, restructuring a page also helps it earn third-party links, and so on. The math is multiplicative, not additive.

Why this works in 2026 (the short version)

This procedure works because it touches both ChatGPT citation pathways. ChatGPT recommends sources two ways: from its frozen training corpus (browse-off mode) and from live retrieval at query time (search and browse). Each pathway has its own scoring criteria and its own clock. Most "how to rank in ChatGPT" advice optimizes one pathway and ignores the other.

Steps 1, 2, 3, 9, and 10 cover both pathways because they are about diagnosis and measurement. Step 4 leans heavily into live retrieval but also helps training-corpus ingestion. Step 5 is the dominant lever for training-corpus presence. Steps 6 and 7 are mostly retrieval-side. Step 8 is the linchpin for both, because if the model cannot confidently identify your brand it will recommend someone whose entity it understands.

The Princeton GEO paper [1] found that adding citations, statistics, and quotations to a page lifted generative-engine visibility by up to 40 percent, while keyword stuffing did almost nothing. That is the empirical anchor for step 4. Operator measurement across roughly 40 properties, plus research from Ahrefs [2], Semrush [3], Backlinko [4], Profound [5], and Peec [6], supports the relative weights.

ChatGPT had roughly 400 million weekly active users in early 2025 [7]. Roughly 13-15 percent of US English Google SERPs now show AI Overviews [8]. Whatever fraction of your buyer journey starts in an AI engine today will be larger next quarter.

Step 1: Audit your current ChatGPT visibility

What you do. Before you change a single page, you measure where you currently stand. Run 20-30 prompts across the four major AI engines (ChatGPT, Claude, Gemini, Perplexity) and log whether your domain appears, in what position, and with what surrounding context. This is the only step in the entire playbook you can do with zero preparation, and it is the diagnostic the next nine steps depend on.

How you do it. Pick the prompts (we will refine them in step 2; for now just brainstorm 20). Open each engine in a clean browser session with no logged-in account and no personalization signals if you can avoid it. Run each prompt and record three things: did your domain appear as a cited source, did your brand name appear in the answer text (with or without a link), and which competitors appeared. Drop the results into a spreadsheet with columns for engine, prompt, citation rank, brand mention, and notes.

If you want the same scan automated, our AI visibility score runs the prompts weekly across ChatGPT, Claude, Gemini, and Perplexity, but the manual version costs nothing except an afternoon and teaches you more about the models' behavior than any tool readout. Profound [5], Peec [6], Otterly, and SEOcrawl all do similar prompt scans at various price points.

How you measure success. Step 1 is purely diagnostic; the "success" is having a clean baseline before you change anything. The metric is "share of voice": across your 20 prompts, what percentage of the answers cited your domain at all, and what percentage mentioned your brand by name. Most properties I have audited start in the 0-15 percent range. The brutal honesty of that number is the point.

Time and cost. Three to six hours for the manual version. Free if you do it yourself; USD 0-99 per month for an automated tool depending on prompt count.

Expected impact. None directly. Step 1 produces no lift, only knowledge. But you cannot run any of steps 2-10 well without it, so the indirect impact is everything.

Here is the spreadsheet shape I have used on every audit I have run, cleaned up for sharing. Steal it.

Column	What goes in it	Example
engine	ChatGPT / Perplexity / Claude / Gemini	ChatGPT
date	When you ran the prompt	2026-05-26
prompt	Exact text	"best Stripe attribution tool for SaaS"
your domain cited?	Y / N	N
citation rank	1-5 or null	null
brand mention in answer?	Y / N	N
competitors named	List	Stripe Sigma, ChartMogul, Fathom
surrounding context	One-line note	"answer focused on enterprise BI"
pathway	retrieval / corpus / both	retrieval
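If you keep the audit in a spreadsheet export rather than eyeballing it, the share-of-voice number is a two-line calculation. The sketch below is a minimal illustration, not a tool I ship: the `AuditRow` class, its field names, and the four sample rows are all made up for the example, and it only covers the two Y/N columns from the table above.

```python
from dataclasses import dataclass


@dataclass
class AuditRow:
    """One prompt run on one engine (hypothetical shape, mirrors the spreadsheet)."""
    engine: str
    prompt: str
    domain_cited: bool
    brand_mentioned: bool


def share_of_voice(rows: list[AuditRow]) -> dict[str, float]:
    """Percentage of audited answers that cited the domain / named the brand."""
    if not rows:
        return {"cited_pct": 0.0, "mentioned_pct": 0.0}
    total = len(rows)
    cited = sum(row.domain_cited for row in rows)
    mentioned = sum(row.brand_mentioned for row in rows)
    return {
        "cited_pct": round(100 * cited / total, 1),
        "mentioned_pct": round(100 * mentioned / total, 1),
    }


# Illustrative data only: one prompt run on four engines.
rows = [
    AuditRow("ChatGPT", "best Stripe attribution tool for SaaS", False, False),
    AuditRow("Perplexity", "best Stripe attribution tool for SaaS", True, True),
    AuditRow("Claude", "best Stripe attribution tool for SaaS", False, True),
    AuditRow("Gemini", "best Stripe attribution tool for SaaS", False, False),
]
print(share_of_voice(rows))
```

Run it per engine as well as across all engines; the per-engine split is usually where the uncomfortable part of the baseline shows up.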

One more diagnostic move. In ChatGPT specifically, toggle browsing on for half your prompts and off for the other half. The split tells you whether your visibility problem is corpus-side (model never learned about you), retrieval-side (model could find you but does not), or both. This single observation routes which of the later steps will matter most. The "how AI engines choose sources" deep dive walks through the two pathways in detail; bookmark it before step 4.

Audit finding	Likely root cause	Which steps matter most
Cited with browsing on, never cited with browsing off	Corpus absence	Steps 5, 7, 8
Never cited either mode	Both pathways failing	All 10
Cited with browsing off, not in search	Strong corpus, weak retrieval	Steps 4, 6, 7
Cited but always last in citation list	Re-ranker deprioritizing you	Step 4
Brand mentioned, no link	Corpus recognition, no retrieval	Steps 4, 6
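The first three rows of that routing table can be written down as a small decision function, which is handy if you want to tag every prompt in the spreadsheet automatically. This is a sketch under my own naming (`route_audit` is hypothetical), and it deliberately ignores the citation-rank and brand-mention rows, which need inputs beyond the two browsing flags.

```python
def route_audit(cited_browse_on: bool, cited_browse_off: bool) -> tuple[str, list[int]]:
    """Map a prompt's browsing-on / browsing-off result to a likely root cause
    and the playbook steps that matter most, following the table above."""
    if cited_browse_on and not cited_browse_off:
        return "Corpus absence", [5, 7, 8]
    if not cited_browse_on and not cited_browse_off:
        return "Both pathways failing", list(range(1, 11))
    if cited_browse_off and not cited_browse_on:
        return "Strong corpus, weak retrieval", [4, 6, 7]
    # Cited in both modes: the table has no row for this, so no routing is suggested.
    return "Cited in both modes", []


cause, steps = route_audit(cited_browse_on=True, cited_browse_off=False)
print(cause, steps)
```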

Step 1 took me four hours on my own site. The result was uncomfortable: cited on three of twenty-eight prompts, all on Perplexity, none on ChatGPT. That number is the reason this article exists.

Step 2: Identify the 20-30 buyer prompts you actually need to win

What you do. Replace your audit's brainstormed prompts with the actual prompts buyers in your category type. The difference is significant. Brainstormed prompts read like keywords ("Stripe attribution"). Real buyer prompts read like full sentences with intent ("how do I see which marketing channel my Stripe revenue came from"). The second shape is what ChatGPT actually receives, and your visibility on the second shape is what drives revenue.

How you do it. Combine four sources. First, your sales transcripts (Gong, Fathom, or whatever you use) searched for the questions prospects ask. Second, your support tickets searched for "how do I" and "what is." Third, the People Also Ask box for your top-converting Google keywords. Fourth, the autocomplete suggestions in ChatGPT and Perplexity when you start typing your category language. From each source, pull 5-10 phrasings and consolidate to a shortlist of 20-30.

How you measure success. Two checks. One, every prompt in your list should have a clear intent (informational, comparison, transactional). Two, every prompt should be specific enough that a generic AI answer cannot evade it. "Best CRM" is too broad; "best CRM for solo founders selling productized services" is the right shape. If your list has too many broad prompts, you are competing with Salesforce and HubSpot for slots you cannot win; if you have too many narrow ones, the search volume is too low to matter.

Time and cost. Four to eight hours of focused work. Free.

Expected impact. Indirectly enormous. Every later step in the playbook optimizes against this prompt list. Picking the wrong 20 prompts is the single highest-cost mistake in the entire procedure, because all of steps 3-10 are then aimed at the wrong target. I have re-done step 2 on three different occasions because the first cut was too generic.

Here is the breakdown I use for prompt-type balance.

Prompt type	% of list	Example
High-intent buyer ("best for X")	25-30%	"best Stripe revenue attribution tool for SaaS founders"
Comparison ("A vs B")	15-20%	"Attrifast vs ChartMogul for attribution"
Problem-language ("my X is doing Y")	25-30%	"my ChatGPT traffic shows as Direct in GA4, how do I fix it"
Category exploration	15-20%	"what tools track ChatGPT referral revenue"
Long-tail integration	10-15%	"Stripe webhook attribution with first-party cookies"
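Once each prompt is labeled with a type, checking the list against those target ranges is mechanical. The sketch below assumes you label prompts by hand; the label names and the `balance_report` helper are my own illustration, and the 20-prompt sample is invented to show an unbalanced list.

```python
from collections import Counter

# Target share of the list per prompt type, in percent (from the table above).
TARGET_SHARE = {
    "high_intent": (25, 30),
    "comparison": (15, 20),
    "problem_language": (25, 30),
    "category_exploration": (15, 20),
    "long_tail_integration": (10, 15),
}


def balance_report(labels: list[str]) -> dict[str, str]:
    """Flag each prompt type as under, ok, or over its target share."""
    counts = Counter(labels)
    total = len(labels)
    report = {}
    for kind, (low, high) in TARGET_SHARE.items():
        pct = 100 * counts[kind] / total if total else 0.0
        if pct < low:
            report[kind] = "under"
        elif pct > high:
            report[kind] = "over"
        else:
            report[kind] = "ok"
    return report


# Illustrative 20-prompt list that skimps on problem-language prompts.
labels = (
    ["high_intent"] * 6
    + ["comparison"] * 4
    + ["problem_language"] * 2
    + ["category_exploration"] * 4
    + ["long_tail_integration"] * 4
)
print(balance_report(labels))
```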

The problem-language bucket is the one most teams skip and the one ChatGPT rewards most. Buyers describe symptoms before they know the category name. If your page answers "my X is doing Y" in plain language, you win citations that the keyword-shaped pages never see. Our prompt tracking feature lets you save these to a watch list and monitor them weekly, but the prompt-discovery work itself is unautomatable thinking work. Do it yourself.

A diagnostic move worth running here: paste your 20-30 prompts into ChatGPT one at a time and read the answer twice. First, look for who is cited. Second, look at the answer text for the language patterns the model uses ("the leading tool," "best known for," "commonly recommended") because those are the slots you are competing for, and the language ChatGPT defaults to tells you what kind of page wins each slot.

Step 3: Map prompts to existing pages (the gap analysis)

What you do. For each of your 20-30 prompts, identify which page on your site (if any) is the best current answer. Then mark each prompt as one of five states: already covered well, covered but needs restructure, partially covered across multiple pages, not covered at all, or covered by a competitor's page you cannot displace. The output is a prioritization list for step 4.

How you do it. Open your spreadsheet from step 1, add a column for "current best page" and a column for "gap state." Walk through each prompt and find the single page on your site that most directly answers it. If two or three pages partially address the prompt, mark it as fragmented. That is a different problem than a flat absence. If no page addresses it, you have a content gap that needs a new page in step 4 (or a strategic decision to skip).

How you measure success. A clean priority list with rough effort estimates. The breakdown I aim for on a 30-prompt audit: roughly 8-12 prompts in the "covered, needs restructure" bucket (step 4 work), 5-8 in the "not covered" bucket (new page work), 3-5 fragmented (consolidation work), the rest either fine or skip.

Time and cost. Two to four hours. Free.

Expected impact. Indirect again, but step 3 is what makes step 4 efficient instead of scattershot. Without the gap map, you spend step 4 restructuring pages that already win and ignoring pages that are losing winnable prompts. I see teams burn entire quarters on this mistake.

Gap state	What it means	Step 4 action
Already covered well	Cited or close to it	Leave alone; do not over-edit
Covered, needs restructure	Right page, wrong shape	Full restructure (step 4)
Fragmented across pages	2+ pages partially answer	Consolidate into one canonical page
Not covered	No page exists	Write a new page from the prompt
Competitor-dominated	Their page is canonical	Either out-structure them or skip

The competitor-dominated row is the painful one. Sometimes the right answer is to skip a prompt because the canonical Reddit thread, the Wikipedia article, or the well-built competitor page has too much corpus presence to displace within a quarter. Pick your fights. A prompt that ChatGPT consistently answers with "Stripe's own docs" is not a prompt you win in three months.

For each "not covered" prompt, draft a one-line page brief: what the page needs to answer in its first 100 words, what supporting evidence it needs (statistics, citations, comparison tables), and which existing page (if any) it can be split off from. This is the bridge into step 4. Do not write the pages yet; just have the briefs ready.

Step 4: Restructure each target page for AI extractability

What you do. Take each page from the "needs restructure" and "not covered" buckets and rebuild it to a specific AI-extractable shape. This is the single highest-impact intervention in the playbook. The Princeton GEO research [1], my own measurement, and every operator I trust support the lift. Spend 60 percent of your effort here.

How you do it. Six required elements. One: a direct answer in the first 100-120 words. Two: question-shaped H2s that match prompt language ("How does X work?" rather than "X Overview"). Three: at least one comparison table with honest specifics. Four: 4-8 inline citations to primary sources. Five: FAQPage and Article JSON-LD with questions matching your visible H2s exactly. Six: a visible "updated" date and meaningful body refresh quarterly.

Each element does a specific job in the live-retrieval pipeline. The direct answer gives the re-ranker a clean passage. Question-shaped H2s match the query rewrite. Comparison tables parse as structured passages. Inline citations satisfy source-diversity trust. JSON-LD makes parsing trivial. Fresh dates trigger re-crawl prioritization.
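Element five is the one that most often drifts out of sync, so it helps to generate the FAQPage JSON-LD from the same question list that produces your H2s. A minimal sketch, assuming you hold questions and answers as plain pairs: the function names are hypothetical, the sample question is invented, and the output follows the schema.org FAQPage shape (Question items with an acceptedAnswer) as I understand it.

```python
import json


def faq_jsonld(qa_pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    return json.dumps(data, indent=2)


def headings_missing_from_faq(h2_headings: list[str], qa_pairs: list[tuple[str, str]]) -> list[str]:
    """Return question-shaped H2s that have no exactly matching FAQ question."""
    faq_questions = {question for question, _ in qa_pairs}
    return [h for h in h2_headings if h.endswith("?") and h not in faq_questions]


qa = [("How does Stripe revenue attribution work?", "It joins the visit source to the Stripe payment.")]
h2s = ["How does Stripe revenue attribution work?", "How much does it cost?", "Pricing"]
print(faq_jsonld(qa))
print(headings_missing_from_faq(h2s, qa))
```

The second helper is the useful half: an empty list means every question-shaped H2 has an exact twin in the markup.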

How you measure success. Three checks. One, the page should be cited within 4-8 weeks on its target prompts (Perplexity first, then ChatGPT search, then browse-off). Two, your average citation rank improves. Three, when ChatGPT cites you, the answer text pulls from your direct-answer block. If it paraphrases an off-page source, your extraction is failing even when your citation succeeds.

Time and cost. Roughly 3 hours per page once you have the briefs from step 3. Across 10-15 pages, budget 30-45 hours. Outsourced at USD 50-150/hour, that is USD 1,500-6,750. Do the first two or three yourself to internalize the pattern, then parallelize.

Expected impact. 35-45 percent of total visibility lift across the playbook, by my own measurement. This is the load-bearing step.

Here is the before-vs-after page structure I use, side by side, so you can audit your own pages quickly.

A note on the direct-answer block, which is the most common failure point. The block needs to be a self-contained 60-120 words that answers the prompt without requiring context from elsewhere on the page. If you cut it and paste it into Slack, a smart colleague should be able to act on it. That standard is the same one the re-ranker uses. The AI search optimization checklist has more direct-answer examples.
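The length half of that standard is easy to lint for. The sketch below only counts words in the first paragraph of a page's body text; it cannot tell you whether the block is self-contained, which still needs the Slack test. The function name and the blank-line paragraph convention are my assumptions.

```python
def direct_answer_check(page_text: str, min_words: int = 60, max_words: int = 120) -> tuple[int, bool]:
    """Count words in the first paragraph and report whether it fits the 60-120 word window.

    Assumes paragraphs are separated by a blank line.
    """
    first_paragraph = page_text.strip().split("\n\n", 1)[0]
    word_count = len(first_paragraph.split())
    return word_count, min_words <= word_count <= max_words


sample = "Short intro only.\n\nThe real answer is buried down here."
print(direct_answer_check(sample))
```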

Step 4 took me about 3 hours per page across my first eight pages. The impact showed up in week 5 on Perplexity, week 7 on ChatGPT search, and the unprompted browse-off mentions did not arrive until late month three. That lag is consistent across every operator I have compared notes with.

Step 5: Earn citations on third-party sources

What you do. Build a presence in the third-party sources that ChatGPT learned from and continues to learn from. Reddit, Wikipedia, G2, Capterra, Hacker News, industry newsletters, Stack Overflow (for technical brands), and category-specific listicles. The goal is brand mentions and citations in places that already carry corpus weight, not paid links.

How you do it. Five sub-tracks, in order of impact for most categories:

Reddit. Identify the 5-10 subreddits where your category buyers hang out. Build a posting history of genuinely helpful answers (not promotional). When relevant, name your product. Do not spam; the moderators are sharp. Reddit's content is licensed to Google for AI training, reportedly for around USD 60M per year [9], which means a genuinely helpful mention compounds for a long time.
Wikipedia and Wikidata. Wikipedia has notability rules that block most SMB SaaS, but Wikidata is permissive. Create a Wikidata entity for your company with founder, founding date, headquarters, category, and official site properties. Then ensure any Wikipedia article in your category (if one exists) has an accurate, neutral mention of your product with a primary-source citation.
G2 and Capterra. Claim your profiles, fill them completely, request reviews from happy customers, and respond to every review professionally. AI engines lean on these for B2B SaaS comparison queries because the review platforms are heavily crawled.
Listicles and category roundups. For each of your top 10 prompts, identify the 5-15 listicles that already rank. Reach out to the authors with a genuinely useful pitch (data, a specific perspective, a comparison angle they missed). Aim for 2-5 placements per quarter.
Hacker News, podcasts, and newsletters. Lower-volume but high-trust sources. A single thoughtful HN comment thread or podcast appearance can move the corpus needle disproportionately.

How you measure success. Track three things. One, total third-party mentions per month (use a Google Alert, Mention.com, or a manual quarterly review). Two, mentions on the specific listicles you targeted for outreach. Three, your ChatGPT citation rate on prompts that historically pulled from Reddit or G2 (those should move first as the AI engines re-crawl).

Time and cost. Ongoing. Plan for 3-6 hours per week of consistent effort. Cost: free if you do it yourself, USD 500-2,000 per month if you hire a community manager or outreach specialist. Most teams under-invest here because the work is slow and unsexy. Most teams who out-invest here win the category over 12-24 months.

Expected impact. 25-35 percent of total visibility lift. The compounding is real but multi-quarter, which is why step 4 (faster) gets the higher weight. Step 5 is the moat.

Here is the source matrix I use to plan the third-party citation work. The "AI weight" column is my own subjective estimate based on observation, not a published number.

Source	Effort to earn	Time to land	AI weight (corpus)	AI weight (retrieval)	Notes
Wikidata entry	Low (2-4 hrs)	Days	High	Medium	Free, controllable, must be accurate
Reddit (authentic posts)	High (ongoing)	Weeks-months	Very high	High	Spam risk; do not promote
Wikipedia article (yours)	Very high	Months-years	Very high	Medium	Notability gate; most SMBs cannot
Wikipedia mention (in others')	Medium	Weeks-months	High	Low	Edit accurately, cite primary
G2 / Capterra profile	Low (4-8 hrs)	Days	Medium	High	B2B SaaS only
Listicle placement	Medium-high	Weeks	Medium	High	Outreach + offer
HN thread (organic)	Variable	Days	Medium-high	Low	Cannot be forced
Crunchbase profile	Low (1 hr)	Days	Medium	Medium	Entity disambiguation
LinkedIn Company page	Low (2 hrs)	Days	Low-medium	Low	sameAs anchor
Industry newsletter mention	Medium	Weeks	Medium	Low	Pitch genuine angle
GitHub org (technical brands)	Low	Days	Medium	Low	sameAs anchor
Podcast appearance	Medium-high	Months	Medium-high	Low	Transcripts get crawled
Stack Overflow (developer brands)	Medium	Weeks-months	Medium	High	Answer real questions

A specific Reddit warning. Reddit's spam filters are aggressive and a banned account is worse than no account. Build karma in your home subreddits for two to three weeks before mentioning your product, never lead with promotion, and answer questions from people genuinely deciding between tools in your category. A balanced post that names two competitors honestly alongside your product drives corpus weight for months; a new account posting "check out my tool" gets removed in minutes.

The Reddit and Wikipedia thesis is important enough that I wrote a dedicated piece on the r/SaaS and r/SEO mention compounding effect; read that for the deep version. For the playbook, just know that step 5 is the slow but durable moat, and ignoring it caps your ceiling on every other step.

Step 6: Publish llms.txt and llms-full.txt

What you do. Add two files at the root of your domain: /llms.txt and /llms-full.txt. The first is a curated map of your most LLM-relevant pages, the second is a fuller dump of canonical content. The llmstxt.org [10] proposal is the reference spec; adoption is still low (~7-10 percent of public SaaS sites in Q1 2026), which means the marginal AI crawler that reads it finds an uncrowded landscape.

How you do it. Write llms.txt as a markdown file structured as: site name, one-line description, a "## Docs" section linking to your top documentation pages, a "## Blog" section linking to your top SEO content, and an optional "## Other" section. Keep it under 5,000 words; the spec rewards concision. llms-full.txt is a longer concatenation of your canonical content in markdown, intended for full ingestion. Both files should be served at HTTP 200 with Content-Type: text/plain or text/markdown.

How you measure success. Three checks. One, the files are served at HTTP 200 (verify with curl). Two, your server logs show requests from AI user-agents to /llms.txt within 7-14 days of publishing. Three, you keep a mental ceiling on this step: it is a small lift and you should not over-invest. If you spend more than 4 hours total on this step you are doing it wrong.

Time and cost. 1-2 hours. Free.

Expected impact. 3-7 percent of total visibility lift, by my estimate. Small but cheap. The honest version: ship it because the downside is zero and the upside compounds with future spec adoption, not because it is a silver bullet.

Here is the minimum-viable llms.txt template I use, with placeholders. Save 20 minutes by adapting this directly.

Section	Content	Notes
Title	`# Your Brand Name`	H1 only
Tagline	One-line description of what you do	15-20 words
Summary	2-4 sentences of context	Plain prose
`## Docs`	Bullet list of doc URLs with one-line descriptions	Top 10-20 docs
`## Blog`	Bullet list of canonical blog URLs	Top 10-20 posts
`## Other`	Pricing, about, contact, status	Optional
File size	Under 5,000 words total	Concision wins
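If your page list already lives in a CMS or a config file, the template above can be rendered by a short build-step function. This is a sketch, not the official tooling: `build_llms_txt` and the example.com URLs are placeholders, and putting the tagline in a markdown blockquote follows my reading of the llmstxt.org proposal (drop the `> ` if you prefer a plain line).

```python
def build_llms_txt(
    name: str,
    tagline: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt file: H1 title, tagline, summary, then one H2 per section.

    `sections` maps a heading (e.g. "Docs") to (title, url, one-line note) tuples.
    """
    lines = [f"# {name}", "", f"> {tagline}", "", summary, ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content; swap in your own pages.
llms_txt = build_llms_txt(
    name="Example Brand",
    tagline="Revenue attribution for small SaaS teams.",
    summary="Example Brand connects visit sources to payments. These are the pages most useful to an LLM.",
    sections={
        "Docs": [("Quickstart", "https://example.com/docs/quickstart", "Install and first report")],
        "Blog": [("Attribution basics", "https://example.com/blog/attribution", "How the join works")],
        "Other": [("Pricing", "https://example.com/pricing", "Plans and limits")],
    },
)
print(llms_txt)
print("word count:", len(llms_txt.split()))
```

The word count at the end is there to keep you under the 5,000-word budget as the link lists grow.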

For llms-full.txt, the pattern is to concatenate the full markdown of every page in your "## Docs" and "## Blog" sections, separated by clear headers. This file can be 50,000+ words. Some teams generate it dynamically from their CMS; for most static sites a build-step script suffices.
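A build-step script for that concatenation can be very small. The sketch below assumes you already have each page as a (title, markdown body) pair in memory; the function name and the `---` separator between pages are my own choices, not something the spec requires as far as I know.

```python
def build_llms_full(pages: list[tuple[str, str]]) -> str:
    """Concatenate (title, markdown_body) pairs into one document,
    giving each page a clear H1 header and a horizontal-rule separator."""
    parts = [f"# {title}\n\n{body.strip()}\n" for title, body in pages]
    return "\n---\n\n".join(parts)


# Placeholder pages for illustration.
pages = [
    ("Quickstart", "Install the snippet, then connect Stripe.\n"),
    ("Attribution basics", "Every payment is joined to its first-touch source.\n"),
]
llms_full = build_llms_full(pages)
print(llms_full)
```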

The llms.txt revenue impact post walks the measurement nuances if you want to verify a lift in your specific case. For 90 percent of sites, ship the file, log the crawler hits, move on. Do not let llms.txt become a weeklong project.

Step 7: Open up to AI crawlers (robots.txt)

What you do. Audit your robots.txt and ensure GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and OAI-SearchBot are explicitly allowed. Many sites blocked these in the 2023-2024 panic over content rights, and many still have the block in place silently costing them training-corpus presence. This is a 15-minute audit that often unlocks a quarter of work.

How you do it. Open your robots.txt. Look for any User-agent: GPTBot directive followed by Disallow: /. Remove it or replace with Allow: /. Do the same for ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User. If you genuinely need to block, do so selectively (e.g., block crawling of /admin/ paths) rather than blanket blocking the entire user-agent.

How you measure success. Two checks. One, fetch https://yourdomain.com/robots.txt and confirm the crawlers are allowed. Two, in your server logs, look for incoming requests with user-agents matching GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and ChatGPT-User. Do not expect Google-Extended in the logs: it is a robots.txt control token, and Google crawls with its existing Googlebot user-agents. If you see no crawl activity at all after 14 days, your block may still be active or your DNS is doing something funny.
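The log check can be a simple substring count over your access log. This sketch assumes a plain-text log where the user-agent appears somewhere on each line; the sample lines are invented and the user-agent strings in them are shortened illustrations, not the bots' exact headers. Google-Extended is left out because, as far as I know, Google documents it as a robots.txt token rather than a separate crawler user-agent. Matching on the string does not prove the request came from the real bot.

```python
from collections import Counter

# Tokens to look for in the user-agent portion of each log line.
AI_BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]


def count_ai_bot_hits(log_lines: list[str]) -> Counter:
    """Count log lines per AI crawler token (case-insensitive substring match)."""
    hits: Counter = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # count each request once
    return hits


# Invented sample lines in a combined-log-like shape.
sample_log = [
    '203.0.113.5 - - [26/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "compatible; GPTBot/1.1"',
    '203.0.113.9 - - [26/May/2026:10:05:00 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "compatible; PerplexityBot/1.0"',
    '198.51.100.7 - - [26/May/2026:10:06:00 +0000] "GET / HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Macintosh)"',
]
print(count_ai_bot_hits(sample_log))
```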

Time and cost. 15-30 minutes. Free.

Expected impact. This is an unblocker, not a lift. If you were blocking, removing the block makes every other step possible. If you were not blocking, this step costs you nothing and changes nothing.

AI crawler	User-agent string	Owned by	Purpose	Allow?
GPTBot	`GPTBot`	OpenAI [11]	Training corpus crawler	Yes (default)
ChatGPT-User	`ChatGPT-User`	OpenAI [11]	Live fetch triggered by a user's request in ChatGPT	Yes
OAI-SearchBot	`OAI-SearchBot`	OpenAI [11]	ChatGPT search index	Yes
ClaudeBot	`ClaudeBot`	Anthropic [12]	Anthropic web crawl	Yes
Claude-Web	`Claude-Web`	Anthropic [12]	Live fetch	Yes
PerplexityBot	`PerplexityBot`	Perplexity	Perplexity search index	Yes
Google-Extended	`Google-Extended` (robots.txt token, not a crawler user-agent)	Google [13]	AI training control (gates Gemini, formerly Bard, training)	Yes
Applebot-Extended	`Applebot-Extended`	Apple	Apple Intelligence training	Yes if relevant
Bingbot	`Bingbot`	Microsoft	Bing + Copilot index	Yes
FacebookBot	`FacebookBot`	Meta	Llama training	Optional

Here is the actual robots.txt block I run on my own site, scrubbed for sharing. Lift it directly if you want.

Directive	Value
User-agent	*
Allow	/
User-agent	GPTBot
Allow	/
User-agent	OAI-SearchBot
Allow	/
User-agent	ChatGPT-User
Allow	/
User-agent	ClaudeBot
Allow	/
User-agent	PerplexityBot
Allow	/
User-agent	Google-Extended
Allow	/
Sitemap	https://yourdomain.com/sitemap.xml
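Written out as an actual file and checked with Python's standard-library `urllib.robotparser`, the same directives look like the sketch below. The blank lines between groups and the `blocked_bots` helper are my additions, and the parser's verdict is only an approximation of how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The directive table above, written out as robots.txt text.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_bots(robots_txt: str, bots: list[str], url: str) -> list[str]:
    """Return the bots that this robots.txt would stop from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


print(blocked_bots(ROBOTS_TXT, AI_BOTS, "https://yourdomain.com/pricing"))
```

Paste your live robots.txt into `ROBOTS_TXT` and an empty list means none of the listed crawlers is blocked for that URL.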

One nuance worth naming. Blocking GPTBot removes you from future training corpora but does not remove you from current ones; your past content already lives in the training data and will continue to surface in browse-off ChatGPT answers until the models trained on it are replaced. The block only affects future training passes. Most teams who blocked in 2023 are now un-blocking, because two years later the costs (no recommendations) are visible and the benefits (some content-rights upside) never materialized for the average site.

For technical-brand sites, the AI crawler tracking guide walks the server-log side of this in detail, including how to verify the bots you are seeing are the real bots (the impersonators are real and increasingly clever).

Step 8: Build entity disambiguation

What you do. Make sure that when ChatGPT encounters your brand name, the surrounding context across multiple authoritative sources agrees on what you do, who founded you, when, and where. The mechanism is called entity disambiguation, and the failure mode is that the model is uncertain enough about your brand identity that it omits you from recommendations rather than risk a confusing answer.

How you do it. Six profiles, all consistent:

Wikidata entity. Create a Q-item for your company. Include: instance of (business), industry, founder, founding date, headquarters location, official website, social media handles. Wikidata is the most-quoted entity graph by every major LLM.
Crunchbase profile. Complete with founders, funding (if any), category, description, official URLs.
LinkedIn Company page. Complete with about, industry, headquarters, size, founded year.
Your homepage Organization JSON-LD. Include a sameAs array pointing to every other profile above (Wikidata, Crunchbase, LinkedIn, X/Twitter, GitHub, G2, Capterra).
Google Business profile (if you have a physical address). Or skip if purely online.
GitHub Organization (if you ship code). Public repos add corpus weight.

How you measure success. The diagnostic is a no-browse ChatGPT prompt: "Tell me about [Your Brand Name]." If ChatGPT hallucinates, omits, or confuses you with a similarly named entity, your disambiguation is broken. If it returns an accurate paragraph that names your category, founder, and rough founding context, your entity work is landing. A second check: paste your brand name into Google and look for a Knowledge Panel on the right. If Google shows one, that is a good sign the public entity signals ChatGPT also draws on are consistent, though it is not proof.

Time and cost. 6-12 hours total across the six profiles. Free (some platforms charge for verification badges, optional).

Expected impact. 8-15 percent of total visibility lift. The impact compounds with step 5 (third-party citations), because every citation source verifies your entity against the sameAs graph you built.

Here is the property checklist for your Wikidata entity. Fill all of these; partial entries do less work.

Wikidata property	Value	Required?
`instance of` (P31)	business / company / SaaS application	Yes
`industry` (P452)	Your specific industry	Yes
`founded by` (P112)	Founder name(s) with Q-items if they exist	Yes
`inception` (P571)	Founding year (or year-month-day)	Yes
`headquarters location` (P159)	City	Yes if applicable
`official website` (P856)	Your canonical URL	Yes
`Twitter username` (P2002)	Your handle	Recommended
`GitHub username` (P2037)	Your org	If applicable
`LinkedIn company ID` (P4264)	Your LinkedIn URL slug	Recommended
`Crunchbase organization ID` (P2088)	Your Crunchbase slug	Recommended
`described at URL` (P973)	Your About page	Recommended
`parent organization` (P749) / `owned by` (P127)	If you have a parent	If applicable

The Wikidata documentation [14] walks the editor flow if you are new. Plan for an hour for the initial entity creation, and revisit quarterly to add properties as your company grows.
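If you want to check the checklist mechanically, here is a minimal sketch. It assumes you have already downloaded your entity's JSON from Wikidata and pass in the inner entity object, the one with a `claims` dictionary keyed by property ID. The `missing_properties` helper and the required/recommended split are my own illustration covering a subset of the rows above; extend the two dictionaries to match your situation.

```python
# Subset of the checklist above. IDs are Wikidata property IDs;
# the labels are only used for the human-readable report.
REQUIRED = {
    "P31": "instance of",
    "P452": "industry",
    "P112": "founded by",
    "P571": "inception",
    "P856": "official website",
}
RECOMMENDED = {
    "P2002": "Twitter username",
    "P4264": "LinkedIn company ID",
    "P973": "described at URL",
}


def missing_properties(entity: dict) -> dict:
    """Report which checklist properties have no claim on the entity.

    `entity` is assumed to be one Wikidata entity object, i.e. a dict
    with a "claims" mapping of property ID -> list of statements.
    """
    claims = entity.get("claims", {})
    return {
        "required": [label for pid, label in REQUIRED.items() if not claims.get(pid)],
        "recommended": [label for pid, label in RECOMMENDED.items() if not claims.get(pid)],
    }


if __name__ == "__main__":
    # Hypothetical, partially filled entity.
    sample = {"claims": {"P31": [{}], "P856": [{}], "P2002": [{}]}}
    print(missing_properties(sample))
```

Run it during the quarterly revisit and treat anything in the `required` list as the next edit to make.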

Your homepage Organization JSON-LD should have approximately this shape (see the full spec at schema.org/Organization):

JSON-LD property	Value
`@type`	Organization
`name`	Your brand name
`url`	Your canonical homepage
`logo`	Absolute URL to logo PNG
`founder`	Person object with name and sameAs to LinkedIn
`foundingDate`	YYYY-MM-DD
`sameAs`	Array: Wikidata, Crunchbase, LinkedIn, Twitter/X, GitHub, G2, Capterra, Wikipedia (if applicable)

The sameAs array is the single most important field. It is what tells every crawler that all those profiles refer to the same entity. Without it, each profile sits in isolation and the model has to guess at the connection.
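For readers who prefer to see the shape as actual markup, the sketch below builds the Organization object from the table and wraps it in a script tag. Every URL and name in the example call is a placeholder, and `organization_jsonld` and `as_script_tag` are hypothetical helper names, not part of any library.

```python
import json


def organization_jsonld(name, url, logo, founder_name, founder_linkedin,
                        founding_date, same_as):
    """Build a schema.org Organization dict matching the table above."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "founder": {
            "@type": "Person",
            "name": founder_name,
            "sameAs": [founder_linkedin],
        },
        "foundingDate": founding_date,  # YYYY-MM-DD
        # Drop duplicate profile URLs while keeping the original order.
        "sameAs": list(dict.fromkeys(same_as)),
    }


def as_script_tag(data: dict) -> str:
    """Serialize for embedding in the page head."""
    # Escape "</" so a stray "</script>" inside a value cannot close the tag.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    org = organization_jsonld(
        name="Example Co",
        url="https://example.com/",
        logo="https://example.com/logo.png",
        founder_name="Jane Doe",
        founder_linkedin="https://www.linkedin.com/in/janedoe-placeholder",
        founding_date="2024-01-15",
        same_as=[
            "https://www.wikidata.org/wiki/Q0",  # placeholder Q-item
            "https://www.linkedin.com/company/example-co",
            "https://github.com/example-co",
        ],
    )
    print(as_script_tag(org))
```

Generating the block from one list of profile URLs keeps the sameAs array in sync with the profiles you actually maintain.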

Step 9: Monitor weekly with prompt tracking

What you do. Re-run your 20-30 prompts from step 2 once a week, log the deltas, and notice which prompts are gaining and losing citations. This is the feedback loop that tells you whether the rest of the playbook is working, and which step needs revisiting next month.

How you do it. Two options. Manual: every Monday morning, paste each prompt into ChatGPT, Perplexity, Claude, and Gemini, log the result in your spreadsheet. Tedious but free, and you learn a lot from reading the answers carefully. Automated: use a prompt tracking tool that runs the scan headlessly and emails you a diff. Our prompt tracking feature, Profound [5], Peec [6], and Otterly all do this.

How you measure success. Three metrics per week:

Citation count. How many of your 20-30 prompts cited your domain across all four engines combined.
Share of voice. Of all the sources cited across your prompts, what percentage were yours (versus competitors and third-party sources).
Brand mention count. How often your brand name appeared in answer text without a link (corpus-pathway signal).

Trend the three over 8-12 weeks. The Princeton GEO research [1] and operator measurement suggest that if you executed steps 4-8 cleanly, all three metrics should trend upward over that window, with week-to-week noise along the way. If they are flat, you missed something in steps 4 or 5.
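If you log the weekly scan as structured rows instead of free text, the three metrics fall out of a few lines of code. The sketch below is one way to compute them. The `Result` record, its field names, and the rule that a "brand mention" only counts when your domain was not also cited are my assumptions for illustration, not a standard definition.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One prompt run on one engine (hypothetical log format)."""
    prompt: str
    engine: str
    cited_domains: list[str]
    answer_text: str


def weekly_metrics(results: list[Result], domain: str, brand: str) -> dict:
    # Citation count: distinct prompts where any engine cited your domain.
    cited_prompts = {r.prompt for r in results if domain in r.cited_domains}

    # Share of voice: your citations as a share of all cited sources.
    total_sources = sum(len(r.cited_domains) for r in results)
    own_sources = sum(r.cited_domains.count(domain) for r in results)
    share = round(100 * own_sources / total_sources, 1) if total_sources else 0.0

    # Brand mentions: brand name in the answer text, but no link to you.
    mentions = sum(
        1 for r in results
        if brand.lower() in r.answer_text.lower() and domain not in r.cited_domains
    )
    return {
        "citation_count": len(cited_prompts),
        "share_of_voice_pct": share,
        "brand_mentions": mentions,
    }


if __name__ == "__main__":
    week = [
        Result("best attribution tool", "chatgpt",
               ["example.com", "rival.com"], "Example is a solid option."),
        Result("best attribution tool", "perplexity",
               ["rival.com"], "Try Example or Rival."),
        Result("AI traffic GA4", "chatgpt",
               ["other.org", "rival.com"], "Rival covers this."),
    ]
    print(weekly_metrics(week, domain="example.com", brand="Example"))
```

Whatever definitions you pick, keep them fixed for the whole 8-12 week window so the trend line is comparable week to week.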

Time and cost. 1-2 hours per week manual; USD 99-499 per month for an automated tool depending on prompt count and engines covered. The Attrifast version is included in the standard USD 15/month plan.

Expected impact. Step 9 is operational, not strategic. It does not directly add visibility; it tells you where to invest more effort. But without it, you have no idea whether the playbook is working until you check revenue six months later. That gap is too long to defend.

Here is the weekly tracking template I run on my own properties. Copy it.

Week	Prompts cited (of 30, any engine)	Share of voice (%)	Brand mentions (no link)	Top-gaining prompt	Top-losing prompt	Action
W1 (baseline)	4 / 30	8%	2	n/a	n/a	Note baseline
W4	7 / 30	14%	4	"stripe attribution"	"AI traffic GA4"	Restructure GA4 page
W8	13 / 30	22%	8	"ChatGPT revenue"	"perplexity ROI"	Earn 3 more listicle mentions
W12	18 / 30	28%	13	"AI visibility tool"	none	Maintain; explore new prompts

The "top-losing prompt" column is the one most teams skip and the one I find most valuable. Losing a citation tells you something specific: either a competitor restructured their page, the engine updated its retrieval ranking, or your page went stale. Each of those routes to a different intervention.
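Finding the top mover and top loser is a simple diff between two weeks. A minimal sketch, assuming you keep a per-prompt count of how many engines cited you each week (the `top_movers` name and the dictionary format are hypothetical):

```python
def top_movers(prev: dict[str, int], curr: dict[str, int]):
    """Return (top_gaining_prompt, top_losing_prompt) between two weeks.

    Each dict maps prompt -> number of engines that cited you that week.
    Either element is None when nothing moved in that direction.
    """
    prompts = sorted(set(prev) | set(curr))  # sorted for stable tie-breaking
    deltas = {p: curr.get(p, 0) - prev.get(p, 0) for p in prompts}
    gainers = {p: d for p, d in deltas.items() if d > 0}
    losers = {p: d for p, d in deltas.items() if d < 0}
    top_gaining = max(gainers, key=gainers.get) if gainers else None
    top_losing = min(losers, key=losers.get) if losers else None
    return top_gaining, top_losing


if __name__ == "__main__":
    last_week = {"stripe attribution": 1, "AI traffic GA4": 3, "ChatGPT revenue": 2}
    this_week = {"stripe attribution": 4, "AI traffic GA4": 1, "ChatGPT revenue": 2}
    print(top_movers(last_week, this_week))
```

The loser is the one to read by hand: open the answer, see who replaced you, and decide which of the three interventions applies.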

Step 10: Measure traffic and conversion (the revenue close)

What you do. Install server-side first-party attribution that detects AI-engine sources by referrer, user-agent, and behavioral pattern, then joins the resulting session to your Stripe payment via webhook. Without this step, every prior step is a vanity exercise. With it, you can answer the only question that pays the bills: which AI engine recommendation actually drove paying customers, and at what revenue per visitor.

How you do it. ChatGPT, Claude, and Perplexity all strip the referer header inconsistently, and GA4 buckets the resulting visits as Direct or (none) [15]. The fix has three layers:

Server-side referer capture. When a visitor lands on your site, capture the Referer header server-side (before any JavaScript runs) and store it on the session. AI clients that leak even a partial referer are detected here.
AI source detection. Maintain a lookup of known AI user-agents (ChatGPT-User, ClaudeBot, etc.) and behavioral patterns (no JavaScript, no cookies, fetch-only). Flag visits that match.
Stripe webhook join. When a Stripe payment_intent.succeeded or checkout.session.completed event fires, join the customer email or customer ID back to the original session by stored attribution key, and record AI source as the first-touch (or last-touch, depending on your model).

The full stack is described at /track-chatgpt-traffic and on the AI citation tracking feature page, and the implementation deep dive is in the ChatGPT referral analytics guide. You can build this yourself in 1-2 weeks if you have a Node or Python backend; Attrifast ships the same stack in 4 minutes of installation for USD 15/month.

How you measure success. The output you want is a table that looks like this, refreshed weekly.

Source	Sessions (last 30d)	Paid trials	Paid customers	Revenue per visitor (USD)	Notes
Google organic	12,400	142	38	1.13	Baseline
ChatGPT (search + chat)	940	22	8	2.07	High intent
Perplexity	380	9	3	1.83	Trackable, fastest mover
Claude (web search)	110	2	0	0.41	Brand asset
Google AI Overviews	220	6	2	1.65	Growing share
Reddit (organic referral)	180	5	2	2.21	Step 5 paying off
GA4 "Direct"	4,800	31	9	0.62	Roughly half is AI leakage

That last row is the killer. Roughly half of the "Direct" traffic in untreated GA4 is AI-source leakage. Attrifast separates it. Without separation, your dashboard shows AI traffic as Direct, your CFO sees no AI line item, and your GEO work gets killed at the next board meeting despite being one of your highest-RPV channels.
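The revenue-per-visitor column is nothing more than attributed revenue divided by sessions for the same window. A minimal sketch, with made-up input numbers and a hypothetical `revenue_per_visitor` helper:

```python
def revenue_per_visitor(rows):
    """rows: iterable of (source, sessions, revenue_usd) for one window.

    Returns (source, sessions, rpv) tuples sorted by RPV, highest first.
    """
    table = []
    for source, sessions, revenue in rows:
        rpv = round(revenue / sessions, 2) if sessions else 0.0
        table.append((source, sessions, rpv))
    return sorted(table, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Illustrative figures only, not measurements.
    window = [
        ("Source A", 100, 250.0),
        ("Source B", 0, 0.0),
        ("Source C", 200, 100.0),
    ]
    for source, sessions, rpv in revenue_per_visitor(window):
        print(f"{source}\t{sessions}\t{rpv:.2f}")
```

Sorting by RPV instead of sessions is deliberate: it puts the small, high-intent sources at the top, where a volume-sorted dashboard would bury them.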

Time and cost. 1 hour setup. USD 15/month for the Attrifast version. No subscription cost if you build it yourself, but budget 1-2 engineer weeks.

Expected impact. Step 10 does not add visibility; steps 1-9 do. Step 10 is what makes visibility defensible. The honest version: without step 10 your playbook is a marketing project. With step 10 your playbook is a measurable business line.

Step 10 is where we live

Measure which ChatGPT recommendations actually drove traffic and conversion. Server-side capture, AI source detection, Stripe webhook join. 4 minutes to install.

Start free trial →

7-day free trial · $15/mo · cancel anytime

What's actually realistic timeline-wise

Here is the honest version of the timeline, because the "ChatGPT recommendations in 7 days" tweets are nonsense and the people writing them have never measured a thing. The expected sequence across the roughly 40 properties I have instrumented and the operators I have compared notes with is summarized in the table at the end of this section.

Three reality checks. One: weekly monitoring will show noise. A prompt you were cited on in week 4 may disappear in week 6 and reappear in week 9. The signal is the trend. If weekly volatility drives emotion, you will quit before the compounding lands.

Two: the gap between Perplexity (fastest) and ChatGPT browse-off (slowest) is real. Perplexity moves in week 3-5, ChatGPT search in week 5-8, browse-off mentions do not move at all until OpenAI ships a new model that includes your earned third-party citations. That gap can be 6-12 months and you cannot speed it up.

Three: the playbook compounds. Year-one results are real but moderate. Year-two results, when your Reddit history, listicle placements, Wikidata entity, and restructured pages have all aged into the corpus, are where the channel becomes meaningful. The teams who win this treat it as a 24-month investment, not a quarterly campaign.

Time point	Realistic state	Optimistic state	Pessimistic state
Week 4	1-3 Perplexity citations	4-6 citations	0 movement
Week 8	4-8 citations across engines	10+	1-2
Week 12	8-15 citations, ChatGPT search moving	20+	4-6
Month 6	20-30+ citations, browse-off starting	40+	10-15
Year 1	Defensible channel, RPV trackable	Material revenue line	Small but real

The pessimistic column is what happens when step 4 was rushed, step 5 was skipped, or the category is dominated by Reddit-and-Wikipedia incumbents. The optimistic column mostly shows up in categories with little existing competition. Most teams executing the playbook honestly land in the realistic column.

Common failure modes (and how to spot them)

A few patterns I see often enough to name explicitly, because they are easier to avoid than to fix once you have wasted a quarter on them.

Failure mode 1: skipping step 1. Teams dive into restructuring pages with no baseline. Three months later they cannot tell whether visibility moved. The fix is mechanical: spend the four hours up front.

Failure mode 2: optimizing for prompts nobody types. Teams pick prompts that sound like SEO keywords ("attribution software") instead of buyer language ("how do I see which channel drove this Stripe payment"). The model receives the second shape, not the first. Re-run step 2 from sales transcripts.

Failure mode 3: under-investing in step 5. Page restructure is gratifying because you ship the change the same day. Third-party citations are slow. Teams default to step 4 indefinitely and never build the moat. Set a quarterly target of 5-10 new placements and protect it on the calendar.

Failure mode 4: corpus latency frustration. New companies blame the playbook when browse-off ChatGPT does not recommend them. The fix is patience plus measurement of the surfaces you can move (Perplexity, ChatGPT search). The browse-off surface follows the model retrain cycle, not your roadmap.

Failure mode 5: measuring presence instead of revenue. Teams get cited, fail to instrument step 10, and three quarters later cannot prove the work paid. The CFO asks the obvious question and there is no answer. Do step 10 in week one.

Failure mode 6: chasing every new engine. Teams scatter effort across seven engines and win none. The fix is concentration: ChatGPT, Perplexity, Claude, Gemini, Google AI Overviews, in that order. Master the first three before adding anything else.

Failure mode	Time wasted	Severity	Fix
Skipping audit	1 quarter	High	Spend 4 hours up front
Wrong prompts	1-2 quarters	Critical	Re-run step 2 from sales transcripts
Under-investing in step 5	Permanent ceiling	High	Quarterly placement targets
Frustration with corpus latency	Team quits after 1 quarter	Medium	Measure surfaces you can move
Presence without revenue	Whole project killed	Critical	Step 10 in week 1
Chasing every engine	Ongoing dilution	Medium	Top 3-5 engines only

The Hacker News thread on this topic from late 2025 [16] is worth reading for the operator commentary, especially the section on Reddit moderation, which catches more brands than any other failure mode. The r/SEO subreddit recurring thread on "ChatGPT recommendations" [17] is the live pulse of what is actually working at the small-team level; bookmark it.

What the data says (and where it gets uncertain)

I want to be specific about what is well-supported versus what I am extrapolating, because the GEO space is full of confident claims that fall apart under questioning. Below is the evidence map for the major claims in this playbook.

Claim	Evidence strength	Source
Citations, statistics, quotations lift visibility ~30-40%	Strong	Princeton GEO paper [1]
FAQPage schema correlates with AI citations	Moderate-strong	Ahrefs [2], Semrush [3]
Reddit and Wikipedia are over-represented in citations	Strong	Reuters licensing [9], Common Crawl analysis [18]
ChatGPT cites 3-5 sources per search answer	Documented	OpenAI [19]
GA4 buckets AI traffic as Direct	Documented	Google Analytics docs [15]
Perplexity is the easiest engine to win citations on	Inferred (operator data)	Perplexity FAQ [20]
AI Overviews appear on ~13-15% of US English SERPs	Reported	Search Engine Land [8]
llms.txt adoption is ~7-10% of public SaaS	Sampled	llmstxt.org [10]
Domain Rating explains ~12% of AI citation variance	Sampled	Attrifast aggregate, n=40
Step 4 accounts for 35-45% of visibility lift	Operator inference	Attrifast measurement
ChatGPT RPV is higher than Google organic for B2B SaaS	Sampled	Attrifast aggregate

Two areas of uncertainty. One: the exact relative weight of step 4 versus step 5 over 24 months. My year-one estimate (35-45 percent vs 25-35 percent) probably inverts by year two as third-party compounding catches up. Two: non-English markets. My data is heavily US English; operators in German, Japanese, and Spanish markets report patterns I cannot confirm.

A few additional pieces worth reading to triangulate this playbook: the Backlinko AI Overviews study [4], the Profound blog [5], the Peec.ai blog [6] for European measurement, the Common Crawl statistics [18], the Search Engine Land AI search library [21], the Anthropic crawler docs [12], the OpenAI bot docs [11], and the "Which brands does ChatGPT recommend in 2026" measurement piece for category-level benchmarks.

FAQ

How long does it actually take to get ChatGPT to recommend my product?

Honestly, eight to twelve weeks of consistent work, and that is for the live-retrieval surface (ChatGPT search and browse). The deeper training-corpus surface, where ChatGPT recommends you without browsing turned on, runs on OpenAI's retrain cycle and takes one to three quarters. In my own measurement across roughly 40 properties, the first Perplexity citations land in week 3-5, the first ChatGPT search citations in week 5-8, and the unprompted browse-off mentions on a refreshed model can take six months or more. Anybody promising overnight results is selling you something.

Do I need to pay for an AI visibility tool to do this?

Not for step 1. You can audit your current ChatGPT visibility by manually running 20-30 prompts in ChatGPT, Perplexity, Claude, and Gemini and logging the results in a spreadsheet. It is tedious but free. Paid tools like Profound, Peec, Otterly, and Attrifast's prompt tracking automate the weekly run and the diffing. The honest break point: under 30 prompts and a quarterly cadence, do it manually; above that, the labor cost exceeds the tool cost.

What is the single highest-impact step in this playbook?

Step 4: restructuring your top revenue pages for AI extractability. Across the operator data I have collected, that one move accounts for roughly 35-45 percent of the total visibility lift, because it touches both the live-retrieval re-ranker (better extraction of your passages) and the training-corpus pathway (cleaner ingestion when the next crawl happens). The next-highest impact step is step 5, earning citations on third-party sources like Reddit and trusted listicles, which compounds slowly but durably.

Should I block GPTBot to protect my content from being trained on?

Only if you have a specific legal or licensing reason. For the vast majority of SMB SaaS, ecommerce, and content sites in 2026, blocking GPTBot removes you from future training corpora and slowly erodes the chance ChatGPT recommends you without browsing turned on. The companies that block it are mostly large publishers with content licensing deals. If you want to be recommended, allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended in your robots.txt and treat their crawls as free brand-building.
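Before assuming your robots.txt allows these crawlers, check it. The sketch below uses Python's standard `urllib.robotparser` against a robots.txt string you paste in, so it runs offline. The `crawler_access` helper and the sample file are illustrative; the crawler names are the four listed above.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return {crawler: allowed?} for one URL under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


if __name__ == "__main__":
    # Example file that blocks GPTBot and allows everyone else.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    for agent, allowed in crawler_access(sample, "https://example.com/pricing").items():
        print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

Run it against your real file for a handful of revenue pages. A blanket `Disallow` inherited from an old template is a common reason a site that wants AI visibility turns out to be blocking the crawlers.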

How many prompts should I track per week?

Between 20 and 35 for a focused SaaS. Pick five to ten high-intent buyer prompts (the ones that read like "best tool for X" or "how do I solve Y"), five to ten comparison prompts ("A vs B"), five to ten problem-language prompts ("my X is doing Y, what should I use"), and five general category prompts. Run each across ChatGPT, Perplexity, Claude, and Gemini weekly. Fewer than that misses real movement; more than that becomes a tracking job nobody maintains.

Will llms.txt actually move the needle?

Modestly. llms.txt is not a ranking signal. It is a curated map of your most LLM-relevant pages that some AI crawlers read when they encounter it. Adoption is still low (~7-10 percent of public SaaS sites in Q1 2026), so the upside is real and the downside is zero. I treat it as a 30-minute investment with a small-but-positive expected value, not as a centerpiece. Step 4 (page restructure) and step 5 (third-party citations) outweigh it by an order of magnitude.

How do Reddit and Wikipedia fit into this playbook?

They sit in step 5 and they are arguably the highest-leverage third-party citation sources, because both are heavily weighted in LLM training corpora. Reddit's content was licensed to Google for AI training (Reuters reported around USD 60M per year), and Wikipedia underpins the entity disambiguation graph every major model uses. A genuine, helpful presence in r/SaaS, r/SEO, r/marketing, plus a clean Wikidata entry, can move your ChatGPT recommendation odds more than ten paid backlinks.

What if my product launched after the model's training cutoff?

Then ChatGPT literally does not know you exist in browse-off mode, and steps 1-9 of this playbook will not change that in the short term. You have two moves, and you make both: lean entirely on the live-retrieval surface (ChatGPT search, Perplexity, Google AI Overviews) by nailing steps 4-7, and wait one to three quarters for the next model retrain to ingest your earned third-party citations and Wikidata entry. The waiting is brutal, but the work in the meantime is exactly what gets you into the next training pass.

How do I disambiguate my brand entity?

Step 8 covers this in detail. Short version: claim and complete profiles on Wikidata, Crunchbase, LinkedIn Company, GitHub Organization (for technical brands), G2 and Capterra (for B2B SaaS), then make sure your homepage Organization JSON-LD lists every one of those URLs as sameAs entries. The goal is that when ChatGPT encounters your brand name, the surrounding context across 5-7 authoritative sources agrees on what you do, who founded you, and what category you sit in. Without that, the model defaults to confusion or omission.

What is the cheapest version of this playbook that still works?

Manual prompt tracking in a spreadsheet (steps 1, 2, 9), a free Wikidata edit (step 8), free robots.txt and llms.txt edits (steps 6, 7), and roughly 40 hours of content restructure across your top 10 pages (step 4). That gets you 60-70 percent of the impact at near-zero cash cost. The paid layer (prompt tracking tools, AI visibility platforms, Attrifast for revenue attribution) shaves time and adds the measurement loop, but it does not replace the work.

How do I measure whether any of this actually drove revenue?

Step 10 is the whole reason this article exists. ChatGPT and the other AI engines strip the referer header, and GA4 buckets AI-attributed sessions as Direct or (none). To see whether your GEO work shipped revenue you need server-side first-party attribution that detects AI-engine sources by referrer, behavioral pattern, and known AI user-agents, then joins the session to your Stripe payment via webhook. That is exactly the part Attrifast was built for. Without that join, you are optimizing a metric you cannot prove paid for itself.

Can I skip steps in this playbook?

Steps 1, 2, 3, and 10 are non-skippable, because they are diagnostic and measurement, not execution. You can defer step 6 (llms.txt), step 7 (robots.txt), and step 8 (entity disambiguation) if you are bandwidth-constrained, but you will pay for it in slower compounding. Steps 4 and 5 are the load-bearing execution steps. Step 9 (weekly monitoring) is what keeps the whole thing honest. The fastest realistic version is steps 1, 2, 3, 4, 5, 9, 10 with a promise to revisit 6, 7, 8 in month two.

Does this work for ecommerce, or only for SaaS?

It works for both, with a different emphasis. SaaS benefits more from step 5 (third-party citations on G2, Capterra, Reddit comparison threads) because category language matters. Ecommerce benefits more from step 4 (page restructure with Product and PriceSpecification schema) and step 7 (allowing OAI-SearchBot to crawl product pages) because ChatGPT shopping and similar surfaces lean on classic product-feed signals. The 10 steps are the same; the relative weights shift by category.

What is Attrifast's role in this playbook?

Attrifast is an AI-native analytics platform: it detects AI traffic from ChatGPT, Claude, Gemini, Perplexity, Copilot, and Google AI Overviews, measures your AI search visibility across prompts, and joins those sessions to Stripe payments so you can see which AI recommendations actually drove revenue. Step 1 (audit) and step 9 (monitoring) live in our AI visibility score and prompt tracking features. Step 10 (revenue measurement) is the wedge: it closes the cited-clicked-paid loop that GA4 cannot, because GA4 does not separate AI sources from Direct.

Start your 7-day trial

AI traffic detection across ChatGPT, Claude, Gemini, Perplexity, Copilot, and Google AI Overviews. Weekly prompt monitoring. Stripe revenue join. USD 15 per month after the trial.

