One ChatGPT prompt becomes 8 internal searches, 3 fetches, and 2 sessions on your site that GA4 buckets as Direct. A 2026 operator breakdown of how fan-out actually moves through your attribution layer and what to instrument.
The single best diagnostic conversation I had this year started with a screenshot of a GA4 channel report and one line: "this Direct row is eating us alive and I think it is ChatGPT." It was. But the deeper finding, the one that took two hours of log-line walking to reach, was that the founder's biggest paying customer that month had landed on his site three separate times in two hours from what was clearly one ChatGPT session. Three different deep pages, three different Direct rows, no referer on any of them. One prompt. Three sessions. One eventual Stripe payment.
That moment was the cleanest illustration I have of why the fan-out matters as an attribution problem, not just as a search-mechanics curiosity. Peec.ai has done excellent work documenting the patterns inside ChatGPT's fan-out behaviour, and their patterns piece and the length-growth analysis are the cleanest public datasets on the engine side. This article is the complementary view from the operator side: what fan-out does to your sessions, your referer logs, your UTM strategy, your channel report, and ultimately your revenue join. It is the version I wished I had when that founder messaged me.
I will walk the mechanics, then the measurement, then the structural implications for how you instrument and write content. Everything below is grounded in the rough 200-site cohort I read every Friday morning and cross-checked against the public datasets Peec, Profound, and the search platforms themselves have published. Cohort numbers are cohort numbers, not industry truth, and I flag where the magnitudes diverge from public benchmarks.
Quick facts
| Metric | Value | Source |
| --- | --- | --- |
| Sites in cohort | ~200 | Attrifast cohort, May 2026 |
| Median ChatGPT sub-queries per commercial prompt | 4 to 6 | Attrifast cohort |
| Upper-decile sub-queries per prompt | 11+ | Attrifast cohort |
| Peec.ai average ChatGPT fan-outs per prompt | 2.1 | Peec.ai [1] |
| Peec.ai average Perplexity fan-outs per prompt | 1.4 | Peec.ai [1] |
| Peec.ai average Grok fan-outs per prompt | 6.8 | Peec.ai [1] |
| Peec.ai average fan-out word length, Oct 2025 | ~6 words | Peec.ai [2] |
| Peec.ai average fan-out word length, Jan 2026 | ~12 words | Peec.ai [2] |
| Most injected fan-out token | best (~24% of advice prompts) | Peec.ai [1] |
| Current-year freshness modifier rate | ~5% of fan-outs | Peec.ai [1] |
| % of fan-out sessions with recognisable referer (cohort) | ~28% | Attrifast cohort |
| Fan-out conversion to Stripe payment, B2B SaaS | 2.7% | Attrifast cohort |
| Google organic conversion on same SaaS pages | 1.4% | Attrifast cohort |
| Direct-prompt ChatGPT referral conversion (no fan-out) | 2.5% | Attrifast cohort |
| Average follow-through citation clicks per answer | 1.4 | Attrifast cohort |
| Median GA4 under-count factor on fan-out channel | 3 to 5x | Attrifast vs GA4 reconcile |
Two anchors carry the rest of the article. The first is that fan-out is a documented, growing phenomenon: Peec.ai's dataset of 20 million query fan-outs (QFOs) shows the average word count of a sub-query doubled in a single quarter, and our cohort shows commercial prompts now fanning out into 4 to 6 sub-queries by default. The second is that the attribution layer was not designed for this: GA4's last-non-direct model, its single-session referer logic, and its UTM-first channel grouping all assume one source per visit, and one visit per prompt. A fan-out breaks both assumptions simultaneously.
What a query fan-out actually is
A query fan-out is the model expanding one user prompt into several internal searches before it composes the answer. The user sees one input box and one answer. The engine, between those two surfaces, runs N searches, fetches M documents, and merges everything into the final response. The fan-out is the N searches in the middle.
The simplest way to think about it: when you type "best CRM for solo founders" into ChatGPT, the engine does not run that exact string against its search partner. It rewrites and expands it into a small batch of sub-queries that look something like this:
| Fan-out sub-query | Type of expansion |
| --- | --- |
| best CRM solo founders 2026 | freshness modifier added |
| top lightweight CRM small business | synonym expansion |
| Pipedrive vs HubSpot solo founder | named-entity injection |
| CRM reviews self-employed | review-site intent |
| solo founder CRM Reddit | forum-source intent |
| best CRM for indie hackers | community-vocabulary expansion |
| CRM pricing under 30 per month | qualifier extraction from context |
| affordable CRM features founder | feature-attribute decomposition |
That is one prompt expanding into eight sub-queries. The exact count varies by prompt type and engine version, but the structural pattern is consistent: the engine rewrites and decomposes the user's question into several internal questions, each of which gets its own retrieval pass.
Peec.ai's pattern analysis measured the average across a broad sample at 2.1 sub-queries per prompt for ChatGPT, with the most-injected tokens being best, top, reviews, comparison, tools, software, and features. Their separate length-growth piece showed the average sub-query word length doubling from about 6 words to about 12 words between October 2025 and January 2026 across five tested countries. Both are essential context for what follows.
The diagram above is the core of this article in one image. One prompt fans into eight sub-queries; three of them retrieve a page on your domain; those three retrievals can each produce a citation in the answer; the user clicks a subset of them. GA4 reads the resulting visits as three independent Direct sessions, with no shared identifier and no referer. The challenge for an attribution operator is to reconstruct the bridge in the middle.
Why this is an attribution problem, not just a search problem
Peec, Profound, and the rest of the GEO visibility category measure fan-out from the engine side. They tell you what sub-queries fired and what got cited. That answers half the question. The other half is what happens after the citation is clicked, and that is where the attribution layer breaks.
Three things go wrong simultaneously when a fan-out exit lands on your site.
The referer header is missing. ChatGPT's in-app browser, the mobile client, and the desktop app all strip the referer on outbound clicks. Per OpenAI's published docs, the engine's traffic is mediated by clients that do not preserve referrer chains the way a classic web browser does. Even when chatgpt.com appears as the referer (which happens on a minority of clicks from the web UI), the URL fragment that would identify the originating prompt and sub-query is not exposed.
UTM parameters are not added. ChatGPT does not append a utm_source value to citation URLs. Some community guides recommend that publishers self-tag, but that requires the publisher to anticipate every citation pattern, which is not realistic. So the channel-grouping rules that depend on UTM in GA4 default settings simply do not fire on fan-out exits.
The landing page is rarely the homepage. Citations point to deep URLs because the engine extracts the most answer-shaped passage from your site, which is almost never your root page. So even your behavioural fallbacks (visitor lands on /, visits a few pages, signs up) do not match the fan-out exit pattern.
| What happens on a fan-out exit | Why it breaks GA4 default attribution |
| --- | --- |
| Referer header stripped | GA4 cannot identify the source, so visit lands in Direct/(none) |
| No UTM source appended by client | Channel grouping rules do not match, visit lands in Direct |
| Deep page landing rather than root | Homepage-based heuristics do not trigger |
| Multiple sessions per prompt | GA4 attributes each session independently, no fan-out join |
| In-app browser cookies isolated | Cross-session linkage on the same device is lost |
| Last-non-direct window resets often | Fan-out origin overwritten by subsequent direct visits |
If you take only one row from that table, take the last one. GA4's last-non-direct attribution model is the single most damaging default for fan-out tracking, because it passes over Direct visits when it assigns credit, and most fan-out visits arrive with no referer and are therefore logged as Direct. So the engineering customer who first arrived via a ChatGPT fan-out, came back five days later via a branded search, signed up the next day on a direct visit, and converted three weeks later, gets attributed to whichever non-direct touch survived the 90-day window, here the branded search. The fan-out, which was actually the discovery channel, vanishes.
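To make the journey above concrete, here is a minimal sketch that compares a last-non-direct rule with a first-touch rule on the same four visits. The `Touch` class, the source labels, and the day offsets are illustrative assumptions, and the two functions are simplified stand-ins, not GA4's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Touch:
    day: int      # days since the first visit (illustrative)
    source: str   # channel label as the analytics layer recorded it


def last_non_direct(touches):
    """Credit the most recent touch whose source is not 'direct'."""
    for touch in reversed(touches):
        if touch.source != "direct":
            return touch.source
    return "direct"


def first_touch(touches):
    """Credit the earliest touch, whatever its source."""
    return touches[0].source


# The journey as a referer-based tool records it: the fan-out click
# carries no referer, so the first visit is logged as 'direct'.
as_recorded = [
    Touch(0, "direct"),          # ChatGPT fan-out exit, no referer
    Touch(5, "branded_search"),
    Touch(6, "direct"),          # sign-up
    Touch(27, "direct"),         # payment
]

# The same journey with the first visit labeled server-side (assumed label).
as_labeled = [Touch(0, "chatgpt_fanout")] + as_recorded[1:]

print(last_non_direct(as_recorded))  # branded_search
print(first_touch(as_labeled))       # chatgpt_fanout
```

The point of the comparison is that the fan-out never appears as a candidate under the first rule, because it was recorded as Direct before any attribution logic ran.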
Not every prompt fans out the same way. The shape of the fan-out depends heavily on intent. The cohort breakdown below is from the rolling 90 days ending May 15, 2026 across the prompts where we could classify intent reliably (about 4,200 prompt-source pairs observed via the OAI-SearchBot User-Agent on cohort sites). Engine averages are the Peec.ai cross-platform figures cross-referenced with our own logs.
| Intent type | Example prompt | ChatGPT median sub-queries | Perplexity | Grok | Google AI Mode |
| --- | --- | --- | --- | --- | --- |
| Informational | what is RAG | 2 | 1 | 5 | 2 |
| Navigational | attrifast pricing | 1 | 1 | 2 | 1 |
| Commercial investigation | best CRM for solo founders | 5 | 2 | 8 | 4 |
| Comparison | Pipedrive vs HubSpot | 4 | 2 | 7 | 3 |
| Transactional | buy Stripe subscription analytics | 3 | 1 | 6 | 2 |
The pattern is consistent across the cohort: commercial-investigation and comparison prompts produce the largest fan-outs, navigational prompts produce the smallest. Grok consistently runs the deepest fan-outs, often deeper than the user might expect, while Perplexity stays tight because it relies on a single high-quality retrieval and ranks candidates within it rather than expanding outward. ChatGPT sits in the middle but has been climbing all year on commercial prompts as the engine has gotten more confident running multi-step research.
The reason this distribution matters is that the prompts that drive purchase decisions fan out the most. If you sell to founders evaluating tools, the queries your buyers ask are exactly the queries the engine fans most aggressively. So the gap between the literal phrase the user typed and the queries actually running against the retriever is widest where revenue lives.
The histogram above is the cohort distribution. Most commercial prompts produce a fan-out between 3 and 10 sub-queries, with the mode at 4 to 5 and a long tail past 11 on multi-part research questions like "what is the best CRM for solo founders and how does it compare to HubSpot on pricing and Reddit reviews." Those compound research prompts are where the engine runs the deepest, and where the attribution surface is most distorted.
How fan-out grew through 2025 and 2026
Peec.ai's country-level analysis is the cleanest public dataset on fan-out length over time. They sampled 20 million ChatGPT QFOs between October 2025 and January 2026 across Germany, the UK, Singapore, Thailand, and the US, and they found the average word count per QFO roughly doubled in that single quarter, climbing from about 6 words to about 12 with a peak around 16 in week 49 of 2025. Critically, the QFO count per prompt held steady at 2.3 to 2.8 in their dataset across countries. So the engine was asking longer, more specific questions, not necessarily more of them.
Our cohort sees a complementary trend at the sub-query count level on commercial prompts: the median climbed from about 3 sub-queries in Q3 2025 to 4 to 5 by Q1 2026, with the upper decile rising from 7 to 11. So between Peec.ai's word-length data and our cohort's count data, the fan-out has been getting both wider and deeper through the trailing nine months.
| Quarter | Median fan-out word length (Peec) | Median commercial sub-queries (Attrifast) | Upper-decile sub-queries (Attrifast) |
| --- | --- | --- | --- |
| Q3 2025 | ~6 words | ~3 | ~7 |
| Q4 2025 | ~10 words | ~4 | ~9 |
| Q1 2026 | ~12 words | ~4 to 5 | ~11 |
| Q2 2026 (partial) | ~13 words | ~5 to 6 | ~12 |
The growth was not driven by one country or language. Peec.ai noted that all five tested countries showed virtually identical doubling curves, which they read (and I agree) as evidence of a global engine-side architectural shift rather than localised experimentation. The most likely drivers are improved vector-retrieval performance on longer queries and a countermeasure against AI-generated content saturation, which makes shallow keyword matches less informative.
The line above is the curve Peec.ai documented, with our cohort's extrapolation tacked on through May 2026. The peak in week 49 (late November 2025) was likely a brief experimentation phase that settled lower; the steady-state through Q1 and Q2 2026 has been in the 12 to 13 word range. For content teams, the takeaway is that the average sub-query is now longer than most page titles and most H1 tags, which means matching on a single short keyword is increasingly mismatched to the queries the retrieval layer actually runs.
What ChatGPT actually injects into your sub-queries
The injection pattern matters because it determines which pages on your site get retrieved during fan-out. Peec.ai's published rankings of the most-injected tokens are the cleanest public data on this. I have re-ordered their list against my cohort's observation about which injections appear most often on the commercial-investigation prompts that drive revenue.
| Injected modifier | Frequency on advice prompts (Peec) | Effect on retrieval | What this means for your content |
| --- | --- | --- | --- |
| best | ~24% | Pulls listicles and ranking pages | Listicles dominate commercial fan-outs |
| top | ~15% to 18% | Similar to best, slightly more category-skewed | Same as above |
| reviews | ~12% to 15% | Pulls G2, Glassdoor, Sitejabber, Trustpilot | Third-party review pages are co-cited |
| comparison | ~8% to 12% | Pulls X vs Y pages and matrix content | Comparison pages are co-cited |
| tools | ~6% to 10% | Pulls category-listing pages | "Tools" pages outrank product pages |
| software | ~6% to 9% | Software-specific filter on tools fan-out | Software listicles dominate B2B |
| features | ~4% to 8% | Pulls feature-decomposition content | Feature-level pages get co-cited |
| current year (2026) | ~5% | Freshness filter | Fresh content gets weighted |
| Reddit | ~3% to 6% | Forum-source intent | Reddit threads enter the citation set |
| brand name X | varies | Named-entity injection | Named competitors get co-cited |
The single most actionable pattern in this table is that "best" appears in roughly a quarter of advice-style fan-outs even when the user did not type it. So a page that ranks for "CRM for solo founders" but is invisible to "best CRM for solo founders" is going to lose most of the fan-out citations on the prompt the user actually typed. The same applies to the "comparison" and "reviews" modifiers.
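One rough way to audit a page against the table above is to check which injected modifiers appear in its text at all. The sketch below is an assumption-laden proxy: whole-word presence says nothing about how a retriever actually scores the page, and the modifier list simply copies the table, with the current year written as 2026.

```python
import re

# Modifiers copied from the table above; "2026" stands in for the
# current-year freshness modifier.
INJECTED_MODIFIERS = [
    "best", "top", "reviews", "comparison", "tools",
    "software", "features", "2026", "reddit",
]


def modifier_coverage(page_text):
    """Map each modifier to True if it occurs as a whole word in the text."""
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    return {modifier: modifier in words for modifier in INJECTED_MODIFIERS}


def missing_modifiers(page_text):
    """List the modifiers the text never mentions, in table order."""
    coverage = modifier_coverage(page_text)
    return [modifier for modifier, hit in coverage.items() if not hit]


title = "Best CRM for solo founders in 2026: reviews and comparison"
print(missing_modifiers(title))
# ['top', 'tools', 'software', 'features', 'reddit']
```

Run against the full page body rather than just a title, the missing list is a starting point for deciding which sub-query variants a page has no passage for.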
The referer story: what actually shows up in your logs
Now we move from the engine side to your server logs. This is the section that matters most for an attribution operator, because the question is no longer "what did the engine search for" but "what arrived at my domain and how do I read it." Across the cohort, here is what the referer landscape looks like for fan-out exits during May 2026.
| Source pattern | Share of fan-out sessions | What GA4 calls it by default |
| --- | --- | --- |
| chatgpt.com referer | ~22% | Referral (chatgpt) |
| chat.openai.com referer | ~3% | Referral (chat.openai.com) |
| oai.com referer | ~2% | Direct (unless you map it) |
| utm_source=chatgpt or similar | ~1% | Identified (if mapping is set up) |
| No referer, deep page landing | ~58% | Direct/(none) |
| No referer, homepage landing | ~10% | Direct/(none) |
| chatgpt.com referer but to homepage | ~4% | Referral (chatgpt) |
The first four rows are the visible fraction. They total about 28 percent of fan-out sessions, which is consistent with the broader "how much traffic comes from ChatGPT" benchmark. The remaining 72 percent are the invisible fraction, and they break down further by behavioural signature.
| Invisible pattern signature | Share | Recoverable server-side? |
| --- | --- | --- |
| Deep page, no referer, US/EU geo, business hours | ~32% | Yes, behavioural fingerprinting |
| Deep page, no referer, mobile UA, in-app browser hint | ~14% | Partial, UA pattern recognition |
| Deep page, no referer, ChatGPT desktop app pattern | ~8% | Yes, UA + OS handoff signature |
| Homepage, no referer, but recent OAI-SearchBot fetch | ~6% | Yes, fetch-then-human pair join |
| No referer, no signature, ambiguous | ~12% | No, treat as Direct |
About 60 of the 72 percent invisible fraction is recoverable with server-side enrichment. The last 12 percent is genuinely ambiguous and stays in Direct. The recovery rate matters because it changes the headline channel share dramatically. A site reading its GA4 dashboard sees roughly 28 percent of true fan-out traffic. The same site with server-side capture and the OAI-SearchBot pair-join logic sees roughly 88 percent. That gap is the entire reason this article ends where it does.
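The arithmetic behind the 28 and 88 percent figures can be checked directly from the two tables above. This is only a sanity check on the shares quoted in the article; the bucket names are shortened, and the "partial" mobile bucket is counted as recoverable because the 60 percent figure only adds up that way.

```python
VISIBLE_SHARE = 28  # percent of fan-out sessions GA4 labels by default

# bucket -> (share of all fan-out sessions in percent, recoverable server-side?)
INVISIBLE_BUCKETS = {
    "deep page, US/EU geo, business hours": (32, True),
    "deep page, mobile UA, in-app hint": (14, True),   # table says "partial"
    "deep page, desktop app pattern": (8, True),
    "homepage, recent OAI-SearchBot fetch": (6, True),
    "no signature, ambiguous": (12, False),
}

invisible_total = sum(share for share, _ in INVISIBLE_BUCKETS.values())
recoverable = sum(share for share, ok in INVISIBLE_BUCKETS.values() if ok)
labeled_after_capture = VISIBLE_SHARE + recoverable

print(invisible_total, recoverable, labeled_after_capture)  # 72 60 88
```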
For a step-by-step walk-through of the capture logic at the request level, the track ChatGPT traffic page documents the runtime, and the AI visibility score feature page covers how the per-engine and per-prompt signal is surfaced inside the dashboard.
How fan-out distorts UTM strategy
UTM tagging has been the SEO operator's default lever for a decade. You append utm_source to a link, GA4 picks it up, the visit lands in the right channel. Fan-out breaks this in four ways.
| UTM assumption | What fan-out does to it |
| --- | --- |
| Source is added by the linker, not the visitor | Fan-out citations are added by the engine, with no UTM appended |
| One source per click | A single prompt produces multiple clicks from different sub-queries |
| Source survives the click | The ChatGPT client may strip the URL fragment on follow-through |
| Source matches the user's intent | The sub-query that produced the citation is not the user's typed prompt |
The fix is not "add more UTM." The fix is to acknowledge that UTM is a publisher-side instrument and fan-out is a non-publisher-side phenomenon. The publisher (you) cannot tag what the engine retrieves, because the engine retrieves your existing URL with no parameters. So the recovery logic has to happen on the server, on first visit, before any client-side analytics runs.
A useful mental model is that you have five layers of attribution clarity for fan-out sessions:
| Layer | What it catches | Effort |
| --- | --- | --- |
| Layer 1: GA4 default | The 28% visible fraction | None |
| Layer 2: GA4 channel grouping with regex | Same 28%, but routed to a clean channel | Low |
| Layer 3: Server-side referer capture | Recovers about 60% of the dark fraction | Moderate |
| Layer 4: Bot-fetch + human-click pair join | Recovers another 6 to 10% | High |
| Layer 5: First-touch persistence to Stripe | Joins fan-out origin to paying customer | Moderate |
Most teams stop at Layer 1 and assume the channel is small. The teams that move to Layer 3 see roughly a 3 to 5x increase in measured ChatGPT volume overnight. The teams that move to Layer 5 are the ones who can answer the question "what fraction of our MRR is from fan-out citations" with anything other than a guess.
Conversion rate of fan-out sessions
The most interesting cohort finding is that fan-out sessions, once you can actually measure them, convert at materially higher rates than Google organic on the same landing pages. Across the cohort's 118 B2B SaaS sites in May 2026, the comparison looks like this.
| Source | Sessions | Conversion to Stripe payment | First-month subscription value |
| --- | --- | --- | --- |
| Google organic | ~1.42M | 1.4% | $28.70 |
| ChatGPT direct prompt (no fan-out) | ~98k | 2.5% | $43.10 |
| ChatGPT fan-out exit | ~74k | 2.7% | $44.80 |
| Perplexity citation | ~52k | 2.6% | $42.20 |
| Direct/(none) catch-all (truly direct) | ~340k | 3.4% | $51.10 |
| Branded search | ~620k | 4.2% | $58.40 |
The fan-out row is the second-best non-branded source in the table, behind only true direct type-ins (which are usually returning users and brand-driven). It outperforms Google organic by roughly 1.9x on conversion rate and by roughly 56 percent on first-month value. The reason is intent: a user who clicks a fan-out citation is already in active-evaluation mode, because the engine ran a commercial-investigation fan-out before producing the answer. So the click that arrives on your page is from a user who has been comparing options, possibly across multiple sub-queries, and is engaged enough to click through.
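The two lift figures above follow from the table, and multiplying them gives a rough first-month revenue-per-session comparison. That last number is a derived illustration, not a cohort figure from the article, and it assumes the first-month value applies uniformly to every converting session.

```python
# source -> (conversion rate in percent, first-month value in USD), from the table
SOURCES = {
    "google_organic": (1.4, 28.70),
    "chatgpt_fanout": (2.7, 44.80),
}


def lift(new, base):
    """Ratio of a metric on the new source to the same metric on the baseline."""
    return new / base


fan_cr, fan_value = SOURCES["chatgpt_fanout"]
org_cr, org_value = SOURCES["google_organic"]

conversion_lift = lift(fan_cr, org_cr)        # about 1.9x
value_lift = lift(fan_value, org_value)       # about 1.56x, i.e. roughly 56% higher
# Derived, not from the source: revenue per session scales with both factors.
revenue_per_session_lift = conversion_lift * value_lift

print(round(conversion_lift, 2), round(value_lift, 2), round(revenue_per_session_lift, 2))
# 1.93 1.56 3.01
```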
The structural lesson is that fan-out is a high-intent, under-counted channel. Every operator I have talked to who instruments it for the first time has the same reaction: the channel is much bigger and much more valuable than they thought. The combination of under-counting and over-converting is what makes the gap so material on a Stripe revenue join.
For the broader picture on AI-traffic conversion benchmarks across the cohort, the AI traffic revenue benchmark piece runs the full attribution-by-source matrix. The slice above is the fan-out-specific row.
Cohort gap: GA4 versus first-party on fan-out
This is the table I show every operator who asks me whether their measurement gap is real or paranoid. It is the side-by-side of what GA4 reports versus what the server-side capture sees, on the same sites in the same window.
| Metric | GA4 default | Server-side capture | Multiple |
| --- | --- | --- | --- |
| Fan-out attributed sessions (median cohort site) | 412 | 1,475 | 3.6x |
| Fan-out conversion to Stripe (median cohort site) | 1.1% | 2.7% | 2.5x |
| Fan-out revenue share of total MRR | 0.4% | 2.1% | 5.3x |
| First-touch persistence past 30 days | 18% | 76% | 4.2x |
| % of fan-out sessions correctly labeled | 28% | 88% | 3.1x |
That last row is the headline. GA4 correctly labels 28 percent of fan-out sessions. Server-side capture correctly labels 88 percent. The 60-percentage-point gap is the channel's measurement debt. It is the reason the founder I mentioned at the top of this article was watching his Direct row grow without understanding why, and why his actual ChatGPT revenue contribution was roughly 5x larger than his dashboard said it was.
What to instrument on the server side
This is the practical section. The capture stack has three layers, and you need all three to get to the 88 percent recovery rate above.
Layer 1: capture the Referer header before client-side processing
The single biggest gain is putting a small request-handling step at your edge (Vercel middleware, Cloudflare Worker, or a Next.js middleware route) that reads the Referer header on every first-visit request, stores it in a first-party cookie, and writes it to your analytics log before the page renders. This is the layer that catches the 28 percent visible fraction reliably.
| Capture target | Pattern | What it labels |
| --- | --- | --- |
| chatgpt.com | substring match | ChatGPT |
| chat.openai.com | substring match | ChatGPT (legacy) |
| oai.com | substring match | ChatGPT (short link) |
| perplexity.ai | substring match | Perplexity |
| copilot.microsoft.com | substring match | Microsoft Copilot |
| gemini.google.com | substring match | Gemini |
| aistudio.google.com | substring match | Gemini Studio |
| claude.ai | substring match | Claude |
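The capture table can be sketched as a small lookup. One deliberate difference from the table: this version matches on the parsed hostname (exact or subdomain) rather than a raw substring, which is a stricter swap-in that avoids labeling an unrelated host that merely contains one of the strings. The function name and the use of `None` for "not an AI referer" are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hosts and labels copied from the capture table above.
AI_REFERER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT (legacy)",
    "oai.com": "ChatGPT (short link)",
    "perplexity.ai": "Perplexity",
    "copilot.microsoft.com": "Microsoft Copilot",
    "gemini.google.com": "Gemini",
    "aistudio.google.com": "Gemini Studio",
    "claude.ai": "Claude",
}


def classify_referer(referer):
    """Return the engine label for a Referer header value, or None."""
    if not referer:
        return None
    host = (urlsplit(referer).hostname or "").lower()
    for domain, label in AI_REFERER_HOSTS.items():
        # Exact host or any subdomain of it (e.g. www.perplexity.ai).
        if host == domain or host.endswith("." + domain):
            return label
    return None


print(classify_referer("https://chatgpt.com/c/abc"))          # ChatGPT
print(classify_referer("https://www.perplexity.ai/search"))   # Perplexity
print(classify_referer(""))                                   # None
```

In an edge or middleware setting the same lookup would run on the first request of a visit, with the result written to a first-party cookie or server-side store before any client-side analytics loads.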
Layer 2: recognise the no-referer-deep-page pattern as a fan-out signature
The 58 percent of fan-out sessions that arrive with no referer on a deep page are the largest single bucket. Server-side, you can label these probabilistically. The cohort heuristic that gets us to roughly 88 percent recovery is below.
| Signal | Weight | Interpretation |
| --- | --- | --- |
| No referer | +1 | Could be fan-out, type-in, or app handoff |
| Deep page (not root) | +2 | Type-ins rarely land on deep URLs |
| Recent OAI-SearchBot fetch on same URL (under 30 minutes) | +3 | Strong fan-out signal |
| Modern Chrome/Edge UA, no in-app browser hint | +1 | Excludes bot scrapes |
| Geo and time-of-day cluster matches engaged-user pattern | +1 | Excludes accidental hits |
| Score 5 or above | label | Probable fan-out exit |
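The scoring table translates into a few lines of code. The weights and the threshold of 5 come from the table; the `Visit` fields are hypothetical names, and how each boolean is derived (UA parsing, geo clustering) is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    has_referer: bool
    path: str
    recent_bot_fetch: bool    # OAI-SearchBot fetched this URL under 30 minutes ago
    modern_browser_ua: bool   # Chrome/Edge UA with no in-app browser hint
    engaged_geo_time: bool    # geo and time-of-day match the engaged-user cluster


def fanout_score(visit):
    """Sum the signal weights from the heuristic table."""
    score = 0
    if not visit.has_referer:
        score += 1
    if visit.path not in ("", "/"):   # deep page, not root
        score += 2
    if visit.recent_bot_fetch:
        score += 3
    if visit.modern_browser_ua:
        score += 1
    if visit.engaged_geo_time:
        score += 1
    return score


def is_probable_fanout(visit, threshold=5):
    return fanout_score(visit) >= threshold


deep_with_fetch = Visit(False, "/blog/crm-guide", True, False, False)
print(fanout_score(deep_with_fetch), is_probable_fanout(deep_with_fetch))  # 6 True
```

With these weights a no-referer deep-page visit reaches the threshold either through the bot-fetch signal alone or through both of the weaker browser and geo signals together.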
The OAI-SearchBot pair-join is the highest-signal piece. When OpenAI's documented bot fetches a specific URL on your site, and a real human arrives at that same URL minutes later with no referer, the probability that the human click came from a ChatGPT fan-out is very high. We log both legs and join them with a sliding window. This is the layer that recovers the bulk of the dark fraction.
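A minimal sketch of the pair-join, assuming bot fetches and human visits are both available as (URL, Unix timestamp) records. The 30-minute window matches the heuristic table; the function names and the in-memory index are illustrative, and a production version would read from whatever log store you already use.

```python
from bisect import bisect_left
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # 30-minute sliding window from the heuristic table


def index_bot_fetches(fetches):
    """Group bot fetches by URL and sort each URL's timestamps.

    fetches: iterable of (url, unix_timestamp) pairs.
    """
    by_url = defaultdict(list)
    for url, ts in fetches:
        by_url[url].append(ts)
    for stamps in by_url.values():
        stamps.sort()
    return by_url


def has_recent_fetch(by_url, url, visit_ts, window=WINDOW_SECONDS):
    """True if a bot fetch of this URL falls in [visit_ts - window, visit_ts]."""
    stamps = by_url.get(url, [])
    # First fetch at or after the start of the window, if any.
    i = bisect_left(stamps, visit_ts - window)
    return i < len(stamps) and stamps[i] <= visit_ts


by_url = index_bot_fetches([("/a", 1000), ("/a", 5000), ("/b", 100)])
print(has_recent_fetch(by_url, "/a", 1600))  # True: fetch at 1000 is 10 minutes earlier
print(has_recent_fetch(by_url, "/a", 3000))  # False: next fetch is after the visit
```

Sorting once and using binary search keeps each lookup cheap, which matters if the check runs on every no-referer request.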
Layer 3: persist a first-touch identifier through the Stripe webhook
The third layer is the revenue join. Once you have labeled the first-visit source server-side, you need that label to survive every subsequent session for the same visitor until the Stripe payment fires. This is where GA4's last-non-direct model breaks: a fan-out first touch that was logged as Direct never receives credit, and any later non-direct touch, such as a branded search, takes its place, which means the fan-out origin is gone within days.
The cookie-based fix is straightforward in principle (write a first-touch source on visit 1, never overwrite, send it to Stripe metadata on checkout) and operationally fragile in practice because of cookie expiry, cross-device drift, and consent constraints. The server-side fix is more durable: persist the first-touch identifier on your backend, keyed by your own user identifier, and join it to the Stripe customer_id when the webhook fires.
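The write-once rule and the join can be sketched as below. Plain dictionaries stand in for a real database, no Stripe API is called, and the class, method names, and example identifiers are all hypothetical; the only behaviour taken from the text is "never overwrite the first touch" and "look it up by customer_id when the payment arrives".

```python
class FirstTouchStore:
    """In-memory stand-in for a backend table of first-touch sources."""

    def __init__(self):
        self._first_touch = {}      # user_id -> first-touch source label
        self._customer_to_user = {}  # Stripe customer_id -> user_id

    def record_visit(self, user_id, source):
        # setdefault only writes when the user has no first touch yet,
        # so later Direct visits cannot overwrite the original label.
        return self._first_touch.setdefault(user_id, source)

    def link_customer(self, user_id, customer_id):
        # Called at checkout, when the Stripe customer is created.
        self._customer_to_user[customer_id] = user_id

    def source_for_payment(self, customer_id):
        # Called from the webhook handler with the customer_id on the event.
        user_id = self._customer_to_user.get(customer_id)
        return self._first_touch.get(user_id, "unknown")


store = FirstTouchStore()
store.record_visit("user_1", "chatgpt_fanout")
store.record_visit("user_1", "direct")           # ignored, first touch is kept
store.link_customer("user_1", "cus_example")     # hypothetical customer id
print(store.source_for_payment("cus_example"))   # chatgpt_fanout
```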
For the deeper version of this join across all marketing channels, the revenue attribution feature page documents the Stripe webhook flow, and the share of voice AI search feature page covers how per-prompt visibility metrics get reconciled against the revenue join.
First, the "best" modifier matters more than any other content lever. Roughly a quarter of advice-style fan-outs inject the word "best". A page that does not earn the "best" variation of its target query is forfeiting a quarter of its possible citations. This is why listicles dominate AI answers. It is also why a comparison post titled "Best CRM for Solo Founders" tends to outrank a feature post titled "Why Our CRM is the Best Choice" even when the feature post is more detailed.
Second, the named-entity injection demands you mention your competitors. The engine fans into "Pipedrive vs HubSpot solo founder" even when the user did not type either brand. A page that names Pipedrive, HubSpot, Folk, Attio, and Capsule, and answers the comparison sub-queries with on-page passages, wins multiple fan-outs in parallel. A page that names only your own product loses the comparison fan-outs entirely.
Third, freshness modifiers reward recent dates on the page. The current-year modifier appears in about 5 percent of fan-outs, and the engines weight that match. A piece dated 2024 will lose freshness fan-outs to a piece dated 2026 on otherwise identical content. This is why we update the publishedAt and updatedAt fields on cohort sites every quarter on the pages we want to keep winning fan-outs.
The deeper implication is that fan-out optimisation is structurally aligned with breadth, not depth. A page that goes wide across multiple injected sub-queries wins more fan-outs than a page that goes deep on the literal phrase the user typed. This inverts a lot of classic content advice about ranking for one keyword cleanly.
Comparing engines on fan-out behaviour
The fan-out shape differs engine by engine, which means the content that survives ChatGPT's fan-out may not survive Grok's or Gemini AI Mode's. The table below summarises the per-engine pattern.
| Engine | Median fan-out per prompt | Most-injected modifier | Source preference | Citation density per answer |
| --- | --- | --- | --- | --- |
| ChatGPT (search tool) | 2 to 4, commercial 4 to 6 | best, top, reviews, comparison | Bing-indexed pages, Reddit | 3 to 5 |
| Perplexity | 1.4 average | minimal expansion | broad web, Reddit, X | 3 to 7 |
| Grok | 6.8 average | year, brand, trusted forums | Reddit, Wirecutter, Consumer Reports | 5 to 10 |
| Google AI Mode | 3 to 5 commercial | comparison, reviews | top-10 organic pages | 4 to 7 |
| Claude with web | 2 to 3 | minimal expansion | authoritative editorial sources | 3 to 5 |
| Gemini browse | 2 to 4 | year, location | Google-indexed pages | 3 to 6 |
The implication is that a page that wins ChatGPT's fan-out (broad best modifier coverage) may not win Grok's (deep brand-and-forum modifier coverage). Multi-engine fan-out optimisation is a real, growing surface, and most teams are not yet thinking about it as a per-engine portfolio question. For the cross-engine view, the share of voice in AI search feature page documents how we track per-engine visibility across the cohort.
The operator playbook for fan-out attribution
Putting the article together: here is the sequence I run for any cohort site that wants to take fan-out seriously as an attribution channel.
The first step is the GA4 channel grouping work. Add custom regex rules for chatgpt.com, chat.openai.com, oai.com, perplexity.ai, copilot.microsoft.com, gemini.google.com, aistudio.google.com, and claude.ai, and route them to clean named channels. The "how to track ChatGPT traffic in Google Analytics" walk-through covers this step, which recovers the visible 28 percent fraction cleanly. It is necessary but not sufficient.
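Before pasting a pattern into a channel-grouping rule, it helps to test it offline against sample source values. The pattern below is one possible way to cover the eight hosts listed above; it is written for Python's `re` module, and whether it carries over unchanged depends on the regex flavour and on whether the rule you pick does full or partial matching, so treat the anchors as an assumption.

```python
import re

# One alternation over the hosts named in the step above. The leading
# (^|\.) allows subdomains such as www.perplexity.ai but rejects hosts
# that merely end with the same letters.
AI_SOURCE_PATTERN = re.compile(
    r"(^|\.)(chatgpt\.com|chat\.openai\.com|oai\.com|perplexity\.ai|"
    r"copilot\.microsoft\.com|gemini\.google\.com|aistudio\.google\.com|"
    r"claude\.ai)$"
)


def is_ai_source(source):
    """True if a session source value matches one of the AI engine hosts."""
    return AI_SOURCE_PATTERN.search(source.lower()) is not None


for sample in ["chatgpt.com", "www.perplexity.ai", "google.com", "notchatgpt.com"]:
    print(sample, is_ai_source(sample))
# chatgpt.com True
# www.perplexity.ai True
# google.com False
# notchatgpt.com False
```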
The second step is the server-side capture stack. Add a request-handling layer (middleware, edge function, or a tool that does this for you) that reads the Referer header on first visit, stores it server-side, and labels the session. Add the OAI-SearchBot pair-join logic for the no-referer-deep-page bucket. This is where you go from 28 percent recovery to roughly 88 percent recovery.
The third step is the first-touch persistence. Write the labeled source to your backend on visit 1, keyed by your user identifier, and never overwrite it. Send it to Stripe metadata on checkout, or join it to the Stripe customer_id when the webhook fires. This is the revenue-join layer, and it is the one that finally lets you answer "what fraction of our MRR is from ChatGPT fan-out citations."
The fourth step is the multi-engine visibility tracking. Pair the attribution stack with a GEO visibility tool (Peec, Profound, Otterly, or our own AI visibility score) so you can see, per prompt, which sub-queries surfaced your domain. That pairing is the only way to correlate engine-side citation share with site-side revenue, which is the only complete picture of the fan-out channel.
The fifth step is the content side. Write pages that win across multiple injected modifiers, name your competitors by brand, keep dates fresh, and structure passages so they extract cleanly. The AI citations versus backlinks piece walks the structural levers in depth.
The whole sequence is roughly two weeks of engineering work plus a recurring content cadence. The payoff is moving from a 28 percent view of your fan-out channel to an 88 percent view, with a defensible revenue join on top.
Three case studies from the cohort
To make the abstract claims concrete, here are three cohort sites where I have permission to share the shape (not the names) of what fan-out did to their attribution layer through Q1 and Q2 2026.
The first is a developer-tools company selling a CI/CD observability product at $79 per seat per month. Their GA4 dashboard read 0.6 percent of monthly sessions as ChatGPT through April 2026 and they had largely written off the channel. After we put the server-side stack in front of their analytics, the same month read 2.9 percent of sessions as ChatGPT, with 71 percent of those classified as fan-out exits (no referer, deep page, OAI-SearchBot fetch within the prior 30 minutes). The Stripe revenue join surfaced that fan-out exits had produced $14,400 of new MRR in Q1 2026 versus the $2,200 GA4 had attributed to chatgpt.com referrals. The ratio was 6.5x, which is on the high end of the cohort distribution but not unique.
The second is a DTC supplements brand. Their pattern was the opposite shape: GA4 read 1.8 percent of sessions as ChatGPT, and the server-side stack read 3.1 percent, a 1.7x correction rather than a 5x correction. The reason was that their buyers tended to type branded queries (the product name) rather than commercial-investigation queries (best supplement for X). Branded queries fan out less aggressively, so the gap between GA4 and server-side was smaller. The lesson is that fan-out under-counting is intent-dependent, not site-dependent: a brand whose buyers ask commercial-investigation questions has a bigger gap than a brand whose buyers type the product name directly.
The third is a B2B SaaS in the data-infrastructure category. Their dashboard story was the most striking. GA4 said ChatGPT was their seventh-largest channel by sessions and twelfth-largest by attributed revenue. The server-side stack said it was their second-largest channel by sessions and fourth-largest by attributed revenue. The reclassification was driven entirely by the no-referer-deep-page bucket, which had been bleeding into a Direct row that the founder had assumed was returning users. Once the fan-out exits were pulled out of Direct, the true Direct row dropped by about a third and the Direct conversion rate per session went up because the remaining Direct visits were genuinely returning, branded users with much higher purchase intent. So the recovery did not just reclassify volume; it also revealed that the residual Direct channel was a different, cleaner cohort than the dashboard had suggested.
All three cases share a structural property worth naming. The fan-out channel does not feel like a single channel to the operator. It feels like a slow drip of unrelated Direct visits arriving on deep pages, which the operator instinctively reads as either bot traffic or returning-user noise. The act of pulling those visits out, labeling them, and running the revenue join is what turns the drip into a recognisable, sized, optimisable channel. Until that happens, fan-out exists as a measurement debt that grows quietly every quarter as the engines fan out more aggressively.
What I expect to change through 2026 and 2027
A few predictions I am willing to put my name behind, with the standard caveat that AI search is moving fast enough that any prediction past 12 months is a guess. First, fan-out sub-query counts will keep climbing on commercial prompts, but at a slower rate than 2025-to-2026. Most engines are settling into the 3 to 8 range on commercial prompts and I expect that to be the steady state through 2026. Second, fan-out sub-query word length will keep growing as retrievers get better at vector matching on longer queries. Peec.ai's data suggests the curve is slowing but not flattening. Third, more engines will adopt the OAI-SearchBot-style approach of fetching the citation page before the user clicks, which actually helps attribution operators because it provides a clean second-leg signal to pair with the human click. Fourth, the gap between the prompt a user types and the queries the retriever runs will widen, which makes single-keyword content optimisation increasingly mis-targeted relative to the fan-out shape.
The longer-term implication is that the attribution stack you build for fan-out becomes the attribution stack for AI search more broadly. The same referer-capture, pair-join, and first-touch-persistence layers that recover fan-out exits also recover Perplexity citations, Gemini AI Mode clicks, Claude with web search visits, and the next surface that has not been built yet. The fan-out problem is the canary in the coal mine for the broader measurement problem, and the fix is the same fix.
FAQ
What is a ChatGPT query fan-out?
A query fan-out is the set of internal searches the model expands a single user prompt into before it composes the answer. When someone types "best CRM for solo founders" into ChatGPT, the engine does not run that exact string against its search index. It rewrites and expands it into several sub-queries like "best CRM solo founders 2026", "top lightweight CRM small business", "Pipedrive vs HubSpot solo founder", "CRM reviews self-employed", and so on. Each sub-query becomes its own retrieval, and the result sets are merged before the model writes the answer. Across the roughly 200 sites in the Attrifast cohort, the median ChatGPT search-tool prompt now expands into 4 to 6 sub-queries, with the upper decile fanning out past 11.
Why does fan-out matter for attribution and revenue tracking?
Because each sub-query can produce its own retrieval, its own document fetch, and in some cases its own click to your site, but GA4 sees those clicks as unrelated events with no referer. One user prompt can generate 2 or 3 separate sessions on your domain across a few hours as the engine cites different pages on different sub-queries. The standard attribution layer treats those sessions as independent Direct visits because the referer header is stripped by the ChatGPT client, no UTM is appended, and the sessions arrive on deep pages. In our cohort the median site under-counts fan-out-sourced traffic by roughly 70 to 80 percent under default GA4 settings.
How many sub-queries does ChatGPT actually run per prompt?
It depends on intent type and on engine version. Peec.ai measured an average of 2.1 fan-outs per ChatGPT prompt across a broad sample, with Perplexity at 1.4 and Grok at 6.8. Our Attrifast cohort sees a higher median because we sample heavily on commercial-investigation prompts where the engine fans out more aggressively. On commercial queries like best X for Y or alternatives to Z, our median is 4 to 6 sub-queries, with a long tail past 11. Peec also documented that the average word count per fan-out roughly doubled between October 2025 and January 2026, climbing from about 6 words to 12.
What does ChatGPT actually search for when it fans out?
It injects modifiers the user did not type. Peec.ai's pattern analysis shows the most-injected tokens are "best", "top", "reviews", "comparison", "tools", "software", and "features", with "best" alone appearing in roughly 24 percent of advice-style fan-outs. The engine also adds the current year as a freshness modifier in around 5 percent of queries, and it routinely appends review-site and forum names like Reddit, G2, Wirecutter, Glassdoor, and Consumer Reports. The mechanical result is that ChatGPT often searches for terms a user would never type, which means your page can rank perfectly for the prompt the user typed and be invisible to the fan-out that actually retrieved the citation.
Do fan-out sub-query visits show up in GA4?
Some of them. Across our cohort, roughly 28 percent of fan-out-sourced sessions arrive with a recognisable referer (chatgpt.com, chat.openai.com, oai.com, or a utm_source value the user-side app added). The remaining 72 percent arrive as Direct or as no-referer landings on a deep URL. The split is worse on mobile, where the in-app browser strips referers more aggressively, and worse still on the desktop ChatGPT app, which routes the click through an OS handoff that drops the header entirely. GA4 cannot distinguish a fan-out-sourced Direct visit from a true Direct type-in without server-side enrichment.
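As a rough sizing aid, the visible share above can be turned into a back-of-envelope estimate. The sketch below is illustrative only: the function names are hypothetical, and the 0.28 default is the cohort figure quoted above, not a constant for any individual site. The two case studies earlier imply visible shares of roughly 21 percent (0.6 versus 2.9) and 58 percent (1.8 versus 3.1), so measure your own before trusting the output.

```python
def visible_share(ga4_pct: float, server_side_pct: float) -> float:
    """Share of ChatGPT-sourced sessions that GA4 actually labels as ChatGPT."""
    if server_side_pct <= 0:
        raise ValueError("server_side_pct must be positive")
    return ga4_pct / server_side_pct


def estimate_total_sessions(visible_sessions: float, share: float = 0.28) -> float:
    """Scale GA4-visible sessions up to an estimated total.

    `share` defaults to the cohort-level 28 percent, which is an assumption
    for any single site.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return visible_sessions / share


# Developer-tools case: GA4 read 0.6 percent, the server-side stack read 2.9 percent.
dev_tools_share = visible_share(0.6, 2.9)
# 280 GA4-visible sessions at the cohort share.
estimated_total = estimate_total_sessions(280)
```

The estimate is only as good as the share you feed it, which is why the server-side capture matters more than the multiplier.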
How does fan-out break UTM strategy?
UTM tagging assumes one source per visit. A fan-out prompt produces several searches that can each cite a different page on your site, and the user can click multiple citations in the same answer. One prompt can therefore produce two or three sessions on different landing paths when the user clicks more than one citation, and in most cases none of those clicks carries a utm_source value, because ChatGPT does not reliably append UTM parameters to citation URLs. The practical implication is that you cannot lean on UTM to disambiguate fan-out traffic. You need a server-side referer-capture layer that recognises the chatgpt.com referer (when present) and the no-referer-deep-page entry pattern.
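A minimal sketch of that referer-capture layer, assuming you can see the landing URL and the Referer header of the first request in a session. The function name and the label strings are hypothetical, and the host list simply reuses the referers named earlier in this FAQ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Referer hosts this article lists as recognisable ChatGPT sources.
CHATGPT_HOSTS = {"chatgpt.com", "chat.openai.com", "oai.com"}


def classify_entry(landing_url: str, referer: Optional[str]) -> str:
    """Label the first request of a session (labels are illustrative)."""
    parsed = urlparse(landing_url)
    utm_source = parse_qs(parsed.query).get("utm_source", [""])[0].lower()
    ref_host = (urlparse(referer).hostname or "").lower() if referer else ""
    if ref_host.startswith("www."):
        ref_host = ref_host[4:]

    if ref_host in CHATGPT_HOSTS or utm_source in CHATGPT_HOSTS:
        return "chatgpt_referral"
    if ref_host:
        return "other_referral"
    if utm_source:
        return "other_campaign"
    # No referer and no UTM: a deep page suggests a probable fan-out exit,
    # while the homepage looks more like a genuine type-in.
    is_deep_page = parsed.path.strip("/") != ""
    return "probable_fanout_exit" if is_deep_page else "direct"
```

The "probable" in the label is deliberate: a no-referer deep-page landing is a heuristic signal, not proof, and it gets stronger when paired with a prior bot fetch of the same URL.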
What is a fan-out exit and how do I detect one server-side?
A fan-out exit is the click event when a user follows a citation from inside the ChatGPT answer to a third-party page. Server-side, it usually has three properties: no Referer header (or chatgpt.com when present), a landing URL that is not your homepage (citations point to deep pages), and a User-Agent that matches a real human browser rather than a crawler. The OAI-SearchBot User-Agent fetches the citation page before the user clicks, so the first request is a bot fetch and the second request, seconds or minutes later, is the human follow-through. By logging both legs and joining them on URL and IP-window heuristics, you can attribute the human session back to a fan-out source even without a referer.
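The pair-join can be sketched as below, under stated assumptions: the `Hit` record and function names are hypothetical, the 30-minute window is borrowed from the developer-tools case study, and the crawler check is a crude substring test. This version joins on URL path and time window only; any IP-based heuristic is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

PAIR_WINDOW = timedelta(minutes=30)  # assumption, taken from the case study


@dataclass
class Hit:
    ts: datetime
    path: str
    user_agent: str
    referer: str = ""


def is_crawler(user_agent: str) -> bool:
    """Crude crawler check; a production filter would be stricter."""
    ua = user_agent.lower()
    return any(token in ua for token in ("bot", "crawler", "spider"))


def find_fanout_exits(hits: List[Hit]) -> List[Hit]:
    """Return human hits that follow an OAI-SearchBot fetch of the same path."""
    last_bot_fetch: Dict[str, datetime] = {}
    exits: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.ts):
        if "OAI-SearchBot" in hit.user_agent:
            last_bot_fetch[hit.path] = hit.ts  # first leg: the bot fetch
            continue
        if is_crawler(hit.user_agent):
            continue  # some other crawler, not a human follow-through
        if hit.referer or hit.path.strip("/") == "":
            continue  # referer present (classify it normally) or homepage
        bot_ts = last_bot_fetch.get(hit.path)
        if bot_ts is not None and hit.ts - bot_ts <= PAIR_WINDOW:
            exits.append(hit)  # second leg: the human click
    return exits
```

Hits that already carry a chatgpt.com referer are skipped here on purpose, since they can be labeled directly without the join.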
Why do fan-outs produce more sessions than the user might expect?
Because a single answer often surfaces 3 to 7 citations, and engaged users click more than one. Our cohort logs an average of 1.4 follow-through clicks per ChatGPT answer where the engine fanned out into 4 or more sub-queries. If two of those clicks land on different domains, each site records one session, even though both originated from one prompt. If two of them land on different deep pages of the same domain, you see two sessions that should be joined as one fan-out cohort visit. That join is not something GA4 will do for you.
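A minimal sketch of that same-domain join, assuming each session already carries a first-party visitor identifier. The function name and the three-hour window are hypothetical; the window is a stand-in for "a few hours" and should be tuned against your own logs.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

JOIN_WINDOW = timedelta(hours=3)  # assumption, not a measured value

Session = Tuple[str, datetime, str]  # (visitor_id, start_ts, landing_path)


def group_cohort_visits(sessions: List[Session]) -> List[Tuple[str, List[str]]]:
    """Collapse one visitor's sessions inside JOIN_WINDOW into a single visit."""
    by_visitor: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for visitor_id, ts, path in sessions:
        by_visitor[visitor_id].append((ts, path))

    visits: List[Tuple[str, List[str]]] = []
    for visitor_id, rows in by_visitor.items():
        rows.sort()
        current = [rows[0]]
        for row in rows[1:]:
            # The window is anchored on the first session of the current group.
            if row[0] - current[0][0] <= JOIN_WINDOW:
                current.append(row)
            else:
                visits.append((visitor_id, [p for _, p in current]))
                current = [row]
        visits.append((visitor_id, [p for _, p in current]))
    return visits
```

Each returned tuple is one cohort visit with its ordered landing paths, which is the unit you would then carry into the revenue join.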
Do fan-out sessions convert?
Yes, and at materially higher rates than Google organic on research-heavy verticals. Across the cohort's B2B SaaS sites in May 2026, fan-out-sourced sessions converted to a Stripe payment at 2.7 percent, versus 1.4 percent for Google organic and 2.5 percent for direct-prompt ChatGPT referrals where the user typed the brand name. The premium reflects intent: fan-out citations surface during commercial-investigation prompts where the user is already comparing options, which is later in the buyer journey than a top-of-funnel Google query. On ecommerce the pattern is more muted because impulse purchases do not match the fan-out intent profile.
How is fan-out different between ChatGPT, Perplexity, Claude, and Gemini?
All of these engines fan out, but the shape differs. Perplexity runs the smallest fan-out (around 1.4 sub-queries per prompt per Peec.ai data) because it leans on a tightly tuned search step. ChatGPT sits at 2 to 4 sub-queries per prompt as a baseline, with commercial prompts pushing past 6. Grok, included here as the outlier, runs the largest fan-out at around 6.8 sub-queries, narrowing aggressively by year, brand, and trusted sources like Reddit and Wirecutter. Claude with web search and Gemini in browse mode sit in the middle. The practical consequence: a page that survives ChatGPT's fan-out shape may be invisible to Grok's, because the injected modifiers differ engine by engine.
What content shape survives multiple fan-out queries?
Pages that answer the prompt at three altitudes: the literal phrase the user typed, the most likely injected modifier ("best", "top", "reviews", "comparison"), and the named alternatives the user did not type. A "best CRM for solo founders" article that only ranks for the literal phrase wins one fan-out and loses the rest. The same article that names Pipedrive, HubSpot, Folk, Attio, and Capsule, answers reviews and comparison sub-queries with on-page passages, and carries a recent-year date, wins 4 or 5 of the fan-outs the engine runs in parallel. This is the operator-level reason listicles and comparison posts dominate AI answers.
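One crude way to sanity-check a draft against a set of likely sub-queries is a token-coverage test. The sketch below is an illustration, not how any engine's retriever scores pages: the function names and stopword list are hypothetical, and real retrieval relies on ranking and semantic matching rather than literal token overlap.

```python
import re
from typing import List, Set

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "vs"}


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens with a few stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def covered_subqueries(page_text: str, subqueries: List[str],
                       min_share: float = 1.0) -> List[str]:
    """Return the sub-queries whose tokens appear on the page.

    `min_share` is the fraction of a sub-query's tokens that must be present.
    """
    page = tokens(page_text)
    covered = []
    for query in subqueries:
        query_tokens = tokens(query)
        if not query_tokens:
            continue
        share = len(query_tokens & page) / len(query_tokens)
        if share >= min_share:
            covered.append(query)
    return covered
```

Run against the example sub-queries from the first FAQ answer, this gives a quick list of which fan-out variants a page does not even mention, which is usually where the rewrite should start.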
How do I attribute a single Stripe payment back to a fan-out origin?
Three steps. One, capture the first-visit referer server-side and label it before any client-side processing strips it. Two, recognise the no-referer-deep-page entry pattern as a probable AI fan-out exit and bucket it accordingly. Three, persist a first-touch identifier through every subsequent session for the same visitor and join that identifier to the Stripe customer_id at webhook time. The third step is the one default analytics tools do not do, because GA4 attributes on a last-non-direct basis within a 90-day window that frequently drops the fan-out origin altogether.
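Steps one and three can be sketched as follows. This is a simplified illustration, not Attrifast's implementation: `FirstTouchStore` is a hypothetical in-memory stand-in for a database, and it assumes your checkout flow passes the visitor identifier as `client_reference_id` when it creates the Stripe Checkout Session. Webhook signature verification is omitted and should not be skipped in production.

```python
from typing import Dict, Optional


class FirstTouchStore:
    """In-memory stand-in for a first-touch table (hypothetical)."""

    def __init__(self) -> None:
        self.first_touch: Dict[str, str] = {}      # visitor_id -> origin label
        self.customer_origin: Dict[str, str] = {}  # Stripe customer id -> label

    def record_visit(self, visitor_id: str, label: str) -> None:
        # setdefault keeps the first label and ignores later sessions.
        self.first_touch.setdefault(visitor_id, label)

    def on_checkout_completed(self, event: dict) -> Optional[str]:
        """Join a checkout.session.completed event to the stored first touch."""
        if event.get("type") != "checkout.session.completed":
            return None
        session = event["data"]["object"]
        # Assumption: the visitor id was sent as client_reference_id.
        visitor_id = session.get("client_reference_id")
        customer_id = session.get("customer")
        origin = self.first_touch.get(visitor_id, "unknown")
        if customer_id:
            self.customer_origin[customer_id] = origin
        return origin
```

The design choice that matters is write-once storage: a later Direct or branded-search session must never overwrite the origin recorded on the first visit.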
Will fan-out length keep growing through 2026 and 2027?
On the trajectory Peec.ai measured (word count per fan-out doubling between October 2025 and January 2026), and with engines moving toward longer-context retrieval, the safest prediction is that fan-out depth keeps rising but at a slower rate. We expect fan-out sub-query counts to stabilise in the 3 to 8 range on commercial prompts through 2026, with sub-query word length continuing to climb as the retrievers improve at vector matching on longer queries. The implication for content teams is that the gap between the prompt a user types and the queries the engine actually runs will keep widening.
Does Attrifast measure fan-out attribution specifically?
Attrifast captures the server-side referer on the first visit, persists a first-touch identifier across sessions, and joins that identifier to the Stripe payment by webhook. That means a fan-out-sourced session that lands on a deep URL with no referer can still be tied back to its eventual paying customer, and the AI-engine origin label is preserved through a sales cycle that GA4 would otherwise reset. Attrifast is not a fan-out visibility tracker (for that, pair it with Peec, Profound, or Otterly), but on the revenue side, the first-party Stripe join is the only durable way to answer the question fan-out raises: did the engine's internal search of my brand actually produce a paying customer?