A field guide to identifying AI traffic sources in 2026. Per-engine referrer, user-agent, and UTM behavior for ChatGPT, Claude, Perplexity, Gemini, AI Overviews, Copilot, and Meta AI, plus the server-side detection that catches what GA4 misses.
A founder messaged me three weeks ago with one question: "Which of these AI engines is actually sending me money?" His dashboard showed a single row labeled "AI Assistants" with about 4,200 sessions for the month. I asked him to break it by engine. He couldn't. His custom GA4 channel grouped seven AI hostnames into one bucket, and the unreferred majority, about 65% of his actual AI traffic, was still hiding in Direct. He had solved the wrong half of the problem.
The question "how do I track AI traffic sources" sounds singular but it is actually seven smaller questions stacked, one per engine, each with its own referer behavior, User-Agent footprint, UTM situation, and gap GA4 does not fill. Operators who get this right treat it as seven engineering problems.
| Metric | Value | Source |
| --- | --- | --- |
| AI engines covered | 7 (ChatGPT, Claude, Perplexity, Gemini, AI Overviews, Copilot, Meta AI/Grok) | Author classification |
| Average referer pass-through across all surfaces | ~32% | Attrifast aggregate, n=200 sites |
| GA4 default channels for AI engines | 0 | Google Analytics docs [1] |
| Documented AI crawler user-agents | 12+ across major engines | OpenAI [2], Anthropic [3], Perplexity [4] |
| AI crawler share of total bot traffic, late 2025 | ~5-7% and rising | Cloudflare Radar [5] |
| Largest single-surface drop in pass-through (ChatGPT web vs iOS) | 28% to 8% | Attrifast logs, May 2026 |
| Median time to ship per-engine detection (server-side) | 4-8 hrs build + 1-2 hrs/mo upkeep | Author estimate |
| Engines that publish a clean UTM convention | 1 (Perplexity Pro, partial) | Perplexity docs [6] |
| Behavioral classifier confidence on unreferred AI traffic | 75-85% precision in B2B SaaS | Attrifast benchmark [7] |
| GA4 alternative tools that ship AI-engine detection by default | Revenue attribution category only | Author measurement |
The 32% aggregate pass-through is the number that gets misquoted most often. People hear it and assume their analytics is missing about a third of AI traffic. In fact 32% is the share that survives, so the average site is missing roughly two-thirds, and that missing share is unevenly distributed: on a ChatGPT-heavy site it is 70%+, on a Perplexity-heavy site it can be under 40%, and on a site with measurable AI Overviews exposure it is functionally everything. The engine mix changes the math more than any other variable.
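To make the engine-mix arithmetic concrete, here is a minimal sketch that blends per-engine pass-through rates by traffic mix. The two rates are the desktop-web figures quoted later in this guide; the 80/20 mixes and the function name `missing_share` are illustrative assumptions, not measurements. Because mobile and app surfaces pass far less, real missing shares will run higher than these web-only numbers.

```python
# Desktop-web pass-through rates quoted in this guide (share of clicks with a Referer).
WEB_PASS_THROUGH = {"chatgpt": 0.28, "perplexity": 0.62}


def missing_share(mix: dict[str, float], pass_through: dict[str, float]) -> float:
    """Fraction of AI visits that arrive without a usable Referer.

    mix maps engine -> share of AI visits (any positive weights; they are normalized).
    """
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("mix must contain at least one positive share")
    visible = sum(share * pass_through[engine] for engine, share in mix.items())
    return 1.0 - visible / total


# Hypothetical mixes: one ChatGPT-heavy site, one Perplexity-heavy site.
chatgpt_heavy = missing_share({"chatgpt": 0.8, "perplexity": 0.2}, WEB_PASS_THROUGH)
perplexity_heavy = missing_share({"chatgpt": 0.2, "perplexity": 0.8}, WEB_PASS_THROUGH)
print(f"ChatGPT-heavy: {chatgpt_heavy:.1%} missing")
print(f"Perplexity-heavy: {perplexity_heavy:.1%} missing")
```

The same two engines produce very different gaps depending only on the weights, which is the point of the paragraph above.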
What an AI traffic source actually is
An AI traffic source is a human visit whose proximate cause is an AI assistant surface: the user read an answer in ChatGPT, Claude, Perplexity, Gemini, AI Overviews, Copilot, Meta AI, Grok, or a smaller engine, then clicked through. It is not a crawler and not a publisher redirect, but a person who arrived because an AI assistant cited or summarized your content.
Identifying that visit requires reconstructing three pieces of information the modern web routinely strips:
Origin: which AI engine produced the click.
Surface: which client (web browser, iOS app, Android app, desktop app, browser sidebar) the user was in.
Intent: whether the click matches typical AI patterns (deep-page entry on a buying-intent query, no prior session, multiple clicks in a short window) or is unreferred direct traffic.
You read all three off the same HTTP request, but you read them from different headers and use them in different ways. The Referer tells you origin and partial surface. The User-Agent tells you surface and bot-vs-human. The Sec-Fetch-Site and Sec-Fetch-Dest headers, part of the Fetch Metadata family the W3C specified and browsers implement [8], tell you whether the navigation was cross-site, which separates AI clicks from internal navigation cleanly. Your own UTM, when you control the link, tells you intent precisely; behavioral inference fills the gap when you don't.
| Signal | What it tells you | Limits |
| --- | --- | --- |
| Referer header | Originating AI engine when present | Stripped on ~68% of clicks on average |
| User-Agent | Crawler vs human; rare in-app webview ID | Useless for engine ID on regular browsers |
| Sec-Fetch-Site | Cross-site vs same-site navigation | Coarse; separates AI from internal, not engine from engine |
| First-party UTM | Engine ID when you control the link | Only covers links you placed |
| Behavioral pattern | Probable AI origin on unreferred deep-page entries | Probabilistic, not deterministic |
| Sec-Fetch-Dest | Document, image, script, etc. | Useful for filtering, not identifying |
The thing to internalize is that no single one of those signals catches every AI visit. The Referer catches what the engine and the browser cooperatively pass through, the User-Agent catches the bots and a few odd in-app webviews, the UTM catches what you tagged, and the behavioral classifier catches the unreferred majority probabilistically. A real detection pipeline stacks all of them.
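As a rough illustration of reading every signal off the same request, the sketch below pulls the Referer host, User-Agent, Fetch Metadata headers, and `utm_source` into one record. The function name `extract_signals` and the plain-dict input are assumptions for the example; in practice you would adapt it to whatever request object your server or log parser exposes.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_signals(headers: dict[str, str], url: str) -> dict[str, Optional[str]]:
    """Collect the raw detection signals from one inbound request."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    referer = h.get("referer", "")
    query = parse_qs(urlsplit(url).query)
    return {
        "referer_host": urlsplit(referer).hostname if referer else None,
        "user_agent": h.get("user-agent"),
        "sec_fetch_site": h.get("sec-fetch-site"),  # e.g. "cross-site", "same-origin", "none"
        "sec_fetch_dest": h.get("sec-fetch-dest"),  # e.g. "document", "image"
        "utm_source": query.get("utm_source", [None])[0],
    }


example = extract_signals(
    {
        "Referer": "https://chatgpt.com/",
        "User-Agent": "Mozilla/5.0",
        "Sec-Fetch-Site": "cross-site",
        "Sec-Fetch-Dest": "document",
    },
    "/pricing?utm_source=newsletter",
)
print(example)
```

Keeping the raw signals in one record before classifying makes it easier to stack rules later without re-parsing the request.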
Before we go engine by engine, it helps to lock in the eight mechanisms that strip the Referer on AI clicks, because every engine combines them differently and the same engine produces different results on different surfaces.
| Mechanism | What it does | Engines most affected |
| --- | --- | --- |
| Referrer-Policy: no-referrer | Source page explicitly blanks the Referer for every outbound link [9] | ChatGPT web on some pages |
| Referrer-Policy: origin | Sends only the hostname and strips the path, so you see chatgpt.com instead of the conversation URL [9] | ChatGPT web (default) |
| In-app webview (iOS SFSafariViewController) | Opens links in an in-app browser that does not propagate the host app's referer [10] | ChatGPT iOS, Claude iOS, Perplexity iOS |
| In-app webview (Android Chrome Custom Tabs) | Strips the referer unless the host app explicitly sets it, which most don't [11] | ChatGPT Android, Claude Android, Gemini Android |
| Desktop app via OS URL handler | Opens system default browser with no document context to source the Referer from | ChatGPT desktop, Claude desktop |
| rel="noopener noreferrer" on links | The HTML spec forces the browser to drop the Referer on these links [12] | Perplexity, Claude citations |
| Click-tracker rewrite | Outbound link gets rewritten through an intermediate redirector that strips Referer on the second hop | AI Overviews via Google, Bing Copilot via bing.com/ck |
| ITP cross-site downgrade | Safari trims the Referer to eTLD+1 on cross-site loads under Intelligent Tracking Prevention [13] | All engines on Safari |
The reason you cannot fix this in GA4 settings is that all eight mechanisms operate at or below the browser layer. By the time the request hits your server, the Referer has already been stripped or trimmed. GA4's gtag.js runs even later, inside the page, after the browser has applied every policy, so it sees a strict subset of the original signal. Detection has to happen server-side to recover what the client lost on the way.
ChatGPT: the largest channel, the worst pass-through
ChatGPT is the canonical case study for AI traffic source detection because it combines the largest volume with the worst per-surface pass-through, so every detection mistake you make on ChatGPT compounds the most. About 200 sites I read every Friday confirm a pattern I have been watching for eighteen months: the more ChatGPT-heavy a site's AI traffic mix, the more Direct/(none) eats its analytics. For a deeper volume picture, see how much traffic comes from ChatGPT.
Referer behavior by surface. ChatGPT web sends Referrer-Policy: origin on most pages, which means the Referer is https://chatgpt.com/ with no path: you can attribute the engine but not the conversation. On Chrome the pass-through is around 28%; on Safari, ITP downgrades cross-site Referer to eTLD+1, dropping it to about 19%. The iOS app uses SFSafariViewController, which produces an empty Referer on roughly 92% of clicks. The Android app uses Chrome Custom Tabs with the same problem. The macOS and Windows desktop apps open the system default browser via the OS URL handler; with no document context to source it from, the Referer almost never arrives.
User-Agent. Human ChatGPT traffic arrives with a normal browser UA: the user clicked a citation, their default browser opened, and the UA is whatever they normally use. There is no ChatGPT-User-Browser header or equivalent. The crawlers are a different story: GPTBot (training crawl), ChatGPT-User (live fetch), and OAI-SearchBot (search index) all identify themselves cleanly in the User-Agent per OpenAI's bot documentation [2].
UTM behavior. ChatGPT does not append a UTM to outbound links. The destination URL is your canonical URL unmodified. If you see ?utm_source=chatgpt arriving in your logs, either you placed it on a link in your own llms.txt or another publisher quoted ChatGPT's answer and tagged it manually; the engine itself does not add it.
| ChatGPT surface | Referer present | Referrer-Policy | UTM | Detectable as ChatGPT? |
| --- | --- | --- | --- | --- |
| Web (Chrome / Firefox) | ~28% | origin | None | Yes when referer survives |
| Web (Safari) | ~19% | origin + ITP downgrade [13] | None | Yes when referer survives |
| iOS app (SFSafariViewController) [10] | ~8% | n/a | None | Behavioral inference only |
| Android app (Custom Tabs) [11] | ~11% | n/a | None | Behavioral inference only |
| Desktop (macOS / Windows, OS handler) | ~6% | n/a | None | Behavioral inference only |
The ChatGPT-specific detection pattern I run is: match Referer hostname against chatgpt.com|chat.openai.com|oai.com, treat any non-empty Referer match as confirmed ChatGPT, then for unreferred deep-page entries on a query that targets a topic ChatGPT is citing, flag as probable ChatGPT. The track ChatGPT traffic reference page has the exact regex I use. The behavioral inference is the part that recovers the 70%+ of ChatGPT traffic the referer never sees.
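The two-tier pattern described above might be sketched as follows. This is not the exact regex from the reference page; it is an illustrative stand-in built from the three hostnames named in the paragraph. `cited_paths` is a hypothetical set of deep pages you have reason to believe ChatGPT is citing, and how you populate it is up to you.

```python
import re
from urllib.parse import urlsplit

# Matches the three hostnames named above, plus any subdomain of them.
CHATGPT_HOST = re.compile(r"(^|\.)(chatgpt\.com|chat\.openai\.com|oai\.com)$")


def classify_chatgpt(referer: str, path: str, cited_paths: set[str]) -> str:
    """Return 'confirmed', 'probable', or 'unknown' for one page view."""
    host = (urlsplit(referer).hostname or "") if referer else ""
    if host and CHATGPT_HOST.search(host):
        return "confirmed"  # any surviving Referer match counts as confirmed
    if not referer and path in cited_paths:
        return "probable"  # unreferred entry on a page believed to be cited
    return "unknown"


print(classify_chatgpt("https://chatgpt.com/", "/pricing", set()))
print(classify_chatgpt("", "/guides/attribution", {"/guides/attribution"}))
print(classify_chatgpt("https://example.org/", "/pricing", set()))
```

Anchoring the pattern on the parsed hostname rather than the whole Referer string avoids false matches on lookalike domains or on the engine name appearing in a path.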
Claude: the engine almost nobody is logging correctly
Claude is the engine most teams underestimate, partly because Anthropic publishes less marketing about its consumer chat surface than OpenAI does about ChatGPT, and partly because Claude's referer pass-through is mid-pack but its bot footprint is among the most distinctive. The track Claude traffic page has the full setup; here is the per-surface matrix.
Referer behavior by surface. Claude.ai web sends Referrer-Policy: strict-origin-when-cross-origin, the modern spec-compliant default [9]. That preserves the origin on HTTPS-to-HTTPS navigations, and in my logs https://claude.ai/chat/<uuid> typically arrives intact on desktop browsers, with a pass-through around 41%. Note that under this policy a cross-origin click carries only the origin, so a Referer that includes the full /chat/<uuid> path implies the link went out under a more permissive policy; treat the UUID as a bonus rather than a guarantee. When it does arrive, the chat UUID is useful: it lets you correlate multiple clicks from the same conversation. The iOS Claude app and Android Claude app both strip the referer through the standard in-app webview mechanisms, dropping to around 10%. The Claude desktop app, like ChatGPT desktop, opens the OS default browser and loses the referer.
User-Agent. Anthropic documents two crawler user-agents per their support article [3]: ClaudeBot (training crawl) identifies as Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com), and Claude-User (live fetch) fires when a Claude session needs to retrieve a URL on a user's behalf. Both are bots. Human Claude traffic arrives with a normal browser UA.
UTM behavior. Claude does not append UTMs to citation links. Anthropic's product surfaces render your canonical URL unmodified.
| Claude surface | Referer present | Referrer-Policy | UTM | Detectable as Claude? |
| --- | --- | --- | --- | --- |
| Claude.ai web | ~41% | strict-origin-when-cross-origin | None | Yes, with chat UUID |
| iOS Claude app | ~10% | n/a | None | Behavioral inference only |
| Android Claude app | ~12% | n/a | None | Behavioral inference only |
| Claude desktop (macOS / Windows) | ~14% | n/a | None | Behavioral inference only |
| Claude in Slack (via Claude for Slack) | ~5% | n/a (Slack URL preview) | None | Behavioral only; very low volume |
The Claude-specific gotcha is that ClaudeBot has been one of the most aggressive AI crawlers across late 2025 and 2026: Cloudflare's AI-bots report consistently puts it in the top three for hit volume on the publishers they sample [14]. Operators who only watch ChatGPT traffic miss that Claude is often citing the page before the ChatGPT crawler is, and Claude-User live fetches are an early leading indicator of citation activity. Log ClaudeBot and Claude-User as separate dimensions in your access-log analysis; the linked piece on AI crawler tracking walks through the technique.
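One way to keep the two Claude agents as separate dimensions is sketched below. It assumes you have already extracted the User-Agent strings from your access log; the function name `count_claude_agents` is hypothetical, and the second sample string is a made-up placeholder rather than Anthropic's documented Claude-User UA.

```python
from collections import Counter
from typing import Iterable

# The two crawler tokens named above; matched case-insensitively as substrings.
CLAUDE_AGENTS = ("ClaudeBot", "Claude-User")


def count_claude_agents(user_agents: Iterable[str]) -> Counter:
    """Count hits per Claude agent, keeping training crawl and live fetch apart."""
    counts: Counter = Counter()
    for ua in user_agents:
        lowered = ua.lower()
        for token in CLAUDE_AGENTS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # one hit belongs to at most one agent
    return counts


sample = [
    "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
    "Mozilla/5.0 (compatible; Claude-User/1.0)",  # placeholder string for illustration
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
]
print(count_claude_agents(sample))
```

Tracking the Claude-User count over time, rather than the combined total, is what surfaces the leading-indicator signal described above.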
Perplexity: the most cooperative engine
Perplexity is the engine where server-side detection feels easiest, which is exactly why teams overweight it and underestimate the others. Perplexity's pass-through is the second-best of the seven majors, and it has the cleanest UTM convention of any of them on its Pro tier. The full reference is the track Perplexity traffic page.
Referer behavior by surface. Perplexity.ai sends Referrer-Policy: strict-origin-when-cross-origin on most pages, and in my logs https://www.perplexity.ai/search/<slug> typically arrives with the search-slug path preserved, at a pass-through around 62% on desktop browsers. Note that under this policy a cross-origin click carries only the origin, so a Referer that includes the /search/<slug> path implies the link went out under a more permissive policy; treat the slug as a bonus rather than a guarantee. When present, the slug is a human-readable encoding of the user's original query, which means the Referer alone tells you both the engine and roughly what the user was searching for. iOS Perplexity drops to about 17% through SFSafariViewController. Android Perplexity drops to about 21% through Custom Tabs. Mac and Windows clients open the system browser.
User-Agent. Per Perplexity's published bot policy [4], there are two crawlers: PerplexityBot (indexing, respects robots.txt) and Perplexity-User (live fetch during a session, documented as not respecting robots.txt because it represents a direct user request). Human Perplexity traffic arrives with a normal browser UA.
UTM behavior. Perplexity Pro citations append ?utm_source=perplexity in some configurations; this is the closest any major engine comes to a published UTM convention [6]. The non-Pro free tier does not consistently append the parameter. So a UTM match is a confirmation signal, not a coverage signal: it tells you when Pro fired but doesn't catch free-tier traffic.
| Perplexity surface | Referer present | Referrer-Policy | UTM | Detectable as Perplexity? |
| --- | --- | --- | --- | --- |
| Perplexity.ai web | ~62% | strict-origin-when-cross-origin | Pro: utm_source=perplexity sometimes | Yes, with search slug |
| iOS Perplexity app | ~17% | n/a | None on free tier | Behavioral inference + partial UTM |
| Android Perplexity app | ~21% | n/a | None on free tier | Behavioral inference + partial UTM |
| Perplexity Spaces (logged-in) | ~58% | strict-origin-when-cross-origin | None | Yes, with space-slug |
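The "confirmation, not coverage" distinction for the Perplexity UTM could be expressed roughly like this. The function names and the returned labels are assumptions made for the example; the only values taken from the text are the `utm_source=perplexity` parameter and the perplexity.ai hostname.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def has_perplexity_utm(url: str) -> bool:
    """True when the landing URL carries utm_source=perplexity."""
    values = parse_qs(urlsplit(url).query).get("utm_source", [])
    return any(value.lower() == "perplexity" for value in values)


def perplexity_evidence(url: str, referer_host: Optional[str]) -> str:
    """Label the strongest evidence available for one visit."""
    host = (referer_host or "").lower()
    if host == "perplexity.ai" or host.endswith(".perplexity.ai"):
        return "referer"
    if has_perplexity_utm(url):
        return "utm"  # confirms the visit even though the Referer was stripped
    # Absence of the UTM proves nothing: free-tier clicks often carry no parameter.
    return "none"


print(perplexity_evidence("/pricing?utm_source=perplexity", None))
print(perplexity_evidence("/pricing", "www.perplexity.ai"))
print(perplexity_evidence("/pricing", None))
```

The "none" case still has to go to behavioral inference, which is why the UTM never stands in for coverage.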
The Perplexity-specific gotcha is the multi-click pattern. Perplexity's follow-up question UX drives multiple citation clicks across several user-initiated follow-ups, all linking to the same canonical URL on your site. In my logs this looks like one user generating 3-5 hits within a 10-minute window with the same IP and slightly different slug paths. A naive session window treats that as one visitor; raw hit counts overstate by 3-5x. Bucket by IP + UA + 30-minute window before reporting unique Perplexity visitors.
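A minimal sketch of that bucketing is below. It reads "30-minute window" as a fixed window that starts at the first hit from a given IP and User-Agent pair, which is one reasonable interpretation and not necessarily the exact rule used in the logs described here; the tuple layout and the name `unique_visits` are also assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable

WINDOW = timedelta(minutes=30)


def unique_visits(hits: Iterable[tuple[str, str, datetime]]) -> int:
    """Count visits from (ip, user_agent, timestamp) hits, one per 30-minute window."""
    window_start: dict[tuple[str, str], datetime] = {}
    visits = 0
    for ip, user_agent, ts in sorted(hits, key=lambda hit: hit[2]):
        key = (ip, user_agent)
        start = window_start.get(key)
        # A hit opens a new visit if the pair is new or its window has expired.
        if start is None or ts - start > WINDOW:
            window_start[key] = ts
            visits += 1
    return visits


t0 = datetime(2026, 5, 1, 12, 0)
burst = [("203.0.113.7", "Mozilla/5.0", t0 + timedelta(minutes=m)) for m in (0, 3, 6, 9)]
print(unique_visits(burst))  # four follow-up clicks collapse into a single visit
```

IP plus User-Agent is a coarse identity key, so shared networks will undercount slightly; that trade-off is usually acceptable compared with a 3-5x overstatement.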
Gemini: two engines stitched together
Gemini is functionally two engines that share a brand, and you have to track them separately or you will conflate clean traffic with untrackable traffic. The chat surface at gemini.google.com passes referers reasonably. The AI Overviews and AI Mode surfaces inside the Google SERP do not, because Google rewrites every outbound click through its own click tracker. The track Gemini traffic page covers both; here is the matrix.
Referer behavior by surface. Gemini.google.com sends Referrer-Policy: strict-origin-when-cross-origin and preserves at least the origin, with a pass-through around 54% on desktop browsers; Google's own properties are consistently cooperative on Referer behavior across Gemini surfaces. The Android Gemini app and iOS Gemini app drop to the typical in-app webview range of 15-25%. AI Overviews inside Google Search render the answer inline, so a large share of users never click anything; the share that does click goes through Google's outbound click tracker, producing a Referer of google.com with no signal that the click came from an AI Overview specifically.
User-Agent. Google publishes Google-Extended as the opt-out token for Bard / Gemini training [15], a separate signal from the regular Googlebot crawler. It is a robots.txt control token rather than a distinct User-Agent string, so it will not show up in your access logs. Human Gemini traffic arrives with a normal browser UA.
UTM behavior. Gemini does not append UTMs to outbound links from the chat surface. AI Overviews clicks pass through Google's click tracker, which adds its own internal parameters but no stable UTM you can match.
| Gemini surface | Referer present | Referrer-Policy | UTM | Detectable as Gemini? |
| --- | --- | --- | --- | --- |
| Gemini.google.com (chat, web) | ~54% | strict-origin-when-cross-origin | None | Yes |
| Gemini in Search (AI Overviews) | <3% | n/a (click rewritten) | None | Only via Search Console correlation |
| Gemini iOS app | ~21% | n/a | None | Behavioral inference only |
| Gemini Android app | ~24% | n/a | None | Behavioral inference only |
| Gemini in Google Workspace (sidebar) | ~38% | strict-origin-when-cross-origin | None | Yes, when sidebar is active |
The Gemini-specific gotcha is that operators frequently see a flat or declining "Gemini" referer count and assume Gemini isn't sending them traffic, when the truth is that the traffic shifted from gemini.google.com to AI Overviews and became untrackable in transit. The diagnostic is to cross-reference Search Console impressions on AI Overview-eligible queries against your deep-page Direct trend; if the impressions are growing while gemini.google.com referers are flat, AI Overviews is the missing source. The Google AI Overviews tracking guide goes deeper.
Google AI Overviews: the engine with no clean per-click attribution
Google AI Overviews deserves its own section because it is the only engine of the seven with no clean per-click attribution path in 2026, full stop. The architecture of the surface (inline answer rendering inside the SERP, click rewriting through Google's tracker, no engine-identifying UTM) is what makes it untrackable by design.
Referer behavior. When a user clicks a citation inside an AI Overview, the click is rewritten through Google's outbound click handler. The destination receives a Referer of https://www.google.com/ with no path information, no UTM, and no header that says "this click originated from an AI Overview specifically rather than a regular search result." From your server's perspective, an AI Overview click is indistinguishable from a regular Google organic click. That is not an accident; that is how Google's click tracking has worked for a decade.
User-Agent. Human AI Overview traffic arrives with a normal browser UA, same as any Google organic visitor.
UTM behavior. None. Google does not append a UTM that identifies the answer surface.
Indirect detection. The honest 2026 method is to correlate Search Console data with your analytics. Search Console reports impressions and clicks on AI Overview-eligible queries through its standard reporting [16], and you can identify which of your URLs are appearing in AI Overviews by cross-referencing position data against the queries known to trigger an Overview. That query set is a moving target, but the major SEO trackers publish trigger-rate datasets, including Search Engine Land's coverage [17] and the AI Overviews tracker work [18]. If a URL's impressions are up sharply while its measured clicks are flat (because the AI Overview is satisfying the query inline) and your deep-page Direct entries on that URL are also up, the inferred conclusion is AI Overview activity.
| AI Overview surface | Referer present | UTM | Direct detection | Indirect detection |
| --- | --- | --- | --- | --- |
| AI Overviews on Google.com (desktop) | ~0% as Overview-specific | None | Impossible | Search Console correlation |
| AI Overviews on Google mobile web | ~0% as Overview-specific | None | Impossible | Search Console correlation |
| AI Overviews in Google app (iOS / Android) | ~0% as Overview-specific | None | Impossible | Search Console correlation |
| AI Mode (the dedicated tab) | ~0% as AI Mode-specific | None | Impossible | Search Console + branded-search lift |
The blunt summary for AI Overviews: you can size it but you cannot attribute it per-click. Any "AI Overviews revenue" number you see in 2026 is an inference, not a measurement. The companion piece Google AI Mode vs AI Overviews covers the structural difference between the two and why neither is cleanly attributable.
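The Search Console correlation can be written down as an explicit rule, sketched below under stated assumptions. The thresholds (`sharp=0.30` for "up sharply", `flat=0.05` for "flat") are placeholders chosen for illustration, not calibrated values, and the inputs are simple before/after counts you would export yourself for one URL. The output is an inference flag, consistent with the point that this is sizing and not per-click measurement.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change between two periods; infinite when growing from zero."""
    if before == 0:
        return float("inf") if after > 0 else 0.0
    return (after - before) / before


def likely_ai_overview(
    impressions: tuple[float, float],
    clicks: tuple[float, float],
    direct_entries: tuple[float, float],
    sharp: float = 0.30,  # assumed cutoff for "impressions up sharply"
    flat: float = 0.05,  # assumed tolerance for "clicks flat"
) -> bool:
    """Flag a URL whose pattern matches inferred AI Overview activity."""
    impressions_up = pct_change(*impressions) >= sharp
    clicks_flat = abs(pct_change(*clicks)) <= flat
    direct_up = pct_change(*direct_entries) > 0
    return impressions_up and clicks_flat and direct_up


# Hypothetical URL: impressions +50%, clicks +2%, deep-page Direct entries +37.5%.
print(likely_ai_overview((1000, 1500), (100, 102), (40, 55)))
```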
Microsoft Copilot: the dark horse
Microsoft Copilot is the dark horse of AI traffic source detection in 2026, and the part that surprises operators is that Copilot's Edge sidebar context produces the highest Referer pass-through of any major surface, period, around 71% in my logs. The catch is that Copilot itself is split across multiple surfaces (the Edge sidebar, the bing.com/chat web UI, the Copilot Windows app, Microsoft 365 Copilot inside Word and Excel), and only one of them is that clean. The track Copilot traffic page has the engine reference.
Referer behavior by surface. Copilot in the Edge sidebar sends Referrer-Policy: strict-origin-when-cross-origin and preserves the origin on HTTPS-to-HTTPS navigations, producing https://www.bing.com/ or https://copilot.microsoft.com/ as the Referer at about 71%. The classic Bing Chat web UI goes through bing.com/ck/a?..., a two-hop redirect that blanks the Referer on the second hop, dropping the rate to about 33%. The Copilot Windows app opens the system default browser via OS URL handler and loses the Referer. Microsoft 365 Copilot inside Office apps has a smaller volume and a mix of behaviors depending on the host app.
User-Agent. Bingbot identifies itself in the User-Agent on training and indexing crawl. There is no distinct CopilotBot user-agent that I have seen consistently in logs through 2026; Microsoft's crawling appears to consolidate under the Bingbot identity. Human Copilot traffic arrives with a normal browser UA, with the noteworthy exception that some Edge sidebar contexts include Edg/ in the UA, which is a partial signal that the user is on Edge specifically (and therefore more likely to be a sidebar Copilot user).
UTM behavior. Copilot does not append a UTM to outbound links from any surface I have measured.
| Copilot surface | Referer present | Referrer-Policy | UTM | Detectable as Copilot? |
| --- | --- | --- | --- | --- |
| Edge sidebar Copilot | ~71% | strict-origin-when-cross-origin | None | Yes, highest of any surface |
| Bing Chat (bing.com/chat) | ~33% | 302 through bing.com/ck redirector | None | Yes, but referer often hostname-only |
| Copilot.microsoft.com web | ~58% | strict-origin-when-cross-origin | None | Yes |
| Copilot Windows app | ~9% | n/a (OS URL handler) | None | Behavioral inference only |
| Microsoft 365 Copilot (Word, Excel, etc.) | ~22% | varies by host | None | Behavioral inference mostly |
The Copilot-specific gotcha is that the bing.com hostname can show up in your Referer for two completely different reasons: classic Bing organic search clicks (no AI involved) and Bing Chat / Copilot AI clicks. If you regex-match bing.com and bucket everything into "Microsoft Copilot," you will overcount AI traffic by mixing in regular search clicks. The fix is to match on the specific locations (bing.com/chat, bing.com/copilot, copilot.microsoft.com) and treat bare bing.com as classic organic until proven otherwise.
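The path-aware match could look like the sketch below. The bucket labels and the function name `classify_microsoft_referer` are illustrative assumptions; the hostnames and paths are the ones listed above. Bare bing.com falls through to organic, which follows the "until proven otherwise" rule.

```python
from urllib.parse import urlsplit

COPILOT_PATHS = ("/chat", "/copilot")


def classify_microsoft_referer(referer: str) -> str:
    """Split Microsoft referers into 'copilot', 'bing_organic', or 'other'."""
    parts = urlsplit(referer)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    if host == "copilot.microsoft.com":
        return "copilot"
    if host == "bing.com":
        path = parts.path
        # Require an exact path or a sub-path so "/chatter" does not match "/chat".
        if any(path == p or path.startswith(p + "/") for p in COPILOT_PATHS):
            return "copilot"
        return "bing_organic"  # bare bing.com: classic search until proven otherwise
    return "other"


print(classify_microsoft_referer("https://www.bing.com/chat"))
print(classify_microsoft_referer("https://www.bing.com/"))
print(classify_microsoft_referer("https://copilot.microsoft.com/"))
```

Because many clicks arrive with only the origin, expect a meaningful share of real Copilot traffic to land in `bing_organic`; that undercount is the conservative side of the trade-off.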
Meta AI and Grok: the catch-all bucket
Meta AI, Grok, You.com, Phind, Poe, and the long tail together make enough volume on some sites to deserve a slot, but individually none passes the volume threshold for its own section. I bucket them as "other AI" while still detecting each individually in logs.
Referer behavior. Meta.ai web sends https://www.meta.ai/ with pass-through around 25-35% on desktop, much lower on Facebook and Instagram in-app surfaces because those use Meta's webview that strips the Referer aggressively. Grok on grok.com and inside X.com passes a Referer roughly 20-30%; xAI's engineering posts [19] occasionally reference referrer-policy changes that have moved the rate either direction. You.com and Phind behave like Perplexity (strict-origin-when-cross-origin, search-slug paths) at lower volume. Poe (Quora's multi-bot interface) passes Referer 30-45% with a poe.com hostname.
User-Agent. Meta's training crawler identifies as Meta-ExternalAgent and FacebookBot in some configurations; Meta publishes a partial crawler list [20]. Grok's training crawler has been variously identified as Grokipedia and xAI-prefixed strings in late 2025 / 2026, moving fast enough that I won't pin a specific UA in writing.
UTM behavior. None of them appends a standardized UTM to outbound citation links as of May 2026.
| Engine | Surface | Referer present | UTM | Detectable? |
| --- | --- | --- | --- | --- |
| Meta AI (meta.ai web) | Web | ~28% | None | Yes when Referer survives |
| Meta AI (in Facebook / Instagram app) | In-app webview | ~7% | None | Behavioral only |
| Grok (grok.com) | Web | ~24% | None | Yes when Referer survives |
| Grok (in X.com) | Web sub-surface | ~31% | None | Yes; referer = x.com |
| You.com | Web | ~47% | None | Yes |
| Phind | Web | ~52% | None | Yes |
| Poe | Web | ~38% | None | Yes |
The catch-all bucket separates a real per-engine setup from a half-built one. The first build covers ChatGPT, Perplexity, Claude. The second adds Gemini and Copilot. The third, which most teams never get to, picks up Meta AI, Grok, You.com, Phind, and Poe. Those engines collectively run 5-15% of total AI traffic on the sites I see, small enough to ignore and big enough that ignoring it inflates the "AI is mostly ChatGPT" misread.
The cross-engine reference matrix
Pull the seven per-engine sections together and the operator-grade reference is a single matrix: every engine, every surface, every signal. This is the table I keep on a wall above my desk.
| Engine | Surface | Referer pass-through | UA distinct? | UTM standard | Sec-Fetch-Site useful? |
| --- | --- | --- | --- | --- | --- |
| ChatGPT | Web (Chrome) | ~28% | No (bots only) | None | Yes, cross-site |
| ChatGPT | iOS | ~8% | No | None | Limited (in-app context) |
| ChatGPT | Android | ~11% | No | None | Limited |
| Claude | Web | ~41% | No (bots only) | None | Yes |
| Claude | iOS | ~10% | No | None | Limited |
| Claude | Android | ~12% | No | None | Limited |
| Perplexity | Web | ~62% | No (bots only) | Pro: partial | Yes |
| Perplexity | iOS | ~17% | No | None on free | Limited |
| Perplexity | Android | ~21% | No | None on free | Limited |
| Gemini | Chat web | ~54% | No | None | Yes |
| Gemini | AI Overviews | <3% | No | None | No (click rewrite) |
| Gemini | iOS / Android | ~22% | No | None | Limited |
| AI Overviews | Google.com | ~0% | No | None | No |
| Copilot | Edge sidebar | ~71% | Partial (Edg/ UA) | None | Yes |
| Copilot | Bing Chat | ~33% | No | None | Partial |
| Copilot | Windows app | ~9% | No | None | Limited |
| Meta AI | meta.ai web | ~28% | No | None | Yes |
| Meta AI | in-app | ~7% | No | None | Limited |
| Grok | grok.com | ~24% | No | None | Yes |
| Grok | x.com | ~31% | No | None | Yes |
Read across the rows and the pattern is consistent: every engine has a "clean" desktop browser surface that passes the Referer at a usable rate and a "messy" mobile or app surface that strips it. The clean surfaces are detectable with a referer match. The messy surfaces need behavioral inference. There is no single signal that catches all surfaces of any engine, and there is no engine where the mobile surface is cleaner than the desktop surface.
How GA4 buckets each engine
If you are running GA4 with default settings and have not built a custom channel grouping, here is the row each engine produces in your channel report, and how it differs from what an AI-aware tool reports.
| Engine | Referer present | GA4 default channel | GA4 default source | Reality |
| --- | --- | --- | --- | --- |
| ChatGPT | Yes | Referral | chatgpt.com / chat.openai.com | AI Assistant, ChatGPT |
| ChatGPT | No (stripped) | Direct | (none) | Dark AI traffic, ChatGPT |
| Claude | Yes | Referral | claude.ai | AI Assistant, Claude |
| Claude | No | Direct | (none) | Dark AI traffic, Claude |
| Perplexity | Yes | Referral | perplexity.ai | AI Assistant, Perplexity |
| Perplexity | No | Direct | (none) | Dark AI traffic, Perplexity |
| Gemini (chat) | Yes | Referral | gemini.google.com | AI Assistant, Gemini |
| Gemini (chat) | No | Direct | (none) | Dark AI traffic, Gemini |
| Gemini (AI Overviews) | "google.com" only | Organic Search | google / organic | Misattributed: AI Overview, not classic organic |
| Copilot (Edge sidebar) | Yes | Referral | bing.com or copilot.microsoft.com | AI Assistant, Copilot |
| Copilot (Bing Chat) | Sometimes | Referral or Direct | bing.com or (none) | AI Assistant, Copilot |
| Meta AI | Yes | Referral | meta.ai | AI Assistant, Meta |
| Meta AI | No | Direct | (none) | Dark AI traffic, Meta |
| Grok | Yes | Referral | grok.com or x.com | AI Assistant, Grok |
Two rows cause the most confusion. The AI Overviews row gets misattributed to "Organic Search" rather than Direct, because the click came through google.com with the Referer intact, inflating your classic organic numbers. The Direct rows for the other engines inflate your Direct bucket. Both are misclassification failures into different buckets. The longer write-up is in GA4 missing traffic.
Detection method comparison: referer vs UA vs UTM vs behavioral
You have four signals available for identifying AI traffic, and each one catches a different population. The honest comparison is which signal is sufficient on its own (none) and which combination approaches full coverage (all four, stacked).
| Detection method | What it catches | What it misses | False positive rate | Maintenance burden |
| --- | --- | --- | --- | --- |
| Referer-only | Engine when Referer survives | Stripped Referer (majority on most engines) | Near-zero | Low; engine domain list |
| User-Agent-only | Crawler traffic precisely; almost no human traffic | All human visits | Low for bots | Low; UA list updates |
| UTM-only | Traffic from links you tagged | Organic AI citations | Near-zero | Low; tag discipline |
| Behavioral-only | Probabilistic unreferred AI traffic | Referred AI (use Referer instead) | 10-20% on tuned classifier | High; rules drift |
| Referer + UA | Visible AI plus clean bot separation | Unreferred AI | Near-zero | Moderate |
| All four stacked | 85-95% of AI traffic, labeled by engine | Inherently untrackable (AI Overviews) | 5-15% on inferred slice | High (or buy a tool) |
The non-obvious takeaway is that the User-Agent is mostly useless for human attribution but indispensable for bot separation. If you skip the UA filter, your Referer-matched "AI traffic" row will include every GPTBot, ClaudeBot, and PerplexityBot hit on your site, the same crawlers that produced the citation in the first place, and you will overcount human AI traffic by an order of magnitude. The UA is what tells you a hit is a person, not a crawler. Then the Referer tells you which engine the person came from. The two work as a pair.
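The "UA first, then Referer" pairing might be sketched as follows. The bot tokens and engine hostnames are the ones named in this guide; the lists are deliberately incomplete, the sample User-Agent strings are abbreviated placeholders, and the function name `classify_hit` is an assumption for the example.

```python
from typing import Optional
from urllib.parse import urlsplit

# Crawler tokens named in this guide; checked before any Referer logic runs.
BOT_TOKENS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "Meta-ExternalAgent", "FacebookBot", "bingbot",
)

# Referer hostnames (without "www.") mapped to engine labels.
ENGINE_HOSTS = {
    "chatgpt.com": "ChatGPT", "chat.openai.com": "ChatGPT", "claude.ai": "Claude",
    "perplexity.ai": "Perplexity", "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot", "meta.ai": "Meta AI", "grok.com": "Grok",
}


def classify_hit(user_agent: str, referer: str) -> tuple[str, Optional[str]]:
    """Return ('bot', token) or ('human', engine_or_None) for one hit."""
    lowered = user_agent.lower()
    for token in BOT_TOKENS:
        if token.lower() in lowered:
            return ("bot", token)  # step 1: the UA says crawler, so stop here
    host = (urlsplit(referer).hostname or "") if referer else ""
    if host.startswith("www."):
        host = host[4:]
    return ("human", ENGINE_HOSTS.get(host))  # step 2: the Referer names the engine


print(classify_hit("Mozilla/5.0 (compatible; GPTBot/1.0)", ""))
print(classify_hit("Mozilla/5.0 (Macintosh) Safari/605.1.15", "https://www.perplexity.ai/search/abc"))
```

Running the bot check first means a crawler hit can never be counted as a human visit, whatever its Referer says.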
The other non-obvious takeaway is that the behavioral classifier is the only signal that recovers the unreferred majority, and it is the only one that needs ongoing engineering, because the behavioral signature of an unreferred AI visit changes as the engines change their mobile and app behavior. A classifier tuned in Q1 2026 might be 10 points less accurate by Q3 2026 if Meta AI's in-app browser shifts how it renders citations. This is the maintenance argument for buying versus building.
Server-side vs client-side detection
The architecture you choose for AI traffic detection, server-side versus client-side, is the single biggest decision you make on this work, because it sets the ceiling on what you can recover. Client-side detection caps out around 30-50% capture on most sites. Server-side caps out around 75-95%. The gap is mechanical, not implementation-dependent. Here is why.
| Capability | Server-side | Client-side (gtag, plausible-tag, etc.) |
| --- | --- | --- |
| Reads raw Referer header | Yes | No (only sees document.referrer after policy stripping) |
| Reads Sec-Fetch-Site | Yes | No (no JS API for request headers) |
| Reads User-Agent | Yes | Yes, but trivially spoofable client-side |
| Blocked by ad blockers | No | Yes, partially (~20-40% in some audiences) |
| Affected by consent declines | No (if no PII stored) | Yes (analytics tag fires post-consent) |
| Can run a behavioral classifier | Yes, with full session context | Limited; classifier runs after page load |
| Maintenance: engine domain list | You own it | You own it |
| Maintenance: classifier rules | You own it | You own it |
| Performance impact on page | None | Small JS payload |
| Real-time dashboard updates | Yes | Yes |
| GDPR posture | Configurable (in-region processing) | Depends on tag vendor |
The mechanism behind the capture-rate gap is exactly the eight stripping mechanisms from earlier. Client-side detection sees document.referrer, which is what the page sees after Referrer-Policy has been applied to the incoming navigation. Server-side detection sees the Referer header exactly as it arrived on the inbound request, together with every other request header. For an engine like ChatGPT that sends Referrer-Policy: origin, both see the same hostname-only Referer; for an engine that sends Referrer-Policy: strict-origin-when-cross-origin, both see the same origin. But for clicks that go through an in-app webview or a click-tracker rewrite, the server also has Sec-Fetch-Site, which no client-side API exposes, alongside the User-Agent exactly as sent [8]. That access is what lets a server-side classifier separate AI cross-site clicks from internal navigation.
The decision rule I use with founders: if your AI traffic share is under 5% of total sessions, the client-side cap is probably fine, because you are sizing the channel, not optimizing it. If AI is 5-15% of sessions, server-side is worth the engineering. If AI is over 15%, you are operating the channel actively and the capture-rate gap is the difference between accurate weekly reporting and noise; this is the band where a maintained tool or a dedicated server-side build pays for itself fastest. The full architecture argument is in the AI traffic analytics 2026 guide.
A working timeline: when each engine started passing referrers
The reason any of this is a moving problem rather than a one-time configuration is that the engines change their referer behavior on their own schedules. Here is the timeline of major changes I have logged across the seven majors since each engine launched a public consumer chat surface. The dates are publication dates of the change, when I can pin them down; otherwise they are the month I first observed the change in my logs.
The chart maps a real operational pattern: every twelve months, on average, at least one major engine changes its surface or referrer behavior enough that a regex-only detection rule built more than a year ago is partly broken. The teams who notice this are the ones who keep a weekly check that compares server-log AI referrer counts against the analytics tool's "AI" channel. If those numbers drift apart by more than 20%, something changed and the rule needs updating. The teams who don't notice are the ones whose AI channel quietly goes dark in Q3 and who only discover it during Q4 board prep.
The point of showing the timeline visually is not to memorize the dates. It is to internalize that this is a moving problem. No single rule shipped in 2024 still catches what 2026 traffic looks like. The maintenance burden on a hand-rolled detection setup is real, ongoing, and almost always underestimated.
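The weekly check described above can be sketched in a few lines. This version assumes the gap is measured relative to the server-log count, which is one reasonable reading of "drift apart by more than 20%"; the function names and the default threshold parameter are illustrative.

```python
def relative_gap(server_log_count: int, analytics_count: int) -> float:
    """Gap between the two counts as a fraction of the server-log count."""
    if server_log_count == 0:
        # With no server-side baseline, any analytics hits count as full drift.
        return 1.0 if analytics_count else 0.0
    return abs(server_log_count - analytics_count) / server_log_count


def needs_review(server_log_count: int, analytics_count: int,
                 threshold: float = 0.20) -> bool:
    """Flag the week when the two sources disagree by more than the threshold."""
    return relative_gap(server_log_count, analytics_count) > threshold
```

Run it once a week on the two totals. A flagged week is a prompt to diff the referrer hostnames in the logs against the detection rule, not proof that the rule is wrong.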
Tool comparison: what each platform does with AI traffic
The other axis of the build-versus-buy decision is which existing tool catches which engine, and the truth is that the popular analytics platforms split into clear camps on this. Here is the honest scoring across the four signals, referer match, UA match, UTM honoring, and behavioral inference, for the tools founders ask me about most often.
| Tool | Referer match (AI engines) | UA-based crawler separation | UTM honoring | Behavioral inference (unreferred) |
| --- | --- | --- | --- | --- |
| GA4 (default config) | None, buckets as Referral | Bot filter on/off only | Standard | None |
| GA4 (custom channel grouping) | Manual regex per engine | Bot filter on/off only | Standard | None |
| Plausible | Shows referer hostname; no AI label | Configurable bot filter | Standard | None |
| Fathom | Shows referer hostname; no AI label | Configurable bot filter | Standard | None |
| Simple Analytics | Shows referer hostname; no AI label | Configurable bot filter | Standard | None |
| Pirsch | Some AI labeling on referer match | UA-based bot filter | Standard | None |
| Matomo | Manual mapping | UA-based bot filter | Standard | None |
| Attrifast | All 7 engines, maintained list | UA-based, separate crawler dashboard | Standard | Yes, behavioral classifier on unreferred |
Two things on this table don't fit in the cell text. First, "shows referer hostname" is not the same as "labels as the engine." Plausible, Fathom, and Simple Analytics will show you chatgpt.com or perplexity.ai as a referrer source, identical to what GA4's Referral channel shows, but none rolls those hostnames into an "AI Assistants" category by default. The deeper Attrifast vs Plausible and Attrifast vs Pirsch comparison pages walk this row-by-row.
Second, the behavioral inference column is the structural separator. Every tool can do a Referer match against an AI domain list with enough configuration; it is just regex. None of the general-purpose dashboards runs a behavioral classifier on unreferred deep-page entries. That is the architectural choice distinguishing a revenue-attribution tool from a privacy-analytics tool from GA4.
| Tool category | Solves | Doesn't solve | Best for |
| --- | --- | --- | --- |
| GA4 stock | Generic web analytics | AI detection, full GDPR compliance | Teams already on GA4 with no AI focus |
| GA4 + custom regex | Referer-passing AI slice | Unreferred majority, GDPR | Teams committed to staying on GA4 |
| Privacy analytics (Plausible/Fathom) | GDPR, clean pageviews | AI labeling, revenue join | Banner-free privacy dashboards |
| Server logs + grep | Bot visibility, free | Human attribution, scale | Engineers auditing crawl behavior |
| Revenue attribution (Attrifast) | AI detection + revenue | General funnel exploration | Teams measuring AI revenue |
The honest read across these tables: no tool is best on every axis. GA4 is free with Google Ads integration. Plausible and Fathom are beautiful for pageviews and EU-compliant. Server logs are unbeatable for crawler visibility. Attrifast is built around AI detection plus the revenue join. Pick by which question you are answering, not which tool is most "modern."
The architectural punchline
AI traffic source detection in 2026 is a server-side multi-signal problem, not a client-side single-rule one. Teams who get this right run a detector that reads the Referer, User-Agent, Sec-Fetch-Site, and any incoming UTM, matches against a maintained per-engine domain list, then runs a behavioral classifier on what's left. Teams who get it wrong ship a single GA4 custom channel grouping with a seven-hostname regex, declare victory, and miss 50-70% of their AI traffic for the next year.
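The layered order in that paragraph can be sketched as follows. Everything here is an assumption for illustration: the domain list is a short sample rather than a maintained one, the label strings are made up, and step 4 is a crude placeholder (deep landing path with no Referer) standing in for a real behavioral classifier.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Illustrative sample only; a production list needs every surface hostname
# for each engine and regular upkeep.
ENGINE_DOMAINS = {
    "chatgpt.com": "chatgpt",
    "claude.ai": "claude",
    "perplexity.ai": "perplexity",
    "gemini.google.com": "gemini",
    "copilot.microsoft.com": "copilot",
    "meta.ai": "meta_ai",
}


def engine_for(host: Optional[str]) -> Optional[str]:
    """Match a hostname, or any subdomain of it, against the engine list."""
    if not host:
        return None
    host = host.lower()
    for domain, engine in ENGINE_DOMAINS.items():
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def classify(referer: str, landing_url: str,
             sec_fetch_site: Optional[str], is_crawler: bool) -> str:
    # 1. Crawlers are separated first so they never count as human visits.
    if is_crawler:
        return "crawler"
    # 2. Referer hostname, when it survived.
    engine = engine_for(urlsplit(referer).hostname) if referer else None
    if engine:
        return engine
    # 3. Incoming utm_source on the landing URL, if it names a known engine.
    landing = urlsplit(landing_url)
    utm = parse_qs(landing.query).get("utm_source", [""])[0].lower()
    if utm in ENGINE_DOMAINS.values():
        return utm
    # 4. Placeholder heuristic, NOT a real classifier: an unreferred entry
    #    that lands two or more path segments deep is flagged as a candidate.
    depth = len([seg for seg in landing.path.split("/") if seg])
    if not referer and sec_fetch_site in ("cross-site", "none") and depth >= 2:
        return "ai_unreferred_candidate"
    return "other"
```

The ordering is the design choice worth keeping even if every rule changes: cheap, high-confidence signals run first, and inference only touches what they leave behind.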
The build cost is roughly 4-8 hours for a first cut, or 6-12 hours once you wire the output into a warehouse, plus 1-2 hours of monthly maintenance as engines change. The buy cost is $29/mo for a tool that absorbs the maintenance. Whichever direction you go, do not run client-side-only and act like it is full coverage. The capture-rate gap is the difference between good decisions and noise.
Identify each engine by Referer when it survives, separate humans from crawlers with the User-Agent, recover the unreferred majority with a behavioral classifier, and stop trusting any single signal in isolation. The seven-engine row hiding inside your Direct bucket becomes seven legible rows you can act on.
What is an AI traffic source?
A human visit whose immediate origin is an AI assistant surface: ChatGPT, Claude, Perplexity, Gemini, AI Overviews, Copilot, Meta AI, Grok, or a smaller engine. You identify it by combining the Referer header (when it survives), the User-Agent string (mostly for crawler separation), a first-party UTM you control, and a behavioral inference for the unreferred deep-page entries. No single signal catches every AI visit.
Which AI engines pass a Referer header in 2026?
All seven majors pass a Referer in some surfaces and strip it in others. Desktop hierarchy: Microsoft Copilot (Edge sidebar) around 65-75%, Perplexity 55-65%, Gemini chat 45-55%, Claude 30-45%, ChatGPT web 18-28%, Meta AI and Grok 10-25%, and AI Overviews near zero because Google rewrites the click through its own tracker. Mobile and desktop-app surfaces strip the referer dramatically more across every engine.
Why is ChatGPT mobile traffic almost always missing a referer?
iOS ChatGPT opens links inside SFSafariViewController, which does not propagate the host app's Referer. Android ChatGPT uses Chrome Custom Tabs, which strips the referer unless the host app explicitly sets it. The desktop ChatGPT app opens the system default browser via the OS URL handler, so the new tab has no document context to source the Referer from. All three mechanisms eliminate the referer before your server sees it.
Can a User-Agent string identify an AI traffic source?
Only for crawlers and a small slice of in-app webviews, never for bulk human AI traffic. GPTBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, OAI-SearchBot, and Bingbot all identify themselves cleanly in the User-Agent header. Google-Extended is different: it is a robots.txt control token, not a separate request User-Agent, so it will not show up as its own string in your logs. Humans reading AI answers click in a normal browser tab whose UA looks like every other Chrome or Safari session. UA is for bot separation, not human attribution.
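A minimal sketch of that bot separation, assuming a case-insensitive substring match on the User-Agent is good enough for a first pass. The token tuple is illustrative and needs upkeep, and the function name is hypothetical. User-Agent strings are trivially spoofable, so treat a match as a label for logging, not as verification.

```python
from typing import Optional

# Tokens that appear in request User-Agent strings. Google-Extended is left
# out on purpose: it is a robots.txt token, not a request User-Agent.
AI_CRAWLER_TOKENS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "CCBot", "Bingbot",
)


def crawler_token(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in the UA, else None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None
```

Rows where crawler_token returns a value belong in the crawler table; everything else continues on to human attribution.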
Does GA4 detect any AI engines by default in 2026?
No, none. GA4's default grouping has 17 channels and none of them is "AI Assistants." When a referer survives, GA4 buckets the visit into Referral with the engine's hostname as the source but does not label it as AI. When stripped, the visit lands in Direct / (none). A custom channel grouping with regex rules catches the referer-passing slice; the unreferred majority needs server-side enrichment that GA4 cannot do.
What is the difference between PerplexityBot and Perplexity-User?
Two distinct UAs in Perplexity's published bot policy. PerplexityBot is the indexing crawler that builds the index and respects robots.txt. Perplexity-User is the live-fetch agent that runs when a user asks a question and Perplexity needs to retrieve your page in real time. Perplexity's documentation says Perplexity-User generally ignores robots.txt because it represents a direct user request. Other user-triggered fetchers such as ChatGPT-User and Claude-User follow their own vendors' published policies, which are not identical, so verify each one rather than assuming the same behavior. Treat them as separate dimensions in your logs.
How do I detect Google AI Overviews traffic if the referer is stripped?
You mostly cannot detect it directly. AI Overviews renders inline so most users never click anything, and clicks that do happen go through Google's tracker with a referer of google.com that gives no AI Overview-specific signal. The indirect detection is correlating Search Console impressions on AI Overview-eligible queries with deep-page Direct entries on those same URLs. There is no clean per-click attribution path for AI Overviews in 2026.
What does it mean to detect AI traffic "server-side"?
Your origin server, edge worker, or analytics endpoint reads the Referer header, User-Agent, and Sec-Fetch-Site values on the inbound HTTP request before any client-side JavaScript runs. A client-side tag like gtag.js executes after the page loads and only sees document.referrer, a subset of the real Referer. Server-side, you see the raw header and can layer behavioral signals like deep-page entry on a buying-intent query.
What's the difference between client-side and server-side AI detection?
Client-side reads document.referrer after Referrer-Policy stripping, runs inside ad-blocker range, and cannot see Sec-Fetch-Site. Server-side reads the raw Referer before any stripping, has access to all Fetch Metadata headers, and is invisible to blockers. The capture rate gap is roughly 2-3x. The cost of server-side is owning the engine list and the behavioral classifier; the benefit is the gap.
Can I use UTM parameters to identify AI traffic?
Only for traffic where you control the link: your llms.txt, your sameAs URLs, your own forum posts, and Perplexity Pro citations that auto-append utm_source=perplexity in some configurations. You cannot UTM-tag a link that ChatGPT, Claude, Gemini, or Meta AI generates for you, because the engine renders your canonical URL with no parameters appended. UTM is a floor, not a ceiling.
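For the links you do control, tagging can be automated. A minimal sketch using only the standard library; the function name is hypothetical, and the utm_source and utm_medium values you pass in are conventions you would pick yourself, not anything an engine publishes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url: str, source: str, medium: str = "referral") -> str:
    """Return url with utm_source and utm_medium set, replacing any existing ones."""
    parts = urlsplit(url)
    # Keep every existing parameter except the two we are about to set.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("utm_source", "utm_medium")
    ]
    query += [("utm_source", source), ("utm_medium", medium)]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, add_utm("https://example.com/pricing?plan=pro", "llms-txt") keeps the plan parameter and appends the two UTM fields, so the link in your llms.txt carries its own attribution even when the Referer is stripped.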
How do I tell ChatGPT traffic apart from Perplexity from Claude?
Three signals stacked. Referer hostname when it survives: chatgpt.com for ChatGPT, perplexity.ai for Perplexity, claude.ai for Claude. URL path on the referer when present: ChatGPT strips the path, Perplexity preserves the search-slug, Claude preserves the chat UUID. For unreferred cases, behavioral fingerprinting on landing-page pattern: Perplexity skews to comparison pages, ChatGPT to how-to and definitional, Claude to technical and developer-tool content.
Are the AI bots in my server logs the same as AI traffic?
No, completely different audiences. AI bots (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Bingbot, CCBot) are crawlers that identify themselves in the User-Agent, and most of them respect robots.txt. Google-Extended belongs in the same conversation but works differently: it is a robots.txt control token, not a separate request User-Agent. AI traffic is human visitors who clicked through after reading an AI answer. The two live in different tables and should never be aggregated. Counting them together overstates AI traffic by 5-10x.
How long does it take to ship per-engine AI traffic detection?
Plan on 4-8 hours for the first cut: a referer-matching function with the seven engine domain lists, a UA filter to strip crawlers, a Sec-Fetch-Site enrichment, and a basic landing-page-pattern heuristic for unreferred entries. Add 2-4 hours to wire output into your warehouse. Then budget 1-2 hours per month of upkeep, because engines ship client updates that shift pass-through rates and a new engine launches roughly every 6-8 weeks.
Does Attrifast handle all this automatically?
Yes. The script detects every major AI engine server-side against a maintained domain list, layers in a behavioral classifier for the unreferred majority, separates bot traffic from human traffic, and labels each session by engine. It also joins to Stripe revenue, so you see which engine produced the paying customer. $29/mo, no third-party cookie, no consent banner in most jurisdictions, engine list maintained by us.