How do I tell ChatGPT traffic apart from Perplexity traffic from Claude traffic?

With three signals stacked. First, the Referer hostname when it survives: chatgpt.com or chat.openai.com for ChatGPT, perplexity.ai for Perplexity, claude.ai for Claude. These are unambiguous. Second, the URL path on the referer when present: ChatGPT sends an origin-only policy that strips the path, Perplexity preserves the search-slug path, and Claude preserves the chat UUID. Third, for the unreferred cases, behavioral fingerprinting on the landing-page pattern and query intent: Perplexity users disproportionately land on comparison and 'best X' pages, ChatGPT users on definitional and how-to pages, and Claude users on technical and developer-tool pages. The behavioral signal is probabilistic, but combined with the referer slice it gives you a defensible per-engine split.


How to Track AI Traffic Sources: The 2026 Operator Playbook

Q: What is an AI traffic source, technically?

An AI traffic source is any visit whose immediate origin is an AI assistant surface: ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews, Microsoft Copilot, Meta AI, Grok, or a smaller engine like You.com or Phind. The traffic is human, not crawler, and the visit is the result of a user reading an AI answer that cited or linked to your page. Technically you identify it by some combination of three signals: the Referer header (if it survives), the User-Agent string (only useful for the small share of cases where the request originates from an in-app webview that identifies itself), and a first-party UTM parameter that you control. None of those three signals alone identifies every AI visit; you need all three plus a behavioral inference for the unreferred deep-page entries to approach full coverage.

Q: Which AI engines pass a Referer header in 2026?

All seven majors pass a Referer in some configurations and strip it in others. The hierarchy on desktop browsers, from most cooperative to least, runs roughly Microsoft Copilot (Edge sidebar) at about 65-75% pass-through, Perplexity around 55-65%, Gemini in the chat surface around 45-55%, Claude.ai around 30-45%, ChatGPT web around 18-28%, Meta AI and Grok around 10-25%, and Google AI Overviews near zero because the click is rewritten through Google's own click tracker. Mobile and desktop-app surfaces strip the referer dramatically more than desktop browsers across every engine. The single number 'ChatGPT pass-through' or 'Perplexity pass-through' is meaningless without specifying the surface.

Q: Why is ChatGPT mobile traffic almost always missing a referer?

Two reasons stack. First, iOS ChatGPT opens links inside SFSafariViewController, which is an in-app Safari instance that does not propagate the host app's Referer header; Apple's documentation is explicit on this. Second, Android ChatGPT uses Chrome Custom Tabs, which strips the referer unless the host app explicitly sets it, and OpenAI's Android client does not. The desktop ChatGPT app has the same problem from a different mechanism: it opens links via the OS URL handler, so the system default browser receives the click with no document context to source a referer from. The web app on a regular browser is the only surface that preserves anything, and even there ChatGPT sends Referrer-Policy: origin, which strips the path.

Q: Can a User-Agent string identify an AI traffic source?

Only for crawlers and a small slice of in-app webviews, never for the bulk of human AI traffic. GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, and Bingbot all identify themselves in the User-Agent, so you can log AI crawler activity precisely. (Google-Extended is a robots.txt control token rather than a User-Agent string, so it never appears in your logs.) But humans reading ChatGPT, Claude, or Gemini answers click links in a normal browser tab whose UA looks like every other Chrome or Safari session. The narrow exceptions are some in-app webview UAs that include the host app name (the older Facebook in-app browser, for example), but those do not belong to the dominant AI engines. UA is for bot detection, not human attribution.
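The bot-versus-human split described above can be sketched as a substring match on the User-Agent. This is a minimal illustration, not a vendor-supplied list: the token table only contains the crawlers named in this guide, the role labels follow this guide's descriptions, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# Crawler tokens named in this guide, mapped to (vendor, role).
# Google-Extended is left out on purpose: it is a robots.txt token, not a UA string.
AI_CRAWLER_TOKENS = {
    "gptbot": ("OpenAI", "training crawl"),
    "chatgpt-user": ("OpenAI", "live fetch"),
    "oai-searchbot": ("OpenAI", "search index"),
    "claudebot": ("Anthropic", "training crawl"),
    "claude-user": ("Anthropic", "live fetch"),
    "perplexitybot": ("Perplexity", "indexing"),
    "perplexity-user": ("Perplexity", "live fetch"),
    "bingbot": ("Microsoft", "indexing"),
}


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (vendor, role) for a known AI crawler, or None for everything else."""
    ua = user_agent.lower()
    for token, label in AI_CRAWLER_TOKENS.items():
        if token in ua:
            return label
    # A normal browser UA carries no engine signal, so humans fall through here.
    return None
```

A `None` result does not mean "not AI traffic"; it only means the User-Agent cannot tell you, which is the normal case for human visits.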

Q: Does GA4 detect any AI engines by default in 2026?

No, none. GA4's default channel grouping has 17 channels, and not one of them is 'AI Assistants' or 'AI Search.' When a referer survives from an AI engine, GA4 buckets the visit into Referral with the engine's hostname as the source (chatgpt.com, perplexity.ai, claude.ai) but does not label it as AI. When the referer is stripped (the majority case), the visit lands in Direct / (none) with no signal of AI origin at all. You can build a custom channel grouping with regex rules to catch the referer-passing slice, which recovers 15-50% of AI traffic depending on engine mix, but the unreferred majority needs server-side enrichment or behavioral inference that GA4 cannot do.

Q: What is the difference between PerplexityBot and Perplexity-User?

They are two distinct user-agents documented in Perplexity's bot policy. PerplexityBot is the indexing crawler that builds Perplexity's web index; it identifies as Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot) and respects robots.txt. Perplexity-User is the live-fetch agent that runs when a user asks Perplexity a question and Perplexity needs to retrieve your page in real time to ground its answer. Perplexity-User is documented as not respecting robots.txt because it represents a direct user request, the same pattern OpenAI uses for ChatGPT-User and Anthropic uses for Claude-User. Treat them differently in your logs: PerplexityBot hits indicate corpus presence; Perplexity-User hits indicate live citation activity.

Q: How do I detect Google AI Overviews traffic if the referer is stripped?

You mostly cannot detect it directly, and that is the honest answer. AI Overviews answers render inline on the Google SERP, so most users get the answer without ever clicking anything. When they do click, Google rewrites the outbound link through its own click tracker, which produces a referer of google.com with no signal that the click came from the AI Overview specifically. The indirect detection is to watch for sustained branded-search lift on queries you know trigger an AI Overview, plus an unexplained jump in deep-page Direct entries on those same pages. Search Console can also report impressions on AI Overview-eligible queries, which lets you correlate volume but not click attribution. There is no clean per-click attribution path for AI Overviews in 2026.

Q: What does it mean to detect AI traffic 'server-side'?

Server-side detection means your origin server, edge worker, or analytics endpoint reads the Referer header, User-Agent, and Sec-Fetch-Site values on the inbound HTTP request before any client-side JavaScript runs. This matters because a client-side tag like GA4's gtag.js executes after the page loads and cannot read the request headers that delivered the document; it can only see document.referrer, which is a subset of the real Referer and is degraded by Referrer-Policy on the same trip. Server-side, you see the raw header, you can match it against a maintained AI-engine domain list, and you can layer behavioral signals like deep-page entry on a buying-intent query with no prior session. This is the architectural difference between catching 15-50% of AI traffic and catching 75-95%.

Q: What's the difference between client-side and server-side AI detection?

Client-side detection reads document.referrer from inside a JavaScript snippet running in the page, which means it only sees what the browser exposes after applying Referrer-Policy, in-app sanitization, and any meta-referrer tags on the source page. It runs in the user's browser, can be blocked by ad blockers or consent declines, and cannot see the Sec-Fetch-Site header that browsers send on every navigation. Server-side detection runs on your origin or edge, reads the raw Referer header before any client-side stripping, has access to the full set of Fetch Metadata headers, and is invisible to ad blockers. The cost of server-side is owning the engine domain list and the behavioral classifier; the benefit is roughly 2-3x the recovery rate.

Q: Can I use UTM parameters to identify AI traffic?

Only for traffic where you control the link (your llms.txt file, your structured-data sameAs URLs, your own forum posts), plus Perplexity Pro citations that auto-append ?utm_source=perplexity in some configurations. You cannot UTM-tag a link that ChatGPT, Claude, Gemini, or Meta AI generates for you, because the engine renders your canonical URL with no parameters appended. So UTM is useful as a floor, not a ceiling: it catches the share of AI clicks that come through links you placed, and confirms the mechanism is firing, but it does not capture the organic citations the engines produce on their own. The right move is to UTM-tag what you own and use server-side detection for the rest.
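Tagging the links you own is mechanical enough to script. The sketch below uses only the standard library; the function names and the `utm_medium` default of `ai-owned-link` are illustrative assumptions, not a convention any engine publishes.

```python
from typing import Optional
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url: str, source: str, medium: str = "ai-owned-link") -> str:
    """Append utm_source/utm_medium, replacing any existing values for those keys."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("utm_source", "utm_medium")
    ]
    query += [("utm_source", source), ("utm_medium", medium)]
    # Path and fragment are preserved; only the query string changes.
    return urlunsplit(parts._replace(query=urlencode(query)))


def utm_source_of(url: str) -> Optional[str]:
    """Read utm_source back off an inbound URL, or None when it is absent."""
    return parse_qs(urlsplit(url).query).get("utm_source", [None])[0]


tagged = add_utm("https://example.com/pricing?plan=pro#faq", "llms-txt")
```

Run `add_utm` over the URLs you publish in llms.txt or sameAs fields, and call `utm_source_of` on inbound requests; a match confirms a link you placed, nothing more.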

32 min read · Updated May 2026

Vincent Ruan, Founder, Attrifast · May 26, 2026 · 32 min read

A field guide to identifying AI traffic sources in 2026. Per-engine referrer, user-agent, and UTM behavior for ChatGPT, Claude, Perplexity, Gemini, AI Overviews, Copilot, and Meta AI, plus the server-side detection that catches what GA4 misses.


TL;DR

Identifying AI traffic sources in 2026 is a three-signal problem, not a one-rule fix. You need the Referer (when it survives), the User-Agent (mostly for bot vs human separation), and a first-party UTM you control, then behavioral inference for the unreferred majority. No single signal catches every engine, and no public 2026 engine passes a referer on every surface.
Across the seven majors I track every week (ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews, Microsoft Copilot, and the Meta AI / Grok bucket), the referer pass-through ranges from about 70% on Copilot's Edge sidebar to functionally zero on Google AI Overviews. The single number "AI referrer rate" is meaningless without specifying the surface (web, iOS, Android, desktop app).
GA4 detects zero of the seven engines by default. A custom channel grouping recovers the referer-passing slice (15-50% of total AI traffic by engine mix). The unreferred majority needs server-side detection with a behavioral classifier, which is the architectural gap between catching half the traffic and catching almost all of it.
The capture-rate gap between client-side and server-side detection is roughly 2x to 3x. Client-side reads document.referrer after Referrer-Policy stripping and runs inside ad-blocker range. Server-side reads the raw Referer plus Sec-Fetch-Site before anything strips them and is invisible to blockers.
AI bots and AI traffic are different audiences. GPTBot, ClaudeBot, and PerplexityBot hit your server with documented User-Agent strings (Google-Extended is a robots.txt token, not a crawler UA); humans clicking through AI answers do not. Counting them together overstates AI traffic by 5-10x and is the most common mistake I see in self-built setups.
Attrifast detects all seven engines server-side and labels every session by engine, with no GA4 setup.

A founder messaged me three weeks ago with one question: "Which of these AI engines is actually sending me money?" His dashboard showed a single row labeled "AI Assistants" with about 4,200 sessions for the month. I asked him to break it by engine. He couldn't. His custom GA4 channel grouped seven AI hostnames into one bucket, and the unreferred majority, about 65% of his actual AI traffic, was still hiding in Direct. He had solved the wrong half of the problem.

The question "how do I track AI traffic sources" sounds singular but it is actually seven smaller questions stacked, one per engine, each with its own referer behavior, User-Agent footprint, UTM situation, and gap GA4 does not fill. Operators who get this right treat it as seven engineering problems.

This piece is the operator-grade reference I wish I had when I started building Attrifast's AI visibility score. It does not duplicate how to track AI traffic without GA4 (the tooling argument) or dark AI traffic in GA4 (the misclassification mechanics). This is the per-engine source-identification reference, with the engine-by-engine matrix.

Where AI clicks land in a typical 2026 attribution stack: about half visible by referer match, the rest split between behavioral inference and Direct unless server-side detection catches them

Quick facts

Metric	Value	Source
AI engines covered in this guide	7 (ChatGPT, Claude, Perplexity, Gemini, AI Overviews, Copilot, Meta AI/Grok)	Author classification
Average referer pass-through across all surfaces	~32%	Attrifast aggregate, n=200 sites
GA4 default channels for AI engines	0	Google Analytics docs [1]
Documented AI crawler user-agents	12+ across major engines	OpenAI [2], Anthropic [3], Perplexity [4]
AI crawler share of total bot traffic, late 2025	~5-7% and rising	Cloudflare Radar [5]
Largest single-surface drop in pass-through (ChatGPT web vs iOS)	28% to 8%	Attrifast logs, May 2026
Median time to ship per-engine detection (server-side)	4-8 hrs build + 1-2 hrs/mo upkeep	Author estimate
Engines that publish a clean UTM convention	1 (Perplexity Pro, partial)	Perplexity docs [6]
Behavioral classifier confidence on unreferred AI traffic	75-85% precision in B2B SaaS	Attrifast benchmark [7]
GA4 alternative tools that ship AI-engine detection by default	Revenue attribution category only	Author measurement

The 32% aggregate pass-through is the number that gets misquoted most often. People hear it and assume their analytics is missing about a third of AI traffic; in fact 32% is the share that is visible, and the missing share is unevenly distributed. On a ChatGPT-heavy site it is 70%+, on a Perplexity-heavy site it can be under 40%, and on a site with measurable AI Overviews exposure it is functionally everything. The engine mix changes the math more than any other variable.
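The engine-mix effect is a weighted average, which a few lines of arithmetic make concrete. The pass-through figures and the 80/20 mix below are illustrative placeholders, not measurements; substitute your own per-engine numbers.

```python
from typing import Mapping


def missing_share(engine_mix: Mapping[str, float], pass_through: Mapping[str, float]) -> float:
    """Share of AI clicks that arrive with no usable Referer.

    engine_mix: each engine's share of your AI clicks (any scale; normalized here).
    pass_through: fraction of that engine's clicks that keep a Referer.
    """
    total = sum(engine_mix.values())
    visible = sum(share * pass_through[engine] for engine, share in engine_mix.items())
    return 1 - visible / total


# Illustrative inputs only: a ChatGPT-heavy mix with assumed blended pass-through rates.
chatgpt_heavy = missing_share(
    {"chatgpt": 0.8, "perplexity": 0.2},
    {"chatgpt": 0.20, "perplexity": 0.60},
)
print(round(chatgpt_heavy, 2))  # 0.72
```

Flip the mix toward the more cooperative engine and the missing share drops, even though neither engine's behavior changed.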

What an AI traffic source actually is

An AI traffic source is a human visit whose proximate cause is an AI assistant surface: the user read an answer in ChatGPT, Claude, Perplexity, Gemini, AI Overviews, Copilot, Meta AI, Grok, or a smaller engine, then clicked through. It is not a crawler and not a publisher redirect; it is a person who arrived because an AI assistant cited or summarized your content.

Identifying that visit requires reconstructing three pieces of information the modern web routinely strips:

Origin: which AI engine produced the click.
Surface: which client (web browser, iOS app, Android app, desktop app, browser sidebar) the user was in.
Intent: whether the click matches typical AI patterns (deep-page entry on a buying-intent query, no prior session, multiple clicks in a short window) or is unreferred direct traffic.

You read all three off the same HTTP request, but you read them from different headers and use them in different ways. The Referer tells you origin and partial surface. The User-Agent tells you surface and bot-vs-human. The Sec-Fetch-Site and Sec-Fetch-Dest headers, part of the Fetch Metadata family the W3C specified and browsers implement [8], tell you whether the navigation was cross-site, which separates AI clicks from internal navigation cleanly. Your own UTM, when you control the link, tells you intent precisely; behavioral inference fills the gap when you don't.

Signal	What it tells you	Limits
Referer header	Originating AI engine when present	Stripped on ~68% of clicks on average
User-Agent	Crawler vs human; rare in-app webview ID	Useless for engine ID on regular browsers
Sec-Fetch-Site	Cross-site vs same-site navigation	Coarse; separates AI from internal, not engine from engine
First-party UTM	Engine ID when you control the link	Only covers links you placed
Behavioral pattern	Probable AI origin on unreferred deep-page entries	Probabilistic, not deterministic
Sec-Fetch-Dest	Document, image, script, etc.	Useful for filtering, not identifying

The thing to internalize is that no single one of those signals catches every AI visit. The Referer catches what the engine and the browser cooperatively pass through, the User-Agent catches the bots and a few odd in-app webviews, the UTM catches what you tagged, and the behavioral classifier catches the unreferred majority probabilistically. A real detection pipeline stacks all of them.
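As a rough sketch of the first stage of such a pipeline, the function below pulls the raw signals from the table off one inbound request. The header names are the real ones; `RequestSignals` and `extract_signals` are hypothetical names, and engine classification is deliberately left to a later step.

```python
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class RequestSignals:
    referer_host: Optional[str]  # hostname of the Referer, if one survived
    fetch_site: Optional[str]    # Sec-Fetch-Site value, e.g. "cross-site" or "none"
    fetch_dest: Optional[str]    # Sec-Fetch-Dest value, e.g. "document"
    utm_source: Optional[str]    # first-party UTM on the requested URL
    user_agent: str


def extract_signals(headers: Mapping[str, str], url: str) -> RequestSignals:
    """Collect per-request signals; header lookup is case-insensitive."""
    h = {name.lower(): value for name, value in headers.items()}
    referer = h.get("referer", "")
    return RequestSignals(
        referer_host=urlsplit(referer).hostname if referer else None,
        fetch_site=h.get("sec-fetch-site"),
        fetch_dest=h.get("sec-fetch-dest"),
        utm_source=parse_qs(urlsplit(url).query).get("utm_source", [None])[0],
        user_agent=h.get("user-agent", ""),
    )
```

Keeping extraction separate from classification means the engine domain list and the behavioral rules can change without touching the code that reads the request.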

The companion post ChatGPT referral traffic not showing in analytics is the diagnostic walk-through if you suspect specific engine traffic is going missing; this article is the broader reference.

Why the referer disappears, mechanically

Before we go engine by engine, it helps to lock in the eight mechanisms that strip the Referer on AI clicks, because every engine combines them differently and the same engine produces different results on different surfaces.

Mechanism	What it does	Engines most affected
Referrer-Policy: no-referrer	Source page explicitly blanks the Referer for every outbound link [9]	ChatGPT web on some pages
Referrer-Policy: origin	Sends only the origin and strips the path; you see https://chatgpt.com/ instead of the conversation URL [9]	ChatGPT web (default)
In-app webview (iOS SFSafariViewController)	Opens links in an in-app browser that does not propagate the host app's referer [10]	ChatGPT iOS, Claude iOS, Perplexity iOS
In-app webview (Android Chrome Custom Tabs)	Strips the referer unless the host app explicitly sets it, which most don't [11]	ChatGPT Android, Claude Android, Gemini Android
Desktop app via OS URL handler	Opens system default browser with no document context to source the Referer from	ChatGPT desktop, Claude desktop
rel="noopener noreferrer" on links	The HTML spec forces the browser to drop the Referer on these links [12]	Perplexity, Claude citations
Click-tracker rewrite	Outbound link gets rewritten through an intermediate redirector that strips Referer on the second hop	AI Overviews via Google, Bing Copilot via bing.com/ck
ITP cross-site downgrade	Safari trims the Referer to eTLD+1 on cross-site loads under Intelligent Tracking Prevention [13]	All engines on Safari

The reason you cannot fix this in GA4 settings is that all eight mechanisms operate at or below the browser layer. By the time the request hits your server, the Referer has already been stripped or trimmed. GA4's gtag.js runs even later, inside the page, after the browser has applied every policy, so it sees a strict subset of the original signal. Detection has to happen server-side to recover what the client lost on the way.
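To make the policy rows in the table concrete, here is a simplified model of what a browser sends under three Referrer-Policy values. It is a sketch of the spec's rules, not a full implementation: it ignores credentials in URLs, default-port normalization, and the other policy values, and the conversation URL in the example is a placeholder.

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_sent(source_url: str, dest_url: str, policy: str) -> Optional[str]:
    """Return the Referer a browser would send, or None when it sends nothing."""
    src, dst = urlsplit(source_url), urlsplit(dest_url)
    src_origin = f"{src.scheme}://{src.netloc}"
    dst_origin = f"{dst.scheme}://{dst.netloc}"
    full_url = src._replace(fragment="").geturl()  # fragments are never sent

    if policy == "no-referrer":
        return None
    if policy == "origin":
        return src_origin + "/"
    if policy == "strict-origin-when-cross-origin":
        if src_origin == dst_origin:
            return full_url
        if src.scheme == "https" and dst.scheme != "https":
            return None  # downgrade to a non-HTTPS destination sends nothing
        return src_origin + "/"
    raise ValueError(f"policy not modeled in this sketch: {policy}")


seen = referer_sent("https://chatgpt.com/c/placeholder", "https://example.com/pricing", "origin")
print(seen)  # https://chatgpt.com/
```

The point of the model is the cross-origin branch: whatever the source page knew about the conversation, your server only receives what the function returns.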

ChatGPT: the largest channel, the worst pass-through

ChatGPT is the canonical case study for AI traffic source detection because it combines the largest volume with the worst per-surface pass-through, so every detection mistake you make on ChatGPT compounds the most. About 200 sites I read every Friday confirm a pattern I have been watching for eighteen months: the more ChatGPT-heavy a site's AI traffic mix, the more Direct/(none) eats its analytics. For a deeper volume picture, see how much traffic comes from ChatGPT.

Referer behavior by surface. ChatGPT web sends Referrer-Policy: origin on most pages, which means the Referer is https://chatgpt.com/ with no path; you can attribute the engine but not the conversation. On Chrome the pass-through is around 28%; on Safari, ITP downgrades cross-site Referer to eTLD+1, dropping it to about 19%. The iOS app uses SFSafariViewController, which produces an empty Referer on roughly 92% of clicks. The Android app uses Chrome Custom Tabs with the same problem. The macOS and Windows desktop apps open the system default browser via the OS URL handler with no document context, so the Referer is almost always absent.

User-Agent. Human ChatGPT traffic arrives with a normal browser UA: the user clicked a citation, their default browser opened, and the UA is whatever they normally use. There is no ChatGPT-User-Browser header or equivalent. The crawlers are a different story: GPTBot (training crawl), ChatGPT-User (live fetch), and OAI-SearchBot (search index) all identify themselves cleanly in the User-Agent per OpenAI's bot documentation [2].

UTM behavior. ChatGPT does not append a UTM to outbound links. The destination URL is your canonical URL unmodified. If you see ?utm_source=chatgpt arriving in your logs, either you placed it on a link in your own llms.txt or another publisher quoted ChatGPT's answer and tagged it manually; the engine itself does not.

ChatGPT surface	Referer present	Referrer-Policy	UTM	Detectable as ChatGPT?
Web (Chrome / Firefox)	~28%	origin	None	Yes when referer survives
Web (Safari)	~19%	origin + ITP downgrade [13]	None	Yes when referer survives
iOS app (SFSafariViewController) [10]	~8%	n/a	None	Behavioral inference only
Android app (Custom Tabs) [11]	~11%	n/a	None	Behavioral inference only
Desktop (macOS / Windows, OS handler)	~6%	n/a	None	Behavioral inference only

The ChatGPT-specific detection pattern I run is: match the Referer hostname against chatgpt\.com|chat\.openai\.com|oai\.com (dots escaped, anchored to the end of the hostname), treat any non-empty Referer match as confirmed ChatGPT, then for unreferred deep-page entries on a query that targets a topic ChatGPT is citing, flag as probable ChatGPT. The track ChatGPT traffic reference page has the exact regex I use. The behavioral inference is the part that recovers the 70%+ of ChatGPT traffic the referer never sees.
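A minimal sketch of that confirmed / probable split follows. It is not the regex from the reference page: the pattern is an assumption built from the three hostnames named above, and `is_deep_page` and `topic_is_cited` are hypothetical booleans that your own behavioral rules would supply.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Anchored so "notchatgpt.com" and "chatgpt.com.evil.example" do not match,
# while subdomains such as "www.chatgpt.com" do.
CHATGPT_HOST = re.compile(r"(^|\.)(chatgpt\.com|chat\.openai\.com|oai\.com)$", re.IGNORECASE)


def classify_chatgpt(referer: Optional[str], is_deep_page: bool, topic_is_cited: bool) -> str:
    """Return 'confirmed', 'probable', 'other-referrer', or 'unknown'."""
    if referer:
        host = urlsplit(referer).hostname or ""
        return "confirmed" if CHATGPT_HOST.search(host) else "other-referrer"
    # No Referer at all: fall back to the behavioral flags.
    if is_deep_page and topic_is_cited:
        return "probable"
    return "unknown"
```

Matching on the parsed hostname rather than the raw header avoids false positives from URLs that merely contain "chatgpt.com" in a path or query string.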

Claude: the engine almost nobody is logging correctly

Claude is the engine most teams underestimate, partly because Anthropic publishes less marketing about its consumer chat surface than OpenAI does about ChatGPT, and partly because Claude's referer pass-through is mid-pack but its bot footprint is among the most distinctive. The track Claude traffic page has the full setup; here is the per-surface matrix.

Referer behavior by surface. Claude.ai web sends Referrer-Policy: strict-origin-when-cross-origin, the modern spec-compliant default [9]. Read strictly, that policy trims cross-origin HTTPS-to-HTTPS navigations to the origin (https://claude.ai/), so a full https://claude.ai/chat/<uuid> Referer reaches you only on clicks where a more permissive policy applies; desktop-browser pass-through is around 41%. When the chat UUID does arrive it is useful, because it lets you correlate multiple clicks from the same conversation. The iOS Claude app and Android Claude app both strip the referer through the standard in-app webview mechanisms, dropping to roughly 10-12%. The Claude desktop app, like ChatGPT desktop, opens the OS default browser and loses the referer.

User-Agent. Anthropic documents two crawler user-agents per their support article [3]: ClaudeBot (training crawl) identifies as Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com), and Claude-User (live fetch) fires when a Claude session needs to retrieve a URL on a user's behalf. Both are bots. Human Claude traffic arrives with a normal browser UA.

UTM behavior. Claude does not append UTMs to citation links. Anthropic's product surfaces render your canonical URL unmodified.

Claude surface	Referer present	Referrer-Policy	UTM	Detectable as Claude?
Claude.ai web	~41%	strict-origin-when-cross-origin	None	Yes, with chat UUID
iOS Claude app	~10%	n/a	None	Behavioral inference only
Android Claude app	~12%	n/a	None	Behavioral inference only
Claude desktop (macOS / Windows)	~14%	n/a	None	Behavioral inference only
Claude in Slack (via Claude for Slack)	~5%	n/a (Slack URL preview)	None	Behavioral only; very low volume

The Claude-specific gotcha is that ClaudeBot has been one of the most aggressive AI crawlers across late 2025 and 2026; Cloudflare's AI-bots report consistently puts it in the top three for hit volume on the publishers they sample [14]. Operators who only watch ChatGPT traffic miss that Claude is often citing a page before ChatGPT's crawler has reached it, and Claude-User live fetches are an early leading indicator of citation activity. Log ClaudeBot and Claude-User as separate dimensions in your access-log analysis; the linked piece on AI crawler tracking walks through the technique.

Perplexity: the most cooperative engine

Perplexity is the engine where server-side detection feels easiest, which is exactly why teams overweight it and underestimate the others. Perplexity's pass-through is the second-best of the seven majors, and it has the cleanest UTM convention of any of them on its Pro tier. The full reference is the track Perplexity traffic page.

Referer behavior by surface. Perplexity.ai sends Referrer-Policy: strict-origin-when-cross-origin on most pages, with a pass-through around 62% on desktop browsers. Read strictly, that policy trims a cross-origin Referer to the origin, so the full https://www.perplexity.ai/search/<slug> form arrives only on clicks where a more permissive policy applies. When the slug does arrive, it is a human-readable encoding of the user's original query, which means the Referer alone tells you both the engine and roughly what the user was searching for. iOS Perplexity drops to about 17% through SFSafariViewController. Android Perplexity drops to about 21% through Custom Tabs. Mac and Windows clients open the system browser.

User-Agent. Per Perplexity's published bot policy [4], there are two crawlers: PerplexityBot (indexing, respects robots.txt) and Perplexity-User (live fetch during a session, documented as not respecting robots.txt because it represents a direct user request). Human Perplexity traffic arrives with a normal browser UA.

UTM behavior. Perplexity Pro citations append ?utm_source=perplexity in some configurations; this is the closest any major engine comes to a published UTM convention [6]. The non-Pro free tier does not consistently append the parameter. So a UTM match is a confirmation signal, not a coverage signal: it tells you when Pro fired but doesn't catch free-tier traffic.

Perplexity surface	Referer present	Referrer-Policy	UTM	Detectable as Perplexity?
Perplexity.ai web	~62%	strict-origin-when-cross-origin	Pro: utm_source=perplexity sometimes	Yes, with search slug
iOS Perplexity app	~17%	n/a	None on free tier	Behavioral inference + partial UTM
Android Perplexity app	~21%	n/a	None on free tier	Behavioral inference + partial UTM
Perplexity Spaces (logged-in)	~58%	strict-origin-when-cross-origin	None	Yes, with space-slug

The Perplexity-specific gotcha is the multi-click pattern. Perplexity's follow-up question UX drives multiple citation clicks across several user-initiated follow-ups, all linking to the same canonical URL on your site. In my logs this looks like one user generating 3-5 hits within a 10-minute window with the same IP and slightly different slug paths. That is one visitor, but raw hit counts overstate it by 3-5x. Bucket by IP + UA + 30-minute window before reporting unique Perplexity visitors.
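The bucketing step can be sketched as follows. One assumption to flag: the 30-minute window is implemented here as an inactivity gap (a new visitor starts only after 30 quiet minutes for the same IP and UA), which is one reasonable reading of the rule; a fixed clock-aligned window would also work. The function name and tuple layout are illustrative.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

WINDOW = timedelta(minutes=30)

Hit = Tuple[datetime, str, str]  # (timestamp, ip, user_agent)


def unique_visitors(hits: Iterable[Hit]) -> int:
    """Count visitors, collapsing hits that share IP + UA within the window."""
    last_seen: Dict[Tuple[str, str], datetime] = {}
    visitors = 0
    for ts, ip, ua in sorted(hits, key=lambda hit: hit[0]):
        key = (ip, ua)
        previous = last_seen.get(key)
        # A first hit, or a hit after more than 30 quiet minutes, starts a new visitor.
        if previous is None or ts - previous > WINDOW:
            visitors += 1
        last_seen[key] = ts
    return visitors
```

Feed it only the hits already classified as Perplexity, so that an unrelated visit from the same office IP does not get merged in.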

Gemini: two engines stitched together

Gemini is functionally two engines that share a brand, and you have to track them separately or you will conflate clean traffic with untrackable traffic. The chat surface at gemini.google.com passes referers reasonably. The AI Overviews and AI Mode surfaces inside the Google SERP do not, because Google rewrites every outbound click through its own click tracker. The track Gemini traffic page covers both; here is the matrix.

Referer behavior by surface. Gemini.google.com sends Referrer-Policy: strict-origin-when-cross-origin and preserves at least the origin, with a pass-through around 54% on desktop browsers; Google's own properties are consistently cooperative on Referer behavior across Gemini surfaces. The Android Gemini app and iOS Gemini app drop to the typical in-app webview range of 15-25%. AI Overviews inside Google Search render the answer inline, so a large share of users never click anything; the share that does click goes through Google's outbound click tracker, producing a Referer of google.com with no signal that the click came from an AI Overview specifically.

User-Agent. Google publishes Google-Extended as the opt-out token for Bard / Gemini training [15], a separate signal from the regular Googlebot crawler. Human Gemini traffic arrives with a normal browser UA.

UTM behavior. Gemini does not append UTMs to outbound links from the chat surface. AI Overviews clicks pass through Google's click tracker, which adds its own internal parameters but no stable UTM you can match.

Gemini surface	Referer present	Referrer-Policy	UTM	Detectable as Gemini?
Gemini.google.com (chat, web)	~54%	strict-origin-when-cross-origin	None	Yes
Gemini in Search (AI Overviews)	<3%	n/a (click rewritten)	None	Only via Search Console correlation
Gemini iOS app	~21%	n/a	None	Behavioral inference only
Gemini Android app	~24%	n/a	None	Behavioral inference only
Gemini in Google Workspace (sidebar)	~38%	strict-origin-when-cross-origin	None	Yes, when sidebar is active

The Gemini-specific gotcha is that operators frequently see a flat or declining "Gemini" referer count and assume Gemini isn't sending them traffic, when the truth is that the traffic shifted from gemini.google.com to AI Overviews and became untrackable in transit. The diagnostic is to cross-reference Search Console impressions on AI Overview-eligible queries against your deep-page Direct trend; if the impressions are growing while gemini.google.com referers are flat, AI Overviews is the missing source. The Google AI Overviews tracking guide goes deeper.

Google AI Overviews: the engine with no clean per-click attribution

Google AI Overviews deserves its own section because it is the only engine of the seven with no clean per-click attribution path in 2026, full stop. The architecture of the surface (inline answer rendering inside the SERP, click rewriting through Google's tracker, no engine-identifying UTM) is what makes it untrackable by design.

Referer behavior. When a user clicks a citation inside an AI Overview, the click is rewritten through Google's outbound click handler. The destination receives a Referer of https://www.google.com/ with no path information, no UTM, and no header that says "this click originated from an AI Overview specifically rather than a regular search result." From your server's perspective, an AI Overview click is indistinguishable from a regular Google organic click. That is not an accident; that is how Google's click tracking has worked for a decade.

User-Agent. Human AI Overview traffic arrives with a normal browser UA, same as any Google organic visitor.

UTM behavior. None. Google does not append a UTM that identifies the answer surface.

Indirect detection. The honest 2026 method is to correlate Search Console data with your analytics. Search Console reports impressions and clicks on AI Overview-eligible queries through its standard reporting [16], and you can identify which of your URLs are appearing in AI Overviews by cross-referencing position data against the queries known to trigger an Overview (which is a moving target, but the major SEO trackers, such as Search Engine Land's coverage [17] and the AI Overviews tracker work [18], publish trigger-rate datasets). If a URL's impressions are up sharply while its measured clicks are flat (because the AI Overview is satisfying the query inline) and your deep-page Direct entries on that URL are also up, the inferred conclusion is AI Overview activity.
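That three-way correlation reduces to a simple rule once you have period-over-period changes per URL. The sketch below assumes you have already exported those changes from Search Console and your own analytics; the field names and all three thresholds are placeholders to tune, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class UrlTrend:
    url: str
    impressions_change: float     # e.g. 0.40 means +40% period over period
    clicks_change: float          # Search Console clicks, same periods
    direct_entries_change: float  # deep-page Direct entries from your analytics


def likely_ai_overview(
    trend: UrlTrend,
    impressions_min: float = 0.30,  # assumed threshold for "up sharply"
    clicks_flat: float = 0.05,      # assumed band for "flat"
    direct_min: float = 0.10,       # assumed threshold for Direct "also up"
) -> bool:
    """Flag a URL whose pattern matches the inference described above."""
    return (
        trend.impressions_change >= impressions_min
        and abs(trend.clicks_change) <= clicks_flat
        and trend.direct_entries_change >= direct_min
    )
```

A `True` here is an inference to investigate, not an attribution; the same pattern can come from a ranking change or a seasonal spike.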

AI Overview surface	Referer present	UTM	Direct detection	Indirect detection
AI Overviews on Google.com (desktop)	~0% as Overview-specific	None	Impossible	Search Console correlation
AI Overviews on Google mobile web	~0% as Overview-specific	None	Impossible	Search Console correlation
AI Overviews in Google app (iOS / Android)	~0% as Overview-specific	None	Impossible	Search Console correlation
AI Mode (the dedicated tab)	~0% as AI Mode-specific	None	Impossible	Search Console + branded-search lift

The blunt summary for AI Overviews: you can size it but you cannot attribute it per-click. Any "AI Overviews revenue" number you see in 2026 is an inference, not a measurement. The companion piece Google AI Mode vs AI Overviews covers the structural difference between the two and why neither is cleanly attributable.


Microsoft Copilot: the surprisingly clean engine

Microsoft Copilot is the dark horse of AI traffic source detection in 2026, and the part that surprises operators is that Copilot's Edge sidebar context produces the highest Referer pass-through of any major surface, period, around 71% in my logs. The catch is that Copilot itself is split across multiple surfaces (the Edge sidebar, the bing.com/chat web UI, the Copilot Windows app, Microsoft 365 Copilot inside Word and Excel), and only one of them is that clean. The track Copilot traffic page has the engine reference.

Referer behavior by surface. Copilot in the Edge sidebar sends Referrer-Policy: strict-origin-when-cross-origin and preserves the origin on HTTPS-to-HTTPS navigations, producing https://www.bing.com/ or https://copilot.microsoft.com/ as the Referer at about 71%. The classic Bing Chat web UI goes through bing.com/ck/a?..., a two-hop redirect that blanks the Referer on the second hop, dropping the rate to about 33%. The Copilot Windows app opens the system default browser via OS URL handler and loses the Referer. Microsoft 365 Copilot inside Office apps has a smaller volume and a mix of behaviors depending on the host app.
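To make the policy behavior concrete, here is a minimal sketch of what a browser sends under strict-origin-when-cross-origin. It follows the standard policy rules (full URL same-origin, origin only cross-origin, nothing on an HTTPS-to-HTTP downgrade) but simplifies origin comparison by ignoring default ports; the function names and the example chat path are made up for illustration.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Scheme plus host[:port], lowercased (default ports are not normalized)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def referer_sent(from_url: str, to_url: str) -> str:
    """Referer value under strict-origin-when-cross-origin ("" means omitted)."""
    if origin_of(from_url) == origin_of(to_url):
        # Same-origin navigation: full URL minus the fragment.
        return urlsplit(from_url)._replace(fragment="").geturl()
    if urlsplit(from_url).scheme == "https" and urlsplit(to_url).scheme != "https":
        # HTTPS to HTTP downgrade: the header is dropped entirely.
        return ""
    # Cross-origin without a downgrade: origin only, no path or query.
    return origin_of(from_url) + "/"
```

For a cross-site click from a Copilot page to your site, this returns only the origin, which is why the path never reaches your logs on these surfaces.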

User-Agent. Bingbot identifies itself in the User-Agent on training and indexing crawl. There is no distinct CopilotBot user-agent that I have seen consistently in logs through 2026; Microsoft's crawling appears to consolidate under the Bingbot identity. Human Copilot traffic arrives with a normal browser UA, with the noteworthy exception that some Edge sidebar contexts include Edg/ in the UA, which is a partial signal that the user is on Edge specifically (and therefore more likely to be a sidebar Copilot user).

UTM behavior. Copilot does not append a UTM to outbound links from any surface I have measured.

Copilot surface	Referer present	Referrer-Policy	UTM	Detectable as Copilot?
Edge sidebar Copilot	~71%	strict-origin-when-cross-origin	None	Yes, highest of any surface
Bing Chat (bing.com/chat)	~33%	302 through bing.com/ck redirector	None	Yes, but referer often hostname-only
Copilot.microsoft.com web	~58%	strict-origin-when-cross-origin	None	Yes
Copilot Windows app	~9%	n/a (OS URL handler)	None	Behavioral inference only
Microsoft 365 Copilot (Word, Excel, etc.)	~22%	varies by host	None	Behavioral inference mostly

The Copilot-specific gotcha is that the "Bing" hostname can show up in your Referer for two completely different reasons: classic Bing organic search clicks (no AI involved) and Bing Chat / Copilot AI clicks. If you regex-match bing.com and bucket everything into "Microsoft Copilot," you will overcount AI traffic by mixing in regular search clicks. The fix is to match on the specific hostname or path when it survives (bing.com/chat, bing.com/copilot, copilot.microsoft.com) and to treat bare bing.com as classic organic until proven otherwise.
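A minimal sketch of that rule follows. The function name and the return labels are illustrative; the hostnames and path segments are the ones named in the paragraph above.

```python
from urllib.parse import urlsplit

COPILOT_HOSTS = {"copilot.microsoft.com"}
BING_HOSTS = {"bing.com", "www.bing.com"}
COPILOT_FIRST_SEGMENTS = {"chat", "copilot"}


def classify_microsoft_referer(referer: str) -> str:
    """Label a Referer as Copilot, ambiguous Bing, or something else."""
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    if host in COPILOT_HOSTS:
        return "copilot"
    if host in BING_HOSTS:
        # Only the first path segment is checked, so /chatter does not match /chat.
        first_segment = parts.path.strip("/").split("/")[0]
        if first_segment in COPILOT_FIRST_SEGMENTS:
            return "copilot"
        # Bare bing.com: treat as classic organic until proven otherwise.
        return "bing_organic_or_unknown"
    return "other"
```

Keeping the ambiguous bucket as its own label, rather than folding it into either side, leaves room to reclassify those rows later with a behavioral signal.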

Meta AI and Grok: the catch-all bucket

Meta AI, Grok, You.com, Phind, Poe, and the long tail together make enough volume on some sites to deserve a slot, but individually none passes the volume threshold for its own section. I bucket them as "other AI" while still detecting each individually in logs.

Referer behavior. Meta.ai web sends https://www.meta.ai/ with pass-through around 25-35% on desktop, much lower on Facebook and Instagram in-app surfaces because those use Meta's webview, which strips the Referer aggressively. Grok on grok.com and inside X.com passes a Referer on roughly 20-30% of clicks; xAI's engineering posts [19] occasionally reference referrer-policy changes that have moved the rate in either direction. You.com and Phind behave like Perplexity (strict-origin-when-cross-origin, search-slug paths) at lower volume. Poe (Quora's multi-bot interface) passes a Referer on 30-45% of clicks with a poe.com hostname.

User-Agent. Meta's training crawler identifies as Meta-ExternalAgent and FacebookBot in some configurations; Meta publishes a partial crawler list [20]. Grok's training crawler has been variously identified as Grokipedia and xAI-prefixed strings in late 2025 / 2026, moving fast enough that I won't pin a specific UA in writing.

UTM behavior. None of them appends a standardized UTM to outbound citation links as of May 2026.

Engine	Surface	Referer present	UTM	Detectable?
Meta AI (meta.ai web)	Web	~28%	None	Yes when Referer survives
Meta AI (in Facebook / Instagram app)	In-app webview	~7%	None	Behavioral only
Grok (grok.com)	Web	~24%	None	Yes when Referer survives
Grok (in X.com)	Web sub-surface	~31%	None	Yes; referer = x.com
You.com	Web	~47%	None	Yes
Phind	Web	~52%	None	Yes
Poe	Web	~38%	None	Yes

The catch-all bucket separates a real per-engine setup from a half-built one. The first build covers ChatGPT, Perplexity, Claude. The second adds Gemini and Copilot. The third, which most teams never get to, picks up Meta AI, Grok, You.com, Phind, and Poe. Those engines collectively run 5-15% of total AI traffic on the sites I see, small enough to ignore and big enough that ignoring it inflates the "AI is mostly ChatGPT" misread.

The cross-engine reference matrix

Pull the seven per-engine sections together and the operator-grade reference is a single matrix: every engine, every surface, every signal. This is the table I keep on a wall above my desk.

Engine	Surface	Referer pass-through	UA distinct?	UTM standard	Sec-Fetch-Site useful?
ChatGPT	Web (Chrome)	~28%	No (bots only)	None	Yes, cross-site
ChatGPT	iOS	~8%	No	None	Limited (in-app context)
ChatGPT	Android	~11%	No	None	Limited
Claude	Web	~41%	No (bots only)	None	Yes
Claude	iOS	~10%	No	None	Limited
Claude	Android	~12%	No	None	Limited
Perplexity	Web	~62%	No (bots only)	Pro: partial	Yes
Perplexity	iOS	~17%	No	None on free	Limited
Perplexity	Android	~21%	No	None on free	Limited
Gemini	Chat web	~54%	No	None	Yes
Gemini	AI Overviews	<3%	No	None	No (click rewrite)
Gemini	iOS / Android	~22%	No	None	Limited
AI Overviews	Google.com	~0%	No	None	No
Copilot	Edge sidebar	~71%	Partial (Edg/ UA)	None	Yes
Copilot	Bing Chat	~33%	No	None	Partial
Copilot	Windows app	~9%	No	None	Limited
Meta AI	meta.ai web	~28%	No	None	Yes
Meta AI	in-app	~7%	No	None	Limited
Grok	grok.com	~24%	No	None	Yes
Grok	x.com	~31%	No	None	Yes

Read across the rows and the pattern is consistent: every engine has a "clean" desktop browser surface that passes the Referer at a usable rate and a "messy" mobile or app surface that strips it. The clean surfaces are detectable with a referer match. The messy surfaces need behavioral inference. There is no single signal that catches all surfaces of any engine, and there is no engine where the mobile surface is cleaner than the desktop surface.

How GA4 buckets each engine

If you are running GA4 with default settings and have not built a custom channel grouping, here is the row each engine produces in your channel report, and how it differs from what an AI-aware tool reports.

Engine	Referer present	GA4 default channel	GA4 default source	Reality
ChatGPT	Yes	Referral	chatgpt.com / chat.openai.com	AI Assistant, ChatGPT
ChatGPT	No (stripped)	Direct	(none)	Dark AI traffic, ChatGPT
Claude	Yes	Referral	claude.ai	AI Assistant, Claude
Claude	No	Direct	(none)	Dark AI traffic, Claude
Perplexity	Yes	Referral	perplexity.ai	AI Assistant, Perplexity
Perplexity	No	Direct	(none)	Dark AI traffic, Perplexity
Gemini (chat)	Yes	Referral	gemini.google.com	AI Assistant, Gemini
Gemini (chat)	No	Direct	(none)	Dark AI traffic, Gemini
Gemini (AI Overviews)	"google.com" only	Organic Search	google / organic	Misattributed: AI Overview, not classic organic
Copilot (Edge sidebar)	Yes	Referral	bing.com or copilot.microsoft.com	AI Assistant, Copilot
Copilot (Bing Chat)	Sometimes	Referral or Direct	bing.com or (none)	AI Assistant, Copilot
Meta AI	Yes	Referral	meta.ai	AI Assistant, Meta
Meta AI	No	Direct	(none)	Dark AI traffic, Meta
Grok	Yes	Referral	grok.com or x.com	AI Assistant, Grok

Two rows cause the most confusion. The AI Overviews row gets misattributed to "Organic Search" rather than Direct, because the click came through google.com with the Referer intact, inflating your classic organic numbers. The Direct rows for the other engines inflate your Direct bucket. Both are misclassification failures into different buckets. The longer write-up is in GA4 missing traffic.
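If you export GA4 rows, the relabeling for the referer-passing slice can be sketched as below. The hostname list comes from the table above; the function name and the "AI Assistant" label are illustrative. bing.com and x.com are deliberately left out of the map because those hostnames also carry non-AI clicks.

```python
AI_SOURCES = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
    "meta.ai": "Meta AI",
    "grok.com": "Grok",
}


def relabel(channel: str, source: str) -> tuple[str, str]:
    """Return (channel, engine) for one exported row; engine is "" when unknown."""
    host = source.lower().removeprefix("www.")
    if channel == "Referral" and host in AI_SOURCES:
        return "AI Assistant", AI_SOURCES[host]
    # Direct and Organic Search rows pass through unchanged: nothing in the
    # row itself says whether an AI surface was involved.
    return channel, ""
```

This only recovers rows where the Referer survived. The Direct and google / organic rows stay where GA4 put them, which is the misclassification the paragraph above describes.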

Detection method comparison: referer vs UA vs UTM vs behavioral

You have four signals available for identifying AI traffic, and each one catches a different population. The honest comparison is which signal is sufficient on its own (none) and which combination approaches full coverage (all four, stacked).

Detection method	What it catches	What it misses	False positive rate	Maintenance burden
Referer-only	Engine when Referer survives	Stripped Referer (majority on most engines)	Near-zero	Low; engine domain list
User-Agent-only	Crawler traffic precisely; almost no human traffic	All human visits	Low for bots	Low; UA list updates
UTM-only	Traffic from links you tagged	Organic AI citations	Near-zero	Low; tag discipline
Behavioral-only	Probabilistic unreferred AI traffic	Referred AI (use Referer instead)	10-20% on tuned classifier	High; rules drift
Referer + UA	Visible AI plus clean bot separation	Unreferred AI	Near-zero	Moderate
All four stacked	85-95% of AI traffic, labeled by engine	Inherently untrackable (AI Overviews)	5-15% on inferred slice	High (or buy a tool)

The non-obvious takeaway is that the User-Agent is mostly useless for human attribution but indispensable for bot separation. If you skip the UA filter, your Referer-matched "AI traffic" row will include every GPTBot, ClaudeBot, and PerplexityBot hit on your site, the same crawlers that produced the citation in the first place, and you will overcount human AI traffic by an order of magnitude. The UA is what tells you a hit is a person, not a crawler. Then the Referer tells you which engine the person came from. The two work as a pair.
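The pairing can be sketched as a two-step check: the User-Agent decides crawler versus person, and only then does the Referer decide the engine. The token list is a partial one taken from the crawler names in this article, the host map is trimmed to three engines for brevity, and the function name and labels are illustrative.

```python
from urllib.parse import urlsplit

# Partial list of self-identifying crawler tokens; extend as engines publish more.
BOT_TOKENS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "bingbot", "CCBot",
)

ENGINE_BY_HOST = {
    "chatgpt.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
}


def classify_hit(user_agent: str, referer: str) -> tuple[str, str]:
    """Return (kind, detail): UA separates crawlers first, Referer names the engine."""
    ua = user_agent.lower()
    for token in BOT_TOKENS:
        if token.lower() in ua:
            return "crawler", token
    host = (urlsplit(referer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    engine = ENGINE_BY_HOST.get(host)
    if engine:
        return "human_ai", engine
    return "human_other", ""
```

Crawler hits and human hits come back with different `kind` values, so they can be written to separate tables and never summed into one "AI traffic" number.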

The other non-obvious takeaway is that the behavioral classifier is the only signal that recovers the unreferred majority, and it is the only signal that requires ongoing engineering, because the behavioral signature of an unreferred AI visit changes as the engines change their mobile and app behavior. A classifier tuned in Q1 2026 might be 10 points less accurate by Q3 2026 if Meta AI's in-app browser shifts how it renders citations. This is the maintenance argument for buying versus building.

Server-side vs client-side detection

The architecture you choose for AI traffic detection, server-side versus client-side, is the single biggest decision you make on this work, because it sets the ceiling on what you can recover. Client-side detection caps out around 25-40% capture on most sites. Server-side detection with a behavioral classifier caps out around 75-95%. The gap is mechanical, not implementation-dependent. Here is why.

Capability	Server-side	Client-side (gtag, plausible-tag, etc.)
Reads raw Referer header	Yes	No (only sees document.referrer after policy stripping)
Reads Sec-Fetch-Site	Yes	No (no JS API for request headers)
Reads User-Agent	Yes	Yes, but trivially spoofable client-side
Blocked by ad blockers	No	Yes, partially (~20-40% in some audiences)
Affected by consent declines	No (if no PII stored)	Yes (analytics tag fires post-consent)
Can run a behavioral classifier	Yes, with full session context	Limited; classifier runs after page load
Maintenance: engine domain list	You own it	You own it
Maintenance: classifier rules	You own it	You own it
Performance impact on page	None	Small JS payload
Real-time dashboard updates	Yes	Yes
GDPR posture	Configurable (in-region processing)	Depends on tag vendor

The mechanism behind the capture-rate gap comes from the eight stripping mechanisms from earlier, plus the question of whether the tag runs at all. Client-side detection sees document.referrer, which is what the page sees after Referrer-Policy has been applied to the incoming navigation, and it sees it only on visits where the tag loads and fires. Server-side detection reads the Referer header on the inbound request itself, so it also covers visits where the tag is blocked or consent is declined. For an engine like ChatGPT that sends Referrer-Policy: origin, both see the same hostname-only Referer; for an engine that sends Referrer-Policy: strict-origin-when-cross-origin, both see the same origin. But for clicks that go through an in-app webview or a click-tracker rewrite, the server has access to Sec-Fetch-Site, which no JavaScript API exposes to the client tag [8], alongside the User-Agent on every request. That access is what lets a server-side classifier separate AI cross-site clicks from internal navigation.
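As a rough illustration of the server-side read, the sketch below pulls the relevant headers out of a WSGI `environ` dict, where request headers appear under `HTTP_`-prefixed keys. The two function names and the returned dict shape are assumptions for the example; any framework that exposes raw request headers would work the same way.

```python
def extract_signals(environ: dict) -> dict:
    """Collect attribution-relevant request data from a WSGI environ."""
    return {
        "referer": environ.get("HTTP_REFERER", ""),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
        # Fetch Metadata header: cross-site, same-site, same-origin, or none.
        "sec_fetch_site": environ.get("HTTP_SEC_FETCH_SITE", ""),
        "path": environ.get("PATH_INFO", "/"),
        "query": environ.get("QUERY_STRING", ""),
    }


def is_internal_navigation(signals: dict) -> bool:
    """True when the browser reports the request came from your own site."""
    return signals["sec_fetch_site"] in ("same-origin", "same-site")
```

An entry with an empty Referer that is not internal navigation is the candidate population for the behavioral classifier; an internal one is just a visitor moving between your own pages.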

Architecture	Typical AI capture rate	Setup effort	Monthly cost (build)	Monthly cost (buy)
Client-side gtag with custom dimension	25-35%	1-2 hrs	$0	n/a
Client-side privacy analytics (Plausible / Fathom)	25-40%	minutes	$9-29	n/a
Server-side referer-only	35-50%	1-2 days	$0-20 (endpoint)	n/a
Server-side + behavioral classifier (build)	75-90%	5-10 days	$20-100	n/a
Server-side + classifier + Stripe join (buy: Attrifast)	85-95%	under 5 min	n/a	$15

The decision rule I use with founders is: if your AI traffic share is under 5% of total sessions, the client-side cap is probably fine, because you are sizing the channel, not optimizing it. If AI is 5-15% of sessions, server-side is worth the engineering. If AI is over 15%, you are operating the channel actively and the capture-rate gap is the difference between accurate weekly reporting and noise; this is the band where a maintained tool or a dedicated server-side build pays for itself fastest. The full architecture argument is in the AI traffic analytics 2026 guide.

A working timeline: when each engine started passing referrers

The reason any of this is a moving problem rather than a one-time configuration is that the engines change their referer behavior on their own schedules. Here is the timeline of major changes I have logged across the seven majors since each engine launched a public consumer chat surface. The dates are publication dates of the change, when I can pin them down; otherwise they are the month I first observed the change in my logs.

The chart maps a real operational pattern: every twelve months, on average, at least one major engine changes its surface or referrer behavior enough that a regex-only detection rule built more than a year ago is partly broken. The teams who notice this are the ones who keep a weekly check that compares server-log AI referrer counts against the analytics tool's "AI" channel. If those numbers drift apart by more than 20%, something changed and the rule needs updating. The teams who don't notice are the ones whose AI channel quietly goes dark in Q3 and who discover it in Q4 board prep.
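The weekly check can be as small as the sketch below. How the gap is normalized is an assumption (here it is measured against the larger of the two counts); the 20% threshold is the one from the paragraph above, and the function names are illustrative.

```python
def drift_ratio(server_log_count: int, analytics_count: int) -> float:
    """Relative gap between the two counts, measured against the larger one."""
    larger = max(server_log_count, analytics_count)
    if larger == 0:
        return 0.0
    return abs(server_log_count - analytics_count) / larger


def needs_rule_review(server_log_count: int, analytics_count: int,
                      threshold: float = 0.20) -> bool:
    """True when the counts have drifted apart by more than the threshold."""
    return drift_ratio(server_log_count, analytics_count) > threshold
```

Running this once a week on the previous seven days of data is enough to catch a referrer-policy change within one reporting cycle.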

The point of showing the timeline visually is not to memorize the dates. It is to internalize that this is a moving problem. No single rule shipped in 2024 still catches what 2026 traffic looks like. The maintenance burden on a hand-rolled detection setup is real, ongoing, and almost always underestimated.

Tool comparison: what each platform does with AI traffic

The other axis of the build-versus-buy decision is which existing tool catches which engine, and the truth is that the popular analytics platforms split into clear camps on this. Here is the honest scoring across the four signals, referer match, UA match, UTM honoring, and behavioral inference, for the tools founders ask me about most often.

Tool	Referer match (AI engines)	UA-based crawler separation	UTM honoring	Behavioral inference (unreferred)
GA4 (default config)	None, buckets as Referral	Bot filter on/off only	Standard	None
GA4 (custom channel grouping)	Manual regex per engine	Bot filter on/off only	Standard	None
Plausible	Shows referer hostname; no AI label	Configurable bot filter	Standard	None
Fathom	Shows referer hostname; no AI label	Configurable bot filter	Standard	None
Simple Analytics	Shows referer hostname; no AI label	Configurable bot filter	Standard	None
Pirsch	Some AI labeling on referer match	UA-based bot filter	Standard	None
Matomo	Manual mapping	UA-based bot filter	Standard	None
Attrifast	All 7 engines, maintained list	UA-based, separate crawler dashboard	Standard	Yes, behavioral classifier on unreferred

Two things on this table don't fit in the cell text. First, "shows referer hostname" is not the same as "labels as the engine." Plausible, Fathom, and Simple Analytics will show you chatgpt.com or perplexity.ai as a referrer source, identical to what GA4's Referral channel shows, but none rolls those hostnames into an "AI Assistants" category by default. The deeper Attrifast vs Plausible and Attrifast vs Pirsch comparison pages walk this row-by-row.

Second, the behavioral inference column is the structural separator. Every tool can do a Referer match against an AI domain list with enough configuration; it is just regex. None of the general-purpose dashboards runs a behavioral classifier on unreferred deep-page entries. That is the architectural choice distinguishing a revenue-attribution tool from a privacy-analytics tool from GA4.

Tool category	Solves	Doesn't solve	Best for
GA4 stock	Generic web analytics	AI detection, GDPR fully	Teams already on GA4 with no AI focus
GA4 + custom regex	Referer-passing AI slice	Unreferred majority, GDPR	Teams committed to GA4 staying
Privacy analytics (Plausible/Fathom)	GDPR, clean pageviews	AI labeling, revenue join	Banner-free privacy dashboards
Server logs + grep	Bot visibility, free	Human attribution, scale	Engineers auditing crawl behavior
Revenue attribution (Attrifast)	AI detection + revenue	General funnel exploration	Teams measuring AI revenue

The honest read across these tables: no tool is best on every axis. GA4 is free with Google Ads integration. Plausible and Fathom are beautiful for pageviews and EU-compliant. Server logs are unbeatable for crawler visibility. Attrifast is built around AI detection plus the revenue join. Pick by which question you are answering, not which tool is most "modern."

The architectural punchline

AI traffic source detection in 2026 is a server-side multi-signal problem, not a client-side single-rule one. Teams who get this right run a detector that reads the Referer, User-Agent, Sec-Fetch-Site, and any incoming UTM, matches against a maintained per-engine domain list, then runs a behavioral classifier on what's left. Teams who get it wrong ship a single GA4 custom channel grouping with a seven-hostname regex, declare victory, and miss 50-70% of traffic for the next year.
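The stacking itself is a simple pattern: run detectors from most certain to least certain and record which one produced the label. The sketch below shows only that pattern. The three detectors are hypothetical stand-ins with toy rules, the `hit` dict keys are assumptions, and crawler filtering is assumed to have happened upstream.

```python
from typing import Callable, Optional

# A detector inspects one hit and returns an engine name, or None to pass.
Detector = Callable[[dict], Optional[str]]


def attribute(hit: dict, detectors: list[tuple[str, Detector]]) -> tuple[str, str]:
    """Return (engine, method) from the first detector that produces a match."""
    for method, detect in detectors:
        engine = detect(hit)
        if engine:
            return engine, method
    return "unattributed", "none"


# Hypothetical stand-ins for real detectors, ordered most to least certain.
def by_utm(hit: dict) -> Optional[str]:
    return "Perplexity" if hit.get("utm_source") == "perplexity" else None


def by_referer(hit: dict) -> Optional[str]:
    hosts = {"claude.ai": "Claude", "chatgpt.com": "ChatGPT"}
    return hosts.get(hit.get("referer_host", ""))


def by_behavior(hit: dict) -> Optional[str]:
    # Placeholder rule: an unreferred entry on a deep page.
    deep = hit.get("path", "/").count("/") >= 2
    return "AI (inferred)" if deep and not hit.get("referer_host") else None


STACK = [("utm", by_utm), ("referer", by_referer), ("behavioral", by_behavior)]
```

Storing the method next to the engine keeps measured rows and inferred rows separable in reporting, so the probabilistic slice never gets presented with the same confidence as a referer match.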

The build cost is roughly 8-12 hours of initial work plus 1-2 hours of monthly maintenance as engines change. The buy cost is $15/mo for a tool that absorbs the maintenance. Whichever direction you go, do not run client-side-only and act like it is full coverage. The capture-rate gap is the difference between good decisions and noise.

Companion reading: Attrifast AI visibility score, share of voice in AI search, prompt tracking, per-engine references for ChatGPT, Claude, Perplexity, and Gemini, and the diagnostic posts on how much traffic comes from ChatGPT, ChatGPT referral traffic not showing in analytics, and dark AI traffic in GA4.

Identify each engine by Referer when it survives, separate humans from crawlers with the User-Agent, recover the unreferred majority with a behavioral classifier, and stop trusting any single signal in isolation. The seven-engine row hiding inside your Direct bucket becomes seven legible rows you can act on.

Turn (Direct) into seven legible rows

Per-engine AI traffic detection, server-side, joined to Stripe revenue. Seven-day free trial, $15/mo after.


FAQ

What is an AI traffic source, technically?

A human visit whose immediate origin is an AI assistant surface: ChatGPT, Claude, Perplexity, Gemini, AI Overviews, Copilot, Meta AI, Grok, or a smaller engine. You identify it by combining the Referer header (when it survives), the User-Agent string (mostly for crawler separation), a first-party UTM you control, and a behavioral inference for the unreferred deep-page entries. No single signal catches every AI visit.

Which AI engines pass a Referer header in 2026?

All seven majors pass a Referer in some surfaces and strip it in others. Desktop hierarchy: Microsoft Copilot (Edge sidebar) around 65-75%, Perplexity 55-65%, Gemini chat 45-55%, Claude 30-45%, ChatGPT web 18-28%, Meta AI and Grok 10-25%, and AI Overviews near zero because Google rewrites the click through its own tracker. Mobile and desktop-app surfaces strip the referer dramatically more across every engine.

Why is ChatGPT mobile traffic almost always missing a referer?

iOS ChatGPT opens links inside SFSafariViewController, which does not propagate the host app's Referer. Android ChatGPT uses Chrome Custom Tabs, which strips the referer unless the host app explicitly sets it. The desktop ChatGPT app opens the system default browser via the OS URL handler, so the new tab has no document context to source the Referer from. All three mechanisms eliminate the referer before your server sees it.

Can a User-Agent string identify an AI traffic source?

Only for crawlers and a small slice of in-app webviews, never for bulk human AI traffic. GPTBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, OAI-SearchBot, and Bingbot all identify themselves cleanly. But humans reading AI answers click in a normal browser tab whose UA looks like every other Chrome or Safari session. UA is for bot separation, not human attribution.

Does GA4 detect any AI engines by default in 2026?

No, none. GA4's default grouping has 17 channels and none of them is "AI Assistants." When a referer survives, GA4 buckets the visit into Referral with the engine's hostname as the source but does not label it as AI. When stripped, the visit lands in Direct / (none). A custom channel grouping with regex rules catches the referer-passing slice; the unreferred majority needs server-side enrichment that GA4 cannot do.

What is the difference between PerplexityBot and Perplexity-User?

Two distinct UAs in Perplexity's published bot policy. PerplexityBot is the indexing crawler that builds the index and respects robots.txt. Perplexity-User is the live-fetch agent that runs when a user asks a question and Perplexity needs to retrieve your page in real time. Perplexity-User does not respect robots.txt because it represents a direct user request, same pattern as ChatGPT-User and Claude-User. Treat them as separate dimensions in your logs.

How do I detect Google AI Overviews traffic if the referer is stripped?

You mostly cannot detect it directly. AI Overviews renders inline so most users never click anything, and clicks that do happen go through Google's tracker with a referer of google.com that gives no AI Overview-specific signal. The indirect detection is correlating Search Console impressions on AI Overview-eligible queries with deep-page Direct entries on those same URLs. There is no clean per-click attribution path for AI Overviews in 2026.

What does it mean to detect AI traffic "server-side"?

Your origin server, edge worker, or analytics endpoint reads the Referer header, User-Agent, and Sec-Fetch-Site values on the inbound HTTP request before any client-side JavaScript runs. A client-side tag like gtag.js executes after the page loads and only sees document.referrer, a subset of the real Referer. Server-side, you see the raw header and can layer behavioral signals like deep-page entry on a buying-intent query.

What's the difference between client-side and server-side AI detection?

Client-side reads document.referrer after Referrer-Policy stripping, runs inside ad-blocker range, and cannot see Sec-Fetch-Site. Server-side reads the raw Referer before any stripping, has access to all Fetch Metadata headers, and is invisible to blockers. The capture rate gap is roughly 2-3x. The cost of server-side is owning the engine list and the behavioral classifier; the benefit is the gap.

Can I use UTM parameters to identify AI traffic?

Only for traffic where you control the link: your llms.txt, your sameAs URLs, your own forum posts, and Perplexity Pro citations that auto-append utm_source=perplexity in some configurations. You cannot UTM-tag a link that ChatGPT, Claude, Gemini, or Meta AI generates for you, because the engine renders your canonical URL with no parameters appended. UTM is a floor, not a ceiling.
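Reading the UTM on the landing URL is the easy part. The sketch below uses only the standard library; the function name is illustrative, and the `utm_source=perplexity` value is the one mentioned above.

```python
from urllib.parse import parse_qs, urlsplit


def utm_source(landing_url: str) -> str:
    """Return the lowercased utm_source value, or "" when the URL has none."""
    query = urlsplit(landing_url).query
    values = parse_qs(query).get("utm_source", [])
    return values[0].lower() if values else ""
```

An empty result says nothing about whether the visit came from an AI engine, which is the "floor, not a ceiling" point: the UTM confirms the links you tagged and is silent on everything else.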

How do I tell ChatGPT traffic apart from Perplexity from Claude?

Three signals stacked. Referer hostname when it survives: chatgpt.com for ChatGPT, perplexity.ai for Perplexity, claude.ai for Claude. URL path on the referer when present: ChatGPT strips the path, Perplexity preserves the search-slug, Claude preserves the chat UUID. For unreferred cases, behavioral fingerprinting on landing-page pattern: Perplexity skews to comparison pages, ChatGPT to how-to and definitional, Claude to technical and developer-tool content.
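For the unreferred slice, a crude version of the landing-page fingerprint looks like the sketch below. The keyword sets are assumptions invented for this example to mirror the three skews described above; a real classifier would be tuned on your own referred traffic and would return a probability rather than a single guess.

```python
import re

# Illustrative keyword hints per engine; the first matching engine wins.
HINTS = (
    ("Perplexity", {"best", "vs", "compare", "alternatives"}),
    ("ChatGPT", {"how", "what", "guide", "glossary"}),
    ("Claude", {"docs", "api", "sdk", "developers"}),
)


def guess_engine(landing_path: str) -> str:
    """Return a weak engine guess from URL path tokens, or "" for no guess."""
    tokens = set(re.split(r"[/\-_.]+", landing_path.lower()))
    for engine, keywords in HINTS:
        if tokens & keywords:
            return engine
    return ""
```

Matching on whole path tokens rather than substrings avoids accidental hits such as "api" inside "rapid". Any label produced this way should be stored as inferred, not measured.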

Are the AI bots in my server logs the same as AI traffic?

No, completely different audiences. AI bots (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bingbot, CCBot) are crawlers that identify themselves in the User-Agent and respect (most of them) robots.txt. AI traffic is human visitors who clicked through after reading an AI answer. They live in different tables and should never be aggregated together. Counting them together overstates AI traffic by 5-10x.

How long does it take to ship per-engine AI traffic detection?

Plan on 4-8 hours for the first cut: a referer-matching function with the seven engine domain lists, a UA filter to strip crawlers, a Sec-Fetch-Site enrichment, and a basic landing-page-pattern heuristic for unreferred entries. Add 2-4 hours to wire output into your warehouse. Then budget 1-2 hours per month of upkeep, because engines ship client updates that shift pass-through rates and new engines launch every 6-8 weeks.

Does Attrifast handle all this automatically?

Yes. The script detects every major AI engine server-side against a maintained domain list, layers in a behavioral classifier for the unreferred majority, separates bot traffic from human traffic, and labels each session by engine. It also joins to Stripe revenue, so you see which engine produced the paying customer. $15/mo, no third-party cookie, no consent banner in most jurisdictions, engine list maintained by us.

Identify every AI engine sending you traffic

Server-side detection across the seven majors, behavioral inference for the unreferred majority, and a Stripe join for revenue. Seven-day free trial.

