
How to Track Perplexity, Claude, and Gemini Traffic (2026 Field Guide)

13 min read · Updated May 2026

Vincent Ruan, Founder, Attrifast · May 24, 2026 · 13 min read

Per-engine playbook for the three AI sources GA4 hides next to ChatGPT: Perplexity, Claude.ai, and Gemini. Real referer rates, bot UA strings, citation URL patterns, and one unified detection snippet.


TL;DR

Perplexity, Claude, and Gemini each have their own referer behavior, citation URL pattern, and crawler stack. A single rule (or a copy-pasted ChatGPT detector) misses three different things in three different ways.
Perplexity is the most referer-friendly of the four. Citation clicks usually preserve https://www.perplexity.ai/search/<slug>. Claude is the most referer-hostile after ChatGPT. Gemini sits in the middle on its dedicated surface but drops referer almost entirely when clicks come through Google AI Overviews.
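The training crawlers (PerplexityBot, ClaudeBot) and the live-fetch agents (Perplexity-User, Claude-User) all hit your domain with documented user-agent strings. Google-Extended is the exception: it is a robots.txt control token that Google's existing crawlers honor, not a string that appears in request logs. Logging the bot hits gives you a leading indicator of citation potential; it is not human traffic.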
Citation surfacing differs by engine: Perplexity numbered footnotes, Claude inline brackets with linked source titles, Gemini's right-side "Sources" panel inside AI Mode. The detection code is one matcher but the per-engine UX shapes how often the user actually clicks the link.
I built Attrifast because Perplexity sent me 80 sessions in one weekend and GA4 attributed 76 of the 80 to Direct. See the per-engine split inside the dashboard.

The ChatGPT detection post argued one thing: a single AI engine breaks GA4 attribution in a specific way, and the fix is a three-layer server-side pipeline (How to Track ChatGPT Traffic). The other three engines each break attribution differently. Perplexity preserves more referers than ChatGPT but cites differently. Claude preserves fewer, has its own crawler, and almost no operator inspects its referer at all. Gemini behaves like two engines stitched together: one polite (gemini.google.com), one almost untrackable (AI Overviews). You need four playbooks, not one.

Referer hit rate by AI engine on desktop: Perplexity 60%, Gemini web 50%, Claude.ai 30%, ChatGPT 10%, Gemini AI Overviews near 0%

Quick Facts

Spec	Value
Perplexity citation referer hit rate (my own logs, desktop)	50-70%
Claude.ai citation referer hit rate (my own logs, desktop)	20-40%
Gemini referer hit rate from gemini.google.com	40-60%
Gemini AI Overviews referer hit rate	Near zero (click goes through Google's click tracker) [10]
Documented Perplexity bots	2 (PerplexityBot, Perplexity-User) [1]
Documented Anthropic bots	2 (ClaudeBot, Claude-User) [2]
Google's AI training opt-out user agent	Google-Extended, introduced September 2023 [3]
AI bots' share of overall web bot traffic	4-6% across 2024, per Cloudflare Radar [4]
Built-in GA4 channels for AI engines	0 [5]
Time to ship the unified detection middleware	60-90 minutes on a Next.js or Rails app

The 80-session weekend in the TL;DR was real. A niche post on cookieless attribution picked up a Perplexity citation on a Saturday morning. GA4 showed 4 Referrals (all flagged www.perplexity.ai) and 76 Direct. My server logs showed that 78 of the 80 either had a Perplexity referer or were unreferred deep-page entries within minutes of each other, from the same IP geographies and on the same landing path. Referer matching caught the first slice. Behavioral fingerprinting caught the second. GA4 caught nothing useful at all.

Perplexity: the most cooperative of the four

Perplexity is the engine where server-side detection feels easiest. That bites you, because you over-index on the easy case and miss the rest.

Official user agents. Per Perplexity's published bot documentation [1], there are two:

PerplexityBot/1.0 (training and indexing crawl). Full UA: Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot). Respects robots.txt. Identifies via published IP ranges.
Perplexity-User/1.0 (live fetch on user request). Full UA: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user). Documented as not respecting robots.txt because it represents a direct user request.

The split is not academic. A PerplexityBot spike on a URL means the crawler discovered it. A Perplexity-User spike on the same URL over the next 24-72 hours means the page is being cited live and a user is fetching it. The lag is the closest thing to a "did GEO work" early-warning signal Perplexity gives you.
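The crawl-then-fetch lag can be read straight out of bot logs. The following is a minimal sketch, assuming a hypothetical `BotHit` log row whose field names are illustrative and not taken from any particular logging library:

```typescript
// Hypothetical log row: one entry per bot request, already classified upstream.
interface BotHit {
  path: string
  bot: 'perplexitybot' | 'perplexity-user'
  at: Date
}

// For each path, hours between the first PerplexityBot crawl and the first
// Perplexity-User fetch that follows it. Paths with no later fetch are omitted.
function crawlToFetchLagHours(hits: BotHit[]): Map<string, number> {
  const firstCrawl = new Map<string, number>()
  const firstFetchAfter = new Map<string, number>()
  const sorted = hits.slice().sort((a, b) => a.at.getTime() - b.at.getTime())

  for (const hit of sorted) {
    const t = hit.at.getTime()
    if (hit.bot === 'perplexitybot') {
      if (!firstCrawl.has(hit.path)) firstCrawl.set(hit.path, t)
    } else if (firstCrawl.has(hit.path) && !firstFetchAfter.has(hit.path)) {
      firstFetchAfter.set(hit.path, t)
    }
  }

  const lags = new Map<string, number>()
  firstFetchAfter.forEach((fetchedAt, path) => {
    const crawledAt = firstCrawl.get(path)
    if (crawledAt !== undefined) {
      lags.set(path, (fetchedAt - crawledAt) / (60 * 60 * 1000))
    }
  })
  return lags
}
```

Paths whose lag lands inside the 24-72 hour range are the ones worth checking for a live citation.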

Referer behavior. Perplexity passes a Referer header more often than any other major AI engine in my logs. The typical value is https://www.perplexity.ai/search/<slug> on desktop, or just https://www.perplexity.ai/ on some mobile flows. The <slug> is a human-readable encoding of the original query: it tells you what the user was searching for, and multiple clicks with the same slug in a short window are the same user clicking multiple sources from one answer, identical to ChatGPT's conversation-UUID pattern. iOS Perplexity opens links in an in-app browser where referer is often empty; Perplexity for Mac opens in the system browser, which preserves referer better.

Citation URL pattern. Sources render as numbered footnotes inline ([1] [2] [3]) with a sources panel above the answer. Click destinations are your canonical URL with no UTM appended; the referer, when present, is the search-slug form above. Save the full referer including the path.

The Perplexity-specific gotcha. Perplexity's follow-up question UX drives multiple citation clicks across several user-initiated follow-ups, all linking to your same canonical URL. In my logs this looks like one user generating 3-5 hits within a 10-minute window with the same IP and slightly different search-slug paths. Session windowing on IP and landing URL collapses that into one visitor; raw hit counts overstate by 3-5x.
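One way to keep those bursts from inflating visitor counts is to collapse hits that share an IP and landing path inside the window. This is a sketch under stated assumptions: the `CitationHit` shape and the fixed 10-minute window are illustrative choices, not a prescribed schema.

```typescript
// Hypothetical row for a referer-matched Perplexity click.
interface CitationHit {
  ip: string
  landingPath: string
  at: Date
}

const WINDOW_MS = 10 * 60 * 1000 // the 10-minute window observed above; tune per site

// Counts one visitor per (ip, landingPath) burst. A hit extends the current
// burst when it arrives within WINDOW_MS of the previous hit with the same key.
function countVisitors(hits: CitationHit[]): number {
  const lastSeen = new Map<string, number>()
  let visitors = 0
  const sorted = hits.slice().sort((a, b) => a.at.getTime() - b.at.getTime())
  for (const hit of sorted) {
    const key = `${hit.ip}|${hit.landingPath}`
    const t = hit.at.getTime()
    const prev = lastSeen.get(key)
    if (prev === undefined || t - prev > WINDOW_MS) visitors += 1
    lastSeen.set(key, t)
  }
  return visitors
}
```

The slug is deliberately left out of the key, since follow-up questions change it while the visitor stays the same.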

Claude.ai: the engine almost nobody is logging

Claude is where I see the largest gap between "is sending real traffic" and "operators have any visibility into it." Most analytics setups have no Claude detection rule at all, so the traffic shows up as Direct/(none) and gets attributed to brand recall.

Official user agents. Per Anthropic's crawler documentation [2]:

ClaudeBot/1.0 (training crawl). UA: Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com). Respects robots.txt.
Claude-User (live fetch when Claude retrieves a URL on a user's behalf during a chat). Same pattern with the claude-user@anthropic.com contact.

Anthropic also documents claude-web as a legacy name and notes that Claude on iOS and Android can fetch URLs via in-app browser context with normal-browser user agents. A Claude mobile user clicking a citation looks like a normal browser request with no referer and no bot-side hit to correlate against.

Referer behavior. When Claude.ai does pass a referer (around 20-40% on desktop in my data), the value is https://claude.ai/chat/<uuid> (most common), https://claude.ai/ (root, less informative), or occasionally a UI view path. The chat UUID supports the same de-duplication grouping as ChatGPT's and Perplexity's identifiers. Anthropic ships Claude links with no explicit referrerpolicy attribute, which means the browser default applies. That default is strict-origin-when-cross-origin per MDN [6], which trims cross-origin referers to the origin, so even when a referer arrives, it is often just https://claude.ai/ with no chat UUID.

Citation URL pattern. Claude surfaces sources as inline bracketed links with the page title shown next to the bracket, with hover preview cards showing snippets. Click destinations are your canonical URL with no UTM modifications. Treat empty-referer deep-page entries adjacent to a ClaudeBot or Claude-User burst as suspected-Claude.

The Claude-specific gotcha. Anthropic's published bot list grew during 2024-2025, and the contact-email convention they use in UA strings is the durable matcher, not just the bot name. Match on the anthropic.com substring as a fallback so you do not miss new bots Anthropic adds.

Gemini: two engines in one trench coat

Gemini is the messiest of the four because it lives across two completely different surfaces that behave differently from each other and require separate detection logic.

Surface 1: gemini.google.com (the dedicated chat interface). The most ChatGPT-like surface. A user opens gemini.google.com, asks a question, clicks an inline citation. Referer behavior is moderate, around 40-60% on desktop in my logs. When present, the value is https://gemini.google.com/ with no path, because conversation state lives in Google session cookies rather than the URL. Unlike ChatGPT, Perplexity, and Claude, there is no per-conversation identifier in the referer to use for de-duplication.

Surface 2: AI Overviews inside the classic Google SERP. This is where the wheels fall off. AIO citation clicks go through Google's standard outbound click tracker (/url?sa=t&url=... or the newer /aclk?... patterns), and the Referer on your server is almost always empty, the same as classic Google organic clicks. The AIO source-panel cards do not change this behavior; whatever click tracker Google applies to organic blue links also applies to AIO citations. They land as Direct/(none) in GA4 essentially 100% of the time. The deeper mechanics of AI Overviews sitting on top of the live web index are covered in Where Does Google AI Get Its Information?.

Official user agents. Per Google's crawler documentation [3], the relevant agents are Googlebot (standard search crawl), Google-Extended (training opt-out, introduced September 2023; add User-agent: Google-Extended + Disallow: / to robots.txt to opt out), GoogleOther, and Google-InspectionTool. Google does not publish a distinct "live fetch on user request" agent for Gemini in the same way OpenAI ships ChatGPT-User or Anthropic ships Claude-User.

Citation URL pattern. Inside the dedicated Gemini chat UI, sources appear in a right-side "Sources" panel listing canonical URLs with publisher favicons; inline citations are subtle superscript numbers. Inside AI Overviews on the SERP, citations render as a horizontal carousel of source cards at the bottom of the AIO block.

The Gemini-specific gotcha. A GA4 custom channel regex like gemini\.google\.com|bard\.google\.com catches some Surface 1 clicks and zero Surface 2 clicks. AIO can be a substantial source of impressions inside Google Search Console's "Search Appearance: AI Overview" dimension (added in late 2025) and a near-zero source of attributable clicks at the referer layer. Treat impressions and clicks as different signals.

Detection-mechanic matrix: 8 signals across 3 engines

Most "which AI engine is sending traffic" articles stop at the referer hostname. The actual detection surface is wider — eight signals you can read independently, and only one engine sets all eight cleanly. The matrix below is what I keep open when I am triaging a customer's "is this Claude or Perplexity?" question. Each cell answers: does this engine emit a recognizable signal on this surface, and what is the value to look for?

Signal	Perplexity	Claude.ai	Gemini (chat)	Gemini (AI Overviews)
1. Referer hostname (when present)	`www.perplexity.ai` [1]	`claude.ai` [2]	`gemini.google.com` [3]	(almost never present) [10]
2. Referer path information	`/search/<slug>` with human-readable query slug	`/chat/<uuid>` (most common) or `/` root	`/` only — conversation state in cookies	n/a (referer stripped at Google's redirect layer)
3. Documented training-crawler UA	`PerplexityBot/1.0` [1]	`ClaudeBot/1.0` [2]	`Googlebot` + `Google-Extended` [3]	same as classic Googlebot stack
4. Documented live-fetch UA	`Perplexity-User/1.0` [1]	`Claude-User` [2]	none published distinctly [3]	none distinctly published
5. Suggested robots.txt directive	`Disallow: /` under `User-agent: PerplexityBot` for training opt-out; `Perplexity-User` ignores robots.txt by design [1]	`Disallow: /` under `User-agent: ClaudeBot` for training opt-out [2]	`Disallow: /` under `User-agent: Google-Extended` for AI training only (still indexes for classic Search) [3]	governed by `Googlebot` directive — blocking it removes you from Search entirely [3]
6. Citation rendering style	Numbered footnotes `[1] [2] [3]` plus pinned sources panel at top	Inline bracketed link with page title + hover preview cards	Subtle superscript numbers + collapsible right-side Sources panel	Horizontal source-card carousel below the AIO block
7. In-app browser behavior on mobile	iOS in-app webview drops referer; Mac app opens system browser (referer preserved)	iOS/Android apps fetch via Custom Tabs / WKWebView, referer often empty	Android Google app preserves referer better than third-party browsers [3]	n/a (AIO lives inside Google Search itself)
8. UTM convention for content-creator citations	URL copied verbatim from your content, so `?utm_source=perplexity-citation` survives	URL copied verbatim, so `?utm_source=claude-citation` survives	URL copied verbatim, so `?utm_source=gemini-citation` survives	URL stripped of query string by Google in some AIO carousel implementations [15]

Three operator reads fall out of that matrix. First, only Perplexity gives you the original query in the referer path: the /search/<slug> slug encodes a human-readable version of what the user asked. That makes Perplexity uniquely useful for content gap analysis: if you see the same slug repeatedly across days, that is a recurring query your content is answering. Second, Claude is the only one of the three whose live-fetch UA carries a verifiable email-domain contact (claude-user@anthropic.com), which gives you a durable matcher even if Anthropic renames the bot. Third, Gemini's two surfaces require two completely different detectors, and even with both running, the AI Overviews surface is effectively invisible at the referer layer.

A subtle implication that bites operators: the "robots.txt opt-out" row above is not symmetric. Blocking PerplexityBot stops training-corpus inclusion but Perplexity-User will still fetch your URLs when a user asks [1]. Blocking Google-Extended stops Gemini training but does nothing to AI Overviews, which is served by Google Search itself and triggered by the same Googlebot crawl you cannot block without disappearing from search entirely [3]. The three engines look similar in shape and have meaningfully different governance surfaces.

Comparison matrix: all four engines, side by side

Engine	Primary referer (when present)	Referer hit rate (desktop, observed)	Training crawler	Live-fetch agent	GA4 default attribution
ChatGPT	`https://chatgpt.com/c/<uuid>` or `https://chatgpt.com/`	Single-digit to teens (Plausible measurement [7])	GPTBot	ChatGPT-User, OAI-SearchBot	Direct/(none)
Perplexity	`https://www.perplexity.ai/search/<slug>`	50-70% (my logs)	PerplexityBot	Perplexity-User	Referral or Direct depending on hit
Claude.ai	`https://claude.ai/chat/<uuid>` or `https://claude.ai/`	20-40% (my logs)	ClaudeBot	Claude-User	Direct/(none) (no rule)
Gemini (gemini.google.com)	`https://gemini.google.com/` (no path)	40-60% (my logs)	Googlebot + Google-Extended	None published separately	Referral or Direct
Gemini (AI Overviews)	None (Google click tracker strips it)	Near zero	Same as above	Same as above	Direct/(none) always

The ChatGPT row is the short version; the deeper walkthrough lives in the ChatGPT detection guide. The Perplexity row is the easy case to over-trust. The Claude row is the slice almost nobody is logging. The two Gemini rows are why "is Gemini sending me traffic?" has two different answers depending on which surface you mean.

A second matrix, focused on citation UX rather than HTTP semantics:

Engine	Citation style in the answer	Source list location	Per-source UTM possible?
ChatGPT	Inline blue links with brand or URL text	Inline + sometimes footer	Only on URLs you control before the model reads them
Perplexity	Numbered footnotes `[1] [2] [3]`	Pinned sources panel at top	Only on URLs you control
Claude.ai	Inline bracketed link with page title	Hover preview cards	Only on URLs you control
Gemini (chat)	Subtle superscript numbers	Right-side Sources panel	Only on URLs you control
Gemini (AIO)	Source carousel at bottom of AIO block	Bottom carousel	Only on URLs you control

The "URLs you control" column is what makes Layer 1 of the three-layer architecture work across all engines. A URL on your own content ending in ?utm_source=ai-citation&utm_medium=organic-ai survives every variant of referer suppression when an AI engine lifts it verbatim into an answer. It works for all four engines and is the single highest-leverage move you can make.
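Tagging the URLs you publish can be done with the standard `URL` API. The helper below is a minimal sketch; the function name `withAiCitationUtm` is hypothetical, and the choice to leave an existing `utm_source` untouched is an assumption rather than something the pipeline requires.

```typescript
// Appends the citation UTM pair to a URL you publish, leaving existing query
// parameters and the fragment intact. An existing utm_source is not overwritten.
function withAiCitationUtm(rawUrl: string): string {
  const url = new URL(rawUrl)
  if (!url.searchParams.has('utm_source')) {
    url.searchParams.set('utm_source', 'ai-citation')
    url.searchParams.set('utm_medium', 'organic-ai')
  }
  return url.toString()
}
```

Because the tag travels inside the URL itself, it does not depend on any header the engine or the browser may drop.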

Per-engine RPV: what each row actually pays you

Detection mechanics tell you whether you can see the traffic. The follow-up question every operator asks five minutes later is "which engine is actually worth the implementation effort?" The 200-site Attrifast cohort answers this at the population level. The full data study is in The 2026 AI Search Revenue Benchmark; the headline numbers for these three engines (with ChatGPT for reference), blended across the 200 sites:

Engine	Cohort RPV (blended)	Session share of AI total	Conversion rate (B2B SaaS subset)	Detection ease
Perplexity	$1.42	8%	3.1%	High (50-70% referer rate, slug-encoded query)
Claude.ai	$1.18	6%	2.9% (B2B), $1.94 RPV on B2B specifically	Medium (20-40% referer rate, anthropic.com fallback matcher)
Gemini (chat surface)	$0.41	12% (incl. AI Overviews)	1.6%	Low-medium for chat (40-60% referer); near-zero for AIO
ChatGPT (for reference)	$0.87	71%	2.7%	Low referer rate (10-15%), high volume

The ranking inverts depending on whether you index on volume or per-visitor value. By the table above, ChatGPT delivers roughly 9x as many sessions as Perplexity (71% vs 8% of AI sessions), while Perplexity's RPV is about 1.6x ChatGPT's ($1.42 vs $0.87). Total dollar contribution therefore still favors ChatGPT for most SMB sites, but Perplexity wins the marginal-effort calculation. If you have a 30-minute budget for ChatGPT detection and another 30 minutes for one more engine, Perplexity is the right second slot.
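The volume-versus-value trade-off is easier to see as dollars per 1,000 AI sessions, which is session share times RPV. This sketch uses only figures from the table above. Gemini is left out because its session share includes AI Overviews while its RPV is for the chat surface only, so multiplying the two would mix populations.

```typescript
// Cohort figures copied from the RPV table above.
const cohort = [
  { engine: 'ChatGPT', sessionShare: 0.71, rpv: 0.87 },
  { engine: 'Perplexity', sessionShare: 0.08, rpv: 1.42 },
  { engine: 'Claude.ai', sessionShare: 0.06, rpv: 1.18 },
]

// Revenue attributable to each engine per 1,000 AI sessions: share x RPV x 1000.
const revenuePer1000 = cohort.map((row) => ({
  engine: row.engine,
  dollars: Math.round(row.sessionShare * row.rpv * 1000),
}))

for (const row of revenuePer1000) {
  console.log(`${row.engine}: $${row.dollars} per 1,000 AI sessions`)
}
```

On these inputs ChatGPT contributes the most total revenue while Perplexity earns more per session, which is the inversion described above.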

Worked example: a B2B SaaS where Claude over-indexes

The "Claude RPV is highest on B2B SaaS" finding from the cohort benchmark is the kind of stat that sounds like noise until you see a single site where it dominates the channel mix. The composite below pulls from three developer-tool B2B SaaS sites in the cohort, all in the $15k-$30k MRR range, all selling to engineering audiences (one CI/CD tool, one observability dashboard, one error-tracking competitor). All three saw Claude punch significantly above its cohort weight.

Channel (30-day window)	Sessions	Trial starts	Paid conversions	MRR contribution (composite)
Google organic	18,210	198	22	$2,090
ChatGPT (referer + suspected-AI)	4,180	64	11	$1,045
Perplexity	540	16	4	$380
Claude.ai (referer + ClaudeBot-adjacent suspected)	720	28	9	$855
Gemini (chat)	195	3	0	$0
Other (direct, social, paid, referral)	6,420	47	5	$475
Total	30,265	356	51	$4,845

Two things to notice. First, Claude delivered 720 sessions versus Perplexity's 540. Claude was the largest AI source after ChatGPT on these developer-audience sites, contrary to the cohort-wide pattern where Perplexity edges out Claude by session share. Second, Claude converted 9 of 720 sessions to paid (1.25%), the highest conversion rate of any channel in the table, including Google organic (0.12%) and ChatGPT (0.26%). The $855 in MRR Claude contributed represents 17.6% of new MRR from 2.4% of sessions.

The hypothesis from talking to these operators afterward, which is consistent with the broader cohort finding: Claude over-indexes for developer-tool B2B because developers use Claude as a research surface for technical product comparisons ("what's the best alternative to Datadog for a 10-person team," "how does Sentry compare to Bugsnag for error grouping"), and Claude's citation UI puts the linked page title front-and-center in the answer, which moves clicks toward the top-cited source. The same dynamic does not show up on consumer-ecommerce sites, where Claude usage is lower and the queries are less product-comparison-heavy. If you sell developer tools and you are not logging Claude traffic, you are missing your highest-converting AI channel by a wide margin.

Worked example: a content-heavy SaaS where Perplexity over-indexes

The mirror case. Perplexity over-indexes on sites whose content matches the "fact-dense answer with multiple sources" query pattern Perplexity is built around. Composite from three content-led B2B SaaS sites in the cohort (one HR-tech, one accounting, one logistics), $10k-$25k MRR band:

Channel (30-day window)	Sessions	Paid conversions	Per-engine RPV (this composite)
Google organic	24,150	31	$0.10
ChatGPT	3,940	11	$0.24
Perplexity	1,180	17	$1.36
Claude.ai	290	3	$0.98
Gemini	410	1	$0.23

Perplexity converted 17 of 1,180 (1.44%) at $94.50 average first-month subscription value, producing an RPV of $1.36, closely in line with the cohort-wide Perplexity RPV of $1.42 in the broader study. Perplexity's volume here is about 30% of ChatGPT's (1,180 vs 3,940), well above the cohort-wide ratio of roughly 11% (8% vs 71% session share); that gap is the over-indexing. For this composite site, Perplexity is the highest-RPV channel by a factor of 1.4x over the next-best (Claude), and the channel-effort calculus overwhelmingly favors investing in Perplexity citation surface even though raw volume is small.
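The Perplexity figures in this composite follow from one formula: RPV is paid conversions times average first-month value, divided by sessions. A short sketch using the numbers quoted above (the function name is illustrative):

```typescript
// RPV = paid conversions x average first-month value / sessions.
function revenuePerVisitor(
  sessions: number,
  paidConversions: number,
  avgFirstMonthValue: number,
): number {
  return (paidConversions * avgFirstMonthValue) / sessions
}

const perplexityRpv = revenuePerVisitor(1180, 17, 94.5)
const perplexityConversionRate = 17 / 1180

console.log(perplexityRpv.toFixed(2)) // 1.36
console.log((perplexityConversionRate * 100).toFixed(2)) // 1.44
```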

The lesson is not "Perplexity is better than ChatGPT" — it is that the right channel mix depends on your audience composition, and you cannot read your own mix until you have all four detectors running. Both worked examples were operators who, before instrumentation, would have told you "we don't really get AI traffic." Both, after instrumentation, found the largest channel they had been missing.

Gemini surface disambiguation: chat vs AI Mode vs AI Overviews

Gemini is the term Google uses for three different products that show up in different places, with three different attribution profiles. Operators who say "we got X visits from Gemini last month" almost always mean one of these without knowing which. The disambiguation matters because the detection mechanic is different for each, and only the first is straightforward.

Surface	What it is	Where it lives	Referer when click happens	GA4 default attribution	Detection difficulty
Gemini (chat) — `gemini.google.com`	The standalone Gemini chatbot, ChatGPT-equivalent	`gemini.google.com` standalone site	`https://gemini.google.com/` (no path) [3]	Referral (`gemini.google.com / referral`) sometimes; `google / organic` sometimes [5]	Medium — 40-60% referer rate, value uninformative beyond hostname
Gemini in the Google app (Android/iOS)	Same model, surfaced inside the Google search app	Mobile, inside the Google app	`https://www.google.com/` typically; on Android sometimes `android-app://com.google.android.googlequicksearchbox`	Referral if `www.google.com`, app-referer if the Android-app schema [3]	Hard — looks like organic Google search
AI Mode	The "AI Mode" tab inside Google Search SERP, a 2024-2025 feature [16]	inside google.com SERP, behind the "AI Mode" tab	Google's URL redirector strips referer in most flows [10]	Direct/(none)	Very hard — covered in Google AI Mode vs AI Overviews and Gemini AI Mode Tracking Guide
AI Overviews	The summary block at the top of classic SERPs	inside google.com SERP results page	Google redirects through `/url?` or `/aclk?`, stripping referer [10]	Direct/(none)	Very hard — referer-layer invisible
Vertex AI search (enterprise embeddings of Gemini)	Google Cloud product, customer-deployed	inside customer apps	Whatever the customer app sets	Whatever the customer app sets	Depends on customer integration

Operators who report a gemini.google.com referer in their logs are seeing the first row of that table. Everyone else is either silent at the referer layer (AI Mode, AI Overviews) or showing up as Google organic (Google app on mobile). The session-share number for "Gemini" in the cohort benchmark (12%) is overwhelmingly the AI Overviews + AI Mode combined slice, leaking through behavioral fingerprinting rather than referer matching. The chat surface accounts for 2-3 percentage points of that.

Two practical consequences. First, comparing your Gemini bucket to your ChatGPT bucket is comparing two different things by default. ChatGPT's "Direct" is invisible because clients strip the referer; Gemini's "Direct" is invisible because Google's click tracker rewrites the URL. Operationally similar, mechanistically different, and only one of them (Gemini) has Search Console as a parallel impression-side data source [9]. Second, the AIO impression count from Search Console is the closest thing to a leading indicator for Gemini-surface traffic on a content site — if your AIO impressions are climbing month-over-month, your AIO-attributable click-through will follow eventually, even though the referer-layer measurement of those clicks is near zero.

Unified detection code that handles all four engines

This is the production matcher I run on attrifast.com, a strict superset of the ChatGPT-only snippet in the sibling post. Drop it into a Next.js middleware, an Express layer, or a Rails before_action.

// middleware.ts at the project root or in src/ (Next.js edge or node middleware)

const AI_REFERER_DOMAINS = new Set([
  // ChatGPT
  'chatgpt.com', 'chat.openai.com',
  // Perplexity
  'perplexity.ai', 'www.perplexity.ai',
  // Claude
  'claude.ai',
  // Gemini (dedicated surface only; AIO clicks lose referer)
  'gemini.google.com',
  // Adjacent AI engines worth bucketing under "AI"
  'copilot.microsoft.com', 'you.com', 'phind.com', 'poe.com',
])

const AI_BOT_MATCHERS: Array<{ test: RegExp; name: string; kind: 'training' | 'live-fetch' }> = [
  // OpenAI
  { test: /GPTBot\/[\d.]+/, name: 'gptbot', kind: 'training' },
  { test: /ChatGPT-User\/[\d.]+/, name: 'chatgpt-user', kind: 'live-fetch' },
  { test: /OAI-SearchBot/, name: 'oai-searchbot', kind: 'training' },
  // Perplexity
  { test: /PerplexityBot\/[\d.]+/, name: 'perplexitybot', kind: 'training' },
  { test: /Perplexity-User\/[\d.]+/, name: 'perplexity-user', kind: 'live-fetch' },
  // Anthropic — match on contact-email substring as a durable fallback
  { test: /ClaudeBot\/[\d.]+/, name: 'claudebot', kind: 'training' },
  { test: /claude-user@anthropic\.com/i, name: 'claude-user', kind: 'live-fetch' },
  { test: /anthropic\.com/i, name: 'anthropic-other', kind: 'training' },
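  // Google: Google-Extended is a robots.txt control token, not a request UA.
  // Google crawls with its existing user agents, so this entry is unlikely to fire.
  { test: /Google-Extended/, name: 'google-extended', kind: 'training' },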
]

type DetectedSource =
  | { source: string; bucket: 'utm' }
  | { source: string; bucket: 'bot'; kind: 'training' | 'live-fetch' }
  | { source: string; bucket: 'human-cited'; refererPath: string }
  | { source: 'unknown-ai'; bucket: 'suspected-ai' }
  | { source: 'direct'; bucket: 'direct' }

function detectAiSource(request: Request): DetectedSource {
  const url = new URL(request.url)

  // Layer 0: bot detection (not human traffic but a signal worth logging).
  // Runs first so a bot fetching a UTM-tagged URL is not counted as a human click.
  const ua = request.headers.get('user-agent') ?? ''
  for (const bot of AI_BOT_MATCHERS) {
    if (bot.test.test(ua)) {
      return { source: bot.name, bucket: 'bot', kind: bot.kind }
    }
  }

  // Layer 1: UTM tags on URLs you control survive any referer suppression.
  // Lowercased so a hand-typed "ChatGPT" or "AI-citation" still matches.
  const utmSource = (url.searchParams.get('utm_source') ?? '').toLowerCase()
  if (
    utmSource.startsWith('chatgpt') ||
    utmSource.startsWith('perplexity') ||
    utmSource.startsWith('claude') ||
    utmSource.startsWith('gemini') ||
    utmSource.startsWith('ai-')
  ) {
    return { source: utmSource, bucket: 'utm' }
  }

  // Layer 2: human click with a usable referer.
  const referer = request.headers.get('referer') ?? ''
  if (referer) {
    try {
      const parsed = new URL(referer)
      if (AI_REFERER_DOMAINS.has(parsed.hostname)) {
        return {
          source: parsed.hostname,
          bucket: 'human-cited',
          refererPath: parsed.pathname,
        }
      }
    } catch {
      // malformed referer, fall through
    }
  }

  // Layer 3: behavioral fingerprint for unreferred deep-page entries.
  const depth = url.pathname.split('/').filter(Boolean).length
  if (!referer && depth >= 2) {
    return { source: 'unknown-ai', bucket: 'suspected-ai' }
  }
  return { source: 'direct', bucket: 'direct' }
}

Three things worth flagging.

First, Perplexity's documentation [1] notes that Perplexity-User does not respect robots.txt because it represents a direct user request; treat it as a live-fetch signal, not crawler exclusion data.

Second, the Anthropic matcher includes a fallback on the anthropic.com substring. If Anthropic ships a new bot with a different name but the same email-domain convention, the fallback catches it.

Third, the Gemini detection covers gemini.google.com only. There is no clean server-side detector for AI Overviews citation clicks because Google's outbound click tracker strips the referer; that traffic falls into suspected-ai if it lands on a deep page from an empty-referer first visit. Search Console's "Search Appearance: AI Overview" dimension is the only sanctioned visibility surface for AIO citations on the impression side [9].

Persist the returned bucket on the first-party session row alongside a session ID. When a Stripe checkout.session.completed webhook fires later, the join from session ID to payment carries the AI-engine label through. That is the end-to-end shape of the first-party UTM-to-revenue tracking pipeline inside Attrifast.
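The join can be sketched as follows. This is an illustration under stated assumptions, not the Attrifast implementation: the in-memory `sessionSources` map stands in for a real session table, and the sketch assumes the session ID was passed as `client_reference_id` when the Checkout Session was created. Only the event fields the sketch reads are typed.

```typescript
// Hypothetical first-party session store: session ID -> AI-engine label.
const sessionSources = new Map<string, string>()

// Minimal slice of a Stripe event; only the fields this sketch reads.
interface CheckoutCompletedEvent {
  type: string
  data: {
    object: { client_reference_id: string | null; amount_total: number | null }
  }
}

interface AttributedPayment {
  source: string
  amountCents: number
}

// Joins a completed checkout back to the label stored for its session.
// Returns null for other event types or when no session ID was attached.
function attributePayment(event: CheckoutCompletedEvent): AttributedPayment | null {
  if (event.type !== 'checkout.session.completed') return null
  const sessionId = event.data.object.client_reference_id
  if (!sessionId) return null
  return {
    source: sessionSources.get(sessionId) ?? 'direct',
    amountCents: event.data.object.amount_total ?? 0,
  }
}
```

A real handler would verify the webhook signature before trusting the payload; that step is omitted here to keep the join itself visible.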

Per-engine citation patterns: how the user actually sees your link

The detection code above tells you whether a click came from an AI engine. The citation pattern below tells you why the user clicked.

ChatGPT. Inline blue link with your URL text or page title in the answer prose. ChatGPT's UX rewards descriptive titles; a citation reading "Stripe Documentation" gets more clicks than the same citation showing just the URL.
Perplexity. Numbered footnotes ([1] [2] [3]) inline plus a pinned sources panel at the top. Perplexity is the most source-forward of the four; users see the list before they read the answer. Click-through on the top 2-3 sources runs materially higher than positions 4-7 in my observation.
Claude.ai. Inline bracketed link with the page title in the bracket and hover preview cards. The title text itself is the click target, so a clear page title acts as ad copy.
Gemini chat. Subtle superscript numbers with a collapsible right-side Sources panel. Users frequently leave it collapsed; click-through is lower than Perplexity in the cases I have measured.
Gemini AI Overviews. Horizontal carousel of source cards at the bottom of the AIO block. The click context is closer to classic organic search than to a chat UI.

Optimizing for citation-click rate is partly content (clear title, fact-dense snippet) and partly engine-aware. The full GEO playbook lives in How to Get Cited by AI Engines.

Debugging when one engine looks underrepresented

After the unified detection ships, the most common ticket pattern is "Perplexity bucket looks right, but the Claude bucket is suspiciously empty" — or some variant where one engine appears materially smaller than expected. The diagnostic walk below covers the cases I have seen actually break in production. Run through them in order before concluding the engine is genuinely not sending traffic.

1. Is your domain match set missing a hostname variant?

The Perplexity hostname can arrive as perplexity.ai, www.perplexity.ai, or occasionally perplexity.ai.<region> in non-US deployments. Claude.ai is usually just claude.ai but Anthropic experimented with console.anthropic.com for embedded-context links during 2024. Gemini chat is gemini.google.com but old bard.google.com URLs still resolve and redirect, so the original referer may still show bard.google.com for some shared-conversation links. Add all variants to your hostname set; missing a single variant can drop 5-30% of an engine's referer-matched bucket.

2. Is your URL parsing case-sensitive?

The URL standard normalizes hostnames to lowercase [6], but only once the value has been parsed. A raw-string match for 'claude.ai' against the unparsed header misses a referer that arrives as 'Claude.ai' (capital C, as some Anthropic email signatures send it). Always parse with new URL(referer) and compare against the .hostname field, which the parser has already lowercased. The bug is the same one I covered in the ChatGPT article, and it bites in proportion to how often the engine's clients capitalize their referer URLs.
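
Checks 1 and 2 can be sketched together as a small hostname classifier. This is a minimal sketch, not the production matcher: the names ENGINE_HOSTS and classifyReferer are hypothetical, the host list contains only the variants named above, and the regional perplexity.ai.<region> form is left out because the exact hostnames are not specified here.

```javascript
// Hostname variants named in checks 1 and 2; extend as new ones show up in logs.
// (Illustrative list, not an exhaustive or vendor-confirmed set.)
const ENGINE_HOSTS = new Map([
  ["perplexity.ai", "perplexity"],
  ["www.perplexity.ai", "perplexity"],
  ["claude.ai", "claude"],
  ["console.anthropic.com", "claude"],
  ["gemini.google.com", "gemini"],
  ["bard.google.com", "gemini"], // legacy host that may still appear on shared links
]);

// Returns { engine, path } for a Referer header value, or null when the
// header is missing, malformed, or not from a listed AI host.
function classifyReferer(referer) {
  if (!referer) return null;
  let url;
  try {
    url = new URL(referer); // the parser lowercases the hostname
  } catch {
    return null; // malformed header: treat it as no referer
  }
  const engine = ENGINE_HOSTS.get(url.hostname);
  // Attribution keys on the hostname only; the path is kept for analysis.
  return engine ? { engine, path: url.pathname } : null;
}
```

Because the lookup uses the parsed hostname, a header such as https://Claude.ai/chat/<uuid> resolves to the claude bucket, and the path is logged without gating attribution.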

3. Are you over-attributing Claude traffic to "direct" via the empty-referer path?

Claude has the lowest referer hit rate of the three (20-40% on desktop, dramatically lower in mobile apps [2]). If your behavioral heuristic kicks in only on referer-empty deep-page entries, most of the Claude mobile traffic falls into your suspected-ai bucket without any way to credit it specifically to Claude. The fix: when a ClaudeBot or Claude-User hit on a URL is followed by a referer-empty human visit on the same URL within 30 minutes (and from a distinct-but-plausible IP), tag the human visit as suspected-claude rather than generic suspected-ai. The bot-then-human pairing is the closest thing to a fingerprint Claude gives you.
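
The bot-then-human pairing can be sketched as below. It assumes a hypothetical log shape of { url, ts, userAgent, referer, ip } with ts in epoch milliseconds, and it simplifies "distinct-but-plausible IP" to "a different IP than the bot", so read it as an illustration of the rule rather than a finished heuristic.

```javascript
const CLAUDE_BOT_RE = /ClaudeBot|Claude-User/i;
const PAIRING_WINDOW_MS = 30 * 60 * 1000; // the 30-minute window from check 3

// Tags referer-empty human visits that follow a Claude bot hit on the same
// URL within the window. Bot hits themselves are not returned.
function tagSuspectedClaude(hits) {
  const sorted = [...hits].sort((a, b) => a.ts - b.ts);
  const lastBotHit = new Map(); // url -> { ts, ip } of the most recent bot hit
  const tagged = [];
  for (const hit of sorted) {
    if (CLAUDE_BOT_RE.test(hit.userAgent)) {
      lastBotHit.set(hit.url, { ts: hit.ts, ip: hit.ip });
      continue;
    }
    const bot = lastBotHit.get(hit.url);
    const paired =
      !hit.referer &&
      bot !== undefined &&
      hit.ts - bot.ts <= PAIRING_WINDOW_MS &&
      hit.ip !== bot.ip; // simplification: any IP other than the bot's
    // null means "leave this visit to the rest of the pipeline".
    tagged.push({ ...hit, bucket: paired ? "suspected-claude" : null });
  }
  return tagged;
}
```

Visits that come back with a null bucket still fall through to the generic suspected-ai logic; the sketch only promotes the ones that have a matching bot hit.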

4. Are you running on a CDN that filters the Referer header?

Some CDN configurations strip the Referer header before it reaches your origin, either as a privacy default (rare) or as part of an over-aggressive request normalization rule (common). If you see suspiciously few referer values arriving across all engines, not just one, the CDN is probably the culprit. Check your Cloudflare, Fastly, or CloudFront request-header rules for any transform that touches referer. Cloudflare's documentation on transform rules [15] explicitly warns about this case.

5. Did Perplexity's path change recently?

Perplexity's referer path was /search/<slug> for most of 2024-2025, with /discover/<slug> and /space/<slug> introduced as additional surfaces during 2025 [1]. If your matcher tests only /search/ paths explicitly, the newer surfaces fall through. The fix is to match on the hostname only, not the path — log the path for analysis, but do not gate attribution on it.

6. Is Google's redirect layer rewriting Gemini chat referers?

In some Gemini chat flows the click is layered through a google.com/url?q= redirect first, which strips the referer to https://www.google.com/ or removes it entirely. The result: a Gemini click ends up in your logs as Google organic referral, not as gemini.google.com. Diagnostic: if you have a sudden spike in www.google.com referrals on pages that are typical AI-citation content (long-tail methodology articles, comparison guides), audit the user-agent and visit-depth patterns. Real Google organic users behave differently from Gemini-redirected users. The pattern is described in Search Engine Land's Google attribution analysis [14].

7. Are you correlating bot hits with downstream human visits?

The strongest leading indicator for an engine sending you traffic is a burst of that engine's live-fetch bot UA on a single URL, followed within hours by human visits to the same URL. If your Claude bucket looks empty but Claude-User/1.0 hits are arriving steadily on your high-conversion content URLs, the human visits are landing — they just are not getting referer-matched because Claude's referer pass-through rate is low. Cross-tabulate bot hits per URL against human visits per URL with a 6-24h lag, and the "missing" Claude bucket usually shows up as suspected-AI traffic on the same URLs.
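
A rough version of that cross-tabulation, assuming two hypothetical arrays of { url, ts } records (ts in epoch milliseconds) that have already been split into live-fetch bot hits and referer-empty human visits:

```javascript
const HOUR_MS = 60 * 60 * 1000;

// For each URL with bot hits, counts human visits that land between
// minLagH and maxLagH hours after any bot hit on the same URL.
function crossTabulate(botHits, humanVisits, minLagH = 6, maxLagH = 24) {
  const botTimesByUrl = new Map();
  for (const { url, ts } of botHits) {
    if (!botTimesByUrl.has(url)) botTimesByUrl.set(url, []);
    botTimesByUrl.get(url).push(ts);
  }
  const rows = new Map();
  for (const [url, times] of botTimesByUrl) {
    rows.set(url, { url, botHits: times.length, laggedHumanVisits: 0 });
  }
  for (const { url, ts } of humanVisits) {
    const times = botTimesByUrl.get(url);
    if (!times) continue; // no bot activity on this URL
    const inWindow = times.some((botTs) => {
      const lag = ts - botTs;
      return lag >= minLagH * HOUR_MS && lag <= maxLagH * HOUR_MS;
    });
    if (inWindow) rows.get(url).laggedHumanVisits += 1;
  }
  // Most bot-active URLs first.
  return [...rows.values()].sort((a, b) => b.botHits - a.botHits);
}
```

URLs with many bot hits and a non-zero lagged count are the ones where the "missing" bucket is most likely hiding. The correlation is suggestive only, since unrelated visits can fall inside the same window.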

8. Are you under-sampling mobile?

Mobile drops referer at much higher rates than desktop across every AI engine [5]. If your audience is heavily mobile (consumer SaaS, ecommerce, media) the under-attribution skew is mechanically larger than for desktop-heavy audiences (B2B SaaS dev tools, enterprise software). Read your AI bucket as a desktop-biased measurement and apply a 1.3-1.5x correction factor when comparing to total traffic.
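
As a worked example of the correction, the sketch below scales a measured AI share by the 1.3-1.5x range. The function name and the session counts in the usage note are hypothetical.

```javascript
// Scales the measured AI share of sessions by a correction range.
// low/high default to the 1.3-1.5x range suggested in check 8.
function correctedAiShare(aiSessions, totalSessions, low = 1.3, high = 1.5) {
  if (totalSessions <= 0) throw new RangeError("totalSessions must be positive");
  const raw = aiSessions / totalSessions;
  return { raw, low: raw * low, high: raw * high };
}
```

With 200 referer-matched AI sessions out of 10,000 total, the raw share is 2% and the corrected estimate is roughly 2.6% to 3.0%. Report the range rather than a point value; the multiplier is a rule of thumb, not a measurement.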

9. Is the engine actually growing or are you measuring a stable level?

Per the cohort benchmark, Perplexity grew at 21.6% monthly, Claude at 18.3%, Gemini at 7.4% between December 2025 and May 2026. If your engine bucket looks "low" but flat month-over-month, the issue may not be your detector — it may be that the underlying engine is genuinely growing at a slower rate than the cohort average for your specific vertical. Compare your monthly delta to the cohort delta before concluding the detector is broken.
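
The comparison can be made concrete with a compound monthly growth rate. In this sketch the cohort figures are the ones quoted above; the function names and the sample counts in the usage note are hypothetical.

```javascript
// Cohort monthly growth rates quoted in check 9 (December 2025 to May 2026).
const COHORT_MONTHLY = { perplexity: 0.216, claude: 0.183, gemini: 0.074 };

// Compound monthly growth rate between two session counts.
function monthlyGrowthRate(startCount, endCount, months) {
  if (startCount <= 0 || months <= 0) {
    throw new RangeError("startCount and months must be positive");
  }
  return Math.pow(endCount / startCount, 1 / months) - 1;
}

// Compares your own rate for an engine against the cohort rate.
function growthGap(engine, startCount, endCount, months) {
  const own = monthlyGrowthRate(startCount, endCount, months);
  const cohort = COHORT_MONTHLY[engine];
  return { own, cohort, gap: own - cohort };
}
```

For example, growthGap("claude", 100, 100, 5) describes a bucket that stayed flat over the five monthly intervals, giving a gap of about -0.183 against the cohort. A gap that size could come from the detector or from your vertical; the number alone does not say which.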

10. Has Anthropic, Perplexity, or Google shipped a new bot UA you have not added?

All three vendors add bot UAs over time. Anthropic published an update to their crawler docs during 2024-2025 [12] that introduced contact-email convention variants. Perplexity has versioned its bots through 1.0 / 1.x ranges [1]. Google ships new UAs for Vertex AI and other downstream surfaces [3]. The single most durable matcher is the contact-email substring (anthropic.com, perplexity.ai, google.com) as a fallback after the named-bot regex. The detection code above includes the Anthropic fallback; add the others if your customer-bot match rates start drifting downward.
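
One way to extend the fallback to all three vendors is sketched below. It is an illustration separate from the article's detection code: classifyBotUserAgent is a hypothetical name, the named-bot list covers only the Perplexity and Anthropic agents quoted in this article, and Google's UA tokens are deliberately left for you to fill in from Google's crawler documentation [3].

```javascript
// Named bots first: precise, but needs updating when vendors add agents.
const NAMED_BOTS = [
  { re: /PerplexityBot|Perplexity-User/i, vendor: "perplexity" },
  { re: /ClaudeBot|Claude-User/i, vendor: "anthropic" },
  // Google tokens omitted on purpose; add them from Google's docs [3].
];

// Contact-domain substrings as a fallback for agents not yet in the list.
const CONTACT_FALLBACKS = [
  { needle: "anthropic.com", vendor: "anthropic" },
  { needle: "perplexity.ai", vendor: "perplexity" },
  { needle: "google.com", vendor: "google" },
];

// Returns { vendor, matchedBy } or null for a User-Agent string.
function classifyBotUserAgent(userAgent) {
  if (!userAgent) return null;
  for (const { re, vendor } of NAMED_BOTS) {
    if (re.test(userAgent)) return { vendor, matchedBy: "name" };
  }
  const lower = userAgent.toLowerCase();
  for (const { needle, vendor } of CONTACT_FALLBACKS) {
    if (lower.includes(needle)) return { vendor, matchedBy: "contact" };
  }
  return null;
}
```

Tracking the share of matchedBy: "contact" results over time is a cheap drift signal: when it rises, a vendor has probably shipped an agent the named list does not know yet. Note that the google.com fallback is broad and also matches ordinary Google crawlers such as Googlebot, so treat it as vendor-level only.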

The ten checks above cover the diagnostic surface I run through before declaring an engine genuinely not sending traffic to a customer site. Across roughly 30 customer debug sessions I have done on this topic in 2025-2026, the most common root cause was check 5 or check 8: a missed surface variant, or a mobile-heavy audience the detector was under-counting.

Limitations

A few caveats.

Mobile referer behavior is a moving target across all four engines. iOS in-app webviews and Android Chrome custom tabs behave differently from desktop and the difference varies with app version. The hit-rate ranges above are desktop figures.
Voice queries through any AI engine produce no click if the model speaks the answer back. The brand mention happens; the traffic does not.
Enterprise tenants of Claude and Gemini (Claude for Work, Gemini for Workspace) use customer-isolated session contexts and may differ from the consumer surfaces I measure.
AI Overviews on non-English locales roll out on different schedules. The detection logic is the same; the appearance rates are not.
Adversarial spoofing. A scraper can set User-Agent to Perplexity-User/1.0 or ClaudeBot/1.0 to look legitimate. Verify against published IP ranges (Perplexity [1], Anthropic [2], Google [3]) if blocking decisions hinge on it.
The suspected-ai bucket is a labeled-unknown, not a confident attribution. Use it for trend analysis and for sizing the gap GA4 cannot see, not for per-visit attribution claims.
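
For the spoofing caveat, the IP check can be sketched as a plain CIDR match. Assumptions to note: this handles IPv4 only, the function names are hypothetical, and the ranges in PUBLISHED_RANGES are reserved documentation placeholders, not real vendor ranges. Load the actual lists from the vendors' published sources before relying on it.

```javascript
// Converts a dotted-quad IPv4 string to an unsigned integer, or null if invalid.
function ipv4ToInt(ip) {
  const parts = String(ip).split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    n = n * 256 + octet; // plain arithmetic avoids signed 32-bit bit-op pitfalls
  }
  return n;
}

// True when ip falls inside the IPv4 CIDR block, e.g. "192.0.2.0/24".
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  const blockSize = 2 ** (32 - bits); // number of addresses in the block
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// PLACEHOLDER ranges (reserved documentation blocks), not real vendor IPs.
const PUBLISHED_RANGES = {
  perplexity: ["192.0.2.0/24"],
  anthropic: ["198.51.100.0/24"],
};

// True when the claimed vendor's range list contains the request IP.
function isVerifiedBot(vendor, ip) {
  const ranges = PUBLISHED_RANGES[vendor] || [];
  return ranges.some((cidr) => inCidr(ip, cidr));
}
```

A request whose User-Agent claims ClaudeBot but fails isVerifiedBot("anthropic", ip) is a candidate spoof. Whether to block it or just label it is a policy decision the sketch does not make.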

FAQ

Does Perplexity send a Referer header when a user clicks a citation?

Yes, more often than ChatGPT does. Perplexity citation clicks generally arrive with a Referer in the form https://www.perplexity.ai/search/<slug> on desktop browsers. The hit rate in my own logs sits in the 50-70% range, materially higher than ChatGPT's single-digit-to-teens rate, but still nowhere near 100%. Mobile Perplexity (iOS and Android apps) drops the referer more often than desktop, same pattern as every other AI app. The "pro search" interface surfaces under the same /search/ path; Discover and Spaces pages can arrive under /discover/<slug> and /space/<slug> instead, so match on the hostname rather than the path.

What is the Claude.ai referer string?

When present, Claude.ai outbound clicks arrive with a Referer of https://claude.ai/chat/<uuid> or just https://claude.ai/. The hit rate is the lowest of the three engines covered here in my measurement (only ChatGPT is lower), around 20-40% on desktop browsers and dramatically lower in the mobile apps. Anthropic ships Claude links with no special referrer-policy attribute most of the time, so when the referer survives you do get the chat UUID, but a large share of clicks lose the referer in transit through in-app webviews and the Claude desktop app.

How does Gemini handle outbound citation clicks?

Gemini citations from the gemini.google.com surface usually pass a Referer of https://gemini.google.com/ with no path information, because the conversation state lives in Google's session storage rather than the URL. Referer rate is in the 40-60% range on desktop, lower on mobile. Separately, AI Overviews citations inside the Google SERP almost never preserve a referer of any kind because Google rewrites the outbound link through its own click tracker. AI Overviews clicks land as Direct/(none) in GA4 essentially always.

What is PerplexityBot and how is it different from Perplexity-User?

Perplexity runs two distinct user-agents per their published bot documentation. PerplexityBot is the indexing crawler that builds Perplexity's web index, identified as Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot). Perplexity-User is the live-fetch agent that runs when a user asks a question and Perplexity needs to retrieve a specific URL in real time. PerplexityBot respects robots.txt; Perplexity-User is documented to not respect robots.txt because it represents a direct user request, the same model OpenAI uses for ChatGPT-User. Treat them differently in your logs.

Does Claude have a documented crawler?

Yes. Anthropic publishes ClaudeBot for training-data crawling and a separate Claude-User agent for live fetches during a chat. ClaudeBot identifies as Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com) and respects robots.txt. Claude-User is documented in Anthropic's crawler support article and fires when a Claude session needs to fetch a URL on a user's behalf. Both are bots, not human traffic, but ClaudeBot hits on a fresh page are the leading indicator that Anthropic considers your domain crawlable for the next training cycle.

Can GA4 detect Perplexity, Claude, or Gemini referrals automatically?

No, none of them. GA4's default channel groupings have no rules for perplexity.ai, claude.ai, or gemini.google.com. When a referer does arrive, it lands in the Referral channel as a generic entry with no AI labeling. You can build a custom channel group with a regex like perplexity\.ai|claude\.ai|gemini\.google\.com but it only catches the minority of clicks that preserve a referer in the first place. For the rest you need server-side detection and behavioral fingerprinting, same architecture as the ChatGPT case.

Should I look at the ChatGPT detection guide first?

Yes, the ChatGPT post covers the underlying mechanics (why Referer arrives unevenly, how GA4 buckets it as Direct, the three-layer detection architecture) in more depth. This guide assumes you have read that one and walks through the per-engine specifics for the other three. The unified detection code in this article is a strict superset of the ChatGPT snippet.

References
