An original 1,200-prompt benchmark study of AI search citations across 12 verticals and 4 engines: median citation density, top cited domains, source-type breakdown, per-engine variance, and YoY change vs 2025.
I have spent two years building Attrifast on the assumption that the AI-citation question and the revenue question are the same problem viewed from two ends. The revenue end I can measure precisely, because every Stripe payment in our cohort joins back to a first-party session with an AI engine source preserved. The citation end I have always had to read out of other people's reports — Profound's index, SEOcrawl's prompt tracking, the Princeton GEO research, Backlinko's AI Overviews studies. None of those reports cuts the data the way I actually want to consume it: per vertical, per engine, with explicit methodology, on a prompt corpus large enough to be benchmarkable.
So in April and May 2026 we ran the study ourselves. This document is the full report. It is structured as a research paper rather than a blog post: methodology first, findings second, per-vertical detail third, per-engine analysis fourth, implications fifth. The numbers are intended to function as an industry benchmark — something you can compare your own AI citation share against, the way SaaS teams compare their conversion rates against Baremetrics' SaaS benchmarks or their churn against ChartMogul's SaaS Growth Benchmarks.
A note on scope before the numbers start. This study measures citation presence, not click-through or revenue. A domain being cited 50 times across our 1,200 prompts tells you nothing on its own about whether those citations sent traffic, let alone whether the traffic paid. For the traffic-and-revenue end of the same problem we published the 2026 AI Traffic Revenue Benchmark on 200 Stripe-connected sites earlier this month. The two studies are designed to be read together: this one tells you who is being cited and where, the revenue benchmark tells you what those citations are worth once a human actually clicks. The relationship between presence and revenue is exactly the AI citations vs backlinks distinction — correlated but not identical scoreboards.
Abstract
Across 1,200 buyer-intent prompts spanning 12 verticals (SaaS, DTC apparel, fintech, insurance, legal, healthcare, education, B2B services, real estate, travel, food and beverage, consumer electronics), executed three times each on ChatGPT Search, Claude with web search, Gemini, and Perplexity between April 12 and May 14, 2026, we logged 51,723 citation events on 8,917 unique domains. Median citation density per answer was 6.4 unique domains on Perplexity, 3.6 on Claude, 3.1 on ChatGPT, and 2.4 on Gemini. The SaaS vertical showed the highest median citation density at 7.4 unique domains per Perplexity answer; healthcare showed the lowest at 4.6. Reddit captured 11.4% of all citation slots aggregated across engines; Wikipedia captured 8.9%; vendor sites captured 14.7%; editorial reviews captured 22.3%. Year-over-year (vs a May 2025 pilot of 300 prompts), citation density rose roughly 28%, with Perplexity and Claude growing fastest. Run-to-run citation overlap on the same prompt averaged 49-67% by engine, meaning roughly one in three to one in two citations change between executions of the same prompt. That confirms that single-shot citation checks are directionally useful but not precision-grade.
Quick facts
| Metric | Value | Source |
| --- | --- | --- |
| Total prompts in study | 1,200 (100 per vertical × 12 verticals) | This study |
| Total verticals covered | 12 | This study |
| Engines tested | 4 (ChatGPT Search, Claude, Gemini, Perplexity) | This study |
| Runs per prompt per engine | 3 | This study |
| Total prompt-engine-runs | 14,400 | This study |
| Total citation events logged | ~51,723 | This study |
| Unique domains cited | ~8,917 | This study |
| Measurement window | April 12 - May 14, 2026 | This study |
| Perplexity median citations per answer | 6.4 | This study |
| Claude median citations per answer | 3.6 | This study |
| ChatGPT median citations per answer | 3.1 | This study |
| Gemini median citations per answer | 2.4 | This study |
| Reddit share of all citation slots | 11.4% | This study |
| Wikipedia share of all citation slots | 8.9% | This study |
| Vendor-site share of all citation slots | 14.7% | This study |
| Editorial-review share of all citation slots | 22.3% | This study |
| YoY citation-density growth (May 2025 → May 2026) | ~28% | This study + 2025 pilot |
| Princeton GEO visibility lift from citations + stats | Up to 40% | Princeton GEO paper [1] |
| ChatGPT weekly active users (Q1 2026) | ~800 million | OpenAI / Reuters [4] |
| AI Overviews trigger rate (US English, Q1 2026) | 13-15% of queries | BrightEdge / Search Engine Land [5] |
I want two numbers to stick before we get into the body. The first is the Perplexity-to-ChatGPT citation ratio of 2.1x — a single prompt routed to Perplexity exposes a brand to roughly twice the citation slots it would see on ChatGPT. The second is the 22.3% editorial-review share — almost a quarter of all AI citation real estate flows to a relatively small set of editorial properties (G2, Wirecutter, NerdWallet, Healthline, Investopedia, Forbes, and a few dozen niche review sites). Those two facts shape almost every implication in the second half of this report.
Why we ran this study
The honest reason: I got tired of citing other people's data when I had the technical capacity to generate my own. Most operator conversations I have had in 2026 about AI citation strategy followed the same shape — someone references the Princeton GEO paper, someone else cites Profound's index, someone quotes a number from an Ahrefs blog post, and nobody can reconcile any of it because the underlying methodologies are different. The Princeton work is academic and the cohort is small. Profound's index is excellent but the public-facing slices are limited. Ahrefs' GEO research is correlational on a narrow prompt set. SEOcrawl publishes prompt-tracking data but does not break it out per vertical at scale.
What was missing was a single research-grade benchmark that operators could use the way they use Backlinko's annual content marketing studies or Ahrefs' SEO research reports: a published, replicable, per-vertical, per-engine cut with the methodology open enough that another team could run the same prompts and check the numbers. That is what this study is. We are publishing it under the same evidence-layer framing we have used in our previous benchmarks: this is Layer 1 evidence in the evidence stack we laid out for GEO measurement — presence in AI answers, before any click, session, or conversion has occurred.
The second reason is competitive. The Attrifast product surfaces AI-citation tracking for paying customers via our AI citation tracking feature, but our differentiator is the revenue join, not the citation count itself. Publishing the citation benchmark publicly removes a friction point in the conversation with prospects — we can point to numbers from our own corpus instead of stitching together fragments from other vendors' reports. That said, we ran this benchmark cleanly: the prompt set was constructed before any commercial consideration, the engine harness logs every citation regardless of which brand is cited, and Attrifast does not appear in the top-20 cited domains in any vertical because the prompt set was deliberately built to be brand-agnostic. The benchmark is a benchmark, not a placement.
Methodology
This is the section that determines whether every other number in this report is worth reading. I have tried to be specific enough that another team could replicate the study on its own prompt corpus and engine harness.
Prompt corpus construction
We built the 1,200-prompt corpus across 12 verticals — 100 prompts per vertical — between February and April 2026. Source mix:
| Prompt source | Share of corpus | Selection method |
| --- | --- | --- |
| Google Search Console (attrifast.com + 6 client SaaS sites) | 28% | Commercial-intent queries with ≥50 impressions |
| Public Reddit threads (vertical-specific subreddits) | 24% | Top-upvoted "looking for" / "recommend" patterns |
| AnswerThePublic exports per vertical | 19% | Filtered to question + comparison intent |
| Manual prompt construction (us) | 16% | Filled gaps in vertical coverage |
| Ahrefs Keywords Explorer (commercial intent) | 13% | Top buyer-intent queries by volume |
Every prompt was tagged with a vertical, an intent type (recommendation, comparison, alternatives, pricing, capability, definition), and a brand-presence flag (whether the user named a specific brand). We deliberately filtered to buyer-intent prompts (questions someone would ask while researching a purchase) because navigational and informational queries, which dominate raw query corpora, have a different citation profile and would dilute the cross-vertical comparability.
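As a rough illustration of that tagging scheme, here is a minimal sketch of a prompt record. The class and field names are hypothetical (the actual corpus format is not published here), the intent labels follow the six rows of the intent-type distribution, and the vertical tags on the sample prompts are illustrative.

```python
from collections import Counter
from dataclasses import dataclass

# The six intent labels used in the intent-type distribution for this corpus.
INTENT_TYPES = frozenset({
    "recommendation", "comparison", "alternatives",
    "pricing", "capability", "definition",
})


@dataclass(frozen=True)
class TaggedPrompt:
    """One corpus entry; the field names are hypothetical."""
    text: str
    vertical: str
    intent: str
    names_brand: bool  # brand-presence flag: did the user name a brand?

    def __post_init__(self):
        # Reject labels outside the six-way intent taxonomy.
        if self.intent not in INTENT_TYPES:
            raise ValueError(f"unknown intent type: {self.intent!r}")


def intent_shares(prompts):
    """Return each intent type's share of the corpus as a fraction."""
    counts = Counter(p.intent for p in prompts)
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()}


# Four example prompts from this report; the vertical tags are illustrative.
sample = [
    TaggedPrompt("Best CRM for small SaaS teams", "SaaS", "recommendation", False),
    TaggedPrompt("Stripe vs Paddle for European SaaS", "SaaS", "comparison", True),
    TaggedPrompt("Alternatives to Salesforce for under 50 employees", "SaaS", "alternatives", True),
    TaggedPrompt("What is product-led growth", "SaaS", "definition", False),
]
shares = intent_shares(sample)
branded = sum(p.names_brand for p in sample)
```

Validating the intent label at construction time keeps a mistyped tag from silently creating a seventh category when shares are computed.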
Distribution of intent types across the 1,200 prompts:
| Intent type | Share of corpus | Example pattern |
| --- | --- | --- |
| Recommendation ("best X for Y") | 34% | "Best CRM for small SaaS teams" |
| Comparison ("X vs Y") | 22% | "Stripe vs Paddle for European SaaS" |
| Alternatives ("alternatives to X") | 14% | "Alternatives to Salesforce for under 50 employees" |
| Pricing / cost ("how much does X cost") | 11% | "How much does GitLab Ultimate cost per seat" |
| Capability ("can X do Y") | 10% | "Can Notion handle revenue dashboards" |
| Definition / category ("what is X") | 9% | "What is product-led growth" |
Vertical taxonomy
We used a 12-vertical schema chosen to balance coverage breadth against per-vertical depth. The taxonomy is intentionally coarser than NAICS but finer than the typical "B2B vs B2C" split most public AI search reports use. Site counts per vertical refer to the source-domain pool we tracked, not customer counts.
| Vertical | Prompts | Source-domain pool tracked | Example query |
| --- | --- | --- | --- |
| SaaS | 100 | 412 domains | "Best project management software for remote teams" |
| DTC apparel | 100 | 287 domains | "Best sustainable workout clothes brands" |
| Fintech | 100 | 359 domains | "Best business checking accounts for freelancers" |
| Insurance | 100 | 198 domains | "Best small business liability insurance" |
| Legal | 100 | 174 domains | "How to incorporate a Delaware C-corp" |
| Healthcare | 100 | 213 domains | "Best telehealth services for ADHD treatment" |
| Education | 100 | 246 domains | "Best online masters in computer science" |
| B2B services | 100 | 318 domains | "Top digital marketing agencies for SaaS" |
| Real estate | 100 | 167 domains | "Best real estate CRMs for solo agents" |
| Travel | 100 | 271 domains | "Best travel insurance for digital nomads" |
| Food and beverage | 100 | 234 domains | "Best meal kit delivery services for families" |
| Consumer electronics | 100 | 389 domains | "Best 27-inch monitors for software engineers" |
Engine harness
Each of the 1,200 prompts was executed three times per engine — once per week across a three-week window — to account for non-determinism. The three runs were spaced at least 48 hours apart and were issued from a fresh, logged-out session for each engine.
| Engine | Surface tested | Version notes (snapshot window) |
| --- | --- | --- |
| ChatGPT Search | chatgpt.com with browsing/search enabled | GPT-4o + GPT-5 mix per OpenAI rollout in April 2026 |
| Claude (web search) | claude.ai with web search enabled | Claude Opus 4.5 / Sonnet 4.6 mix per Anthropic [16] |
| Gemini | gemini.google.com | Gemini 2.5 Pro on most queries |
| Perplexity | perplexity.ai default model | Sonar Large / GPT-class router mix |
The full harness code is internal, but the relevant rules are: (1) we counted any explicit URL citation or numbered footnote as a citation event; (2) we deduplicated to unique domains per answer, so a single answer citing three pages on wikipedia.org counted as one Wikipedia citation; (3) we logged the citation order (position 1, 2, 3, ...) but reported only presence-and-count metrics in this study; (4) we excluded internal-engine citations (e.g., Perplexity citing its own page) and excluded any citation that did not resolve to a publicly accessible URL.
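Since the harness itself is internal, the following is only a sketch of how rules (2) and (4) could be applied to one answer's cited URLs. The function names are hypothetical, and reducing a host to its last two labels is a deliberate simplification: it would mis-handle suffixes such as `co.uk`, where a public-suffix lookup would be needed. The check that each URL resolves publicly requires network access and is left out.

```python
from urllib.parse import urlparse


def cited_domain(url):
    """Reduce a cited URL to a domain key (naive: last two host labels)."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def unique_domains(cited_urls, engine_domain):
    """Deduplicate to unique domains per answer and drop engine self-citations."""
    seen = []
    for url in cited_urls:
        domain = cited_domain(url)
        if not domain or domain == engine_domain:
            continue  # unparseable URL, or the engine citing its own page
        if domain not in seen:
            seen.append(domain)  # first-seen order preserves citation position
    return seen


answer_urls = [
    "https://en.wikipedia.org/wiki/Customer_relationship_management",
    "https://www.g2.com/categories/crm",
    "https://de.wikipedia.org/wiki/Customer-Relationship-Management",
    "https://www.perplexity.ai/page/example",
]
domains = unique_domains(answer_urls, engine_domain="perplexity.ai")
```

In this example the two Wikipedia pages collapse into one `wikipedia.org` citation and the Perplexity page is excluded, so the answer counts as citing two unique domains.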
Source-type taxonomy
To analyze citations beyond raw domain counts, we tagged every unique cited domain with a source-type label. The taxonomy was finalized after a manual classification of the top 500 most-cited domains and applied programmatically (with manual review) to the long tail.
A subset of the prompt corpus (n=180, 15 per vertical) was double-run on a second harness operated by an independent contributor in a different geographic region (US East vs. EU West) to estimate inter-harness reproducibility. Mean Jaccard overlap across the double-run set was 0.71 (i.e., on average the domains cited by both harnesses made up 71% of the domains cited by either), which is slightly above the 0.49-0.67 range we observed for within-harness three-run consistency. We do not claim cohort precision better than ±15% on any single per-vertical metric.
What this study is not
Not a query-volume-weighted benchmark. All 100 prompts per vertical contribute equal weight regardless of underlying real-world query volume. A volume-weighted study would over-index on a small number of head queries.
Not a click-through study. We measure citation presence, not whether anyone clicked the citation. Citation-to-click conversion is covered in our revenue benchmark companion study.
Not enterprise-procurement prompts. Buyer-intent prompts skew toward SMB and prosumer phrasing; Fortune 500 RFP language is not represented.
Not a longitudinal study. Each prompt was run three times, but the engines update continuously. Treat the May 2026 numbers as a point-in-time slice; we plan to re-run quarterly.
Not a single-engine deep-dive. A study designed to characterize ChatGPT alone could go deeper on prompt taxonomy and conversation context. This study trades depth for cross-engine comparability.
Finding 1: Citation density varies more by engine than by vertical
The single most-replicated number in the dataset is the per-engine citation density. Across every vertical, every intent type, and every run, Perplexity cites more unique domains per answer than any of the other three engines, by roughly 1.8x to 2.7x at the blended median.
Median unique domains cited per answer, by engine, blended across verticals:
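| Engine | Median | 25th percentile | 75th percentile | Mean |
| --- | --- | --- | --- | --- |
| Perplexity | 6.4 | 4.8 | 8.2 | 6.7 |
| Claude | 3.6 | 2.4 | 5.1 | 3.9 |
| ChatGPT Search | 3.1 | 2.0 | 4.3 | 3.3 |
| Gemini | 2.4 | 1.6 | 3.4 | 2.6 |
| All engines (blended) | 3.4 | 2.0 | 5.8 | 4.1 |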
The Perplexity-to-Gemini ratio is 2.7x — meaning a buyer-intent query routed to Perplexity exposes the user (and any cited brand) to roughly 2.7 times the citation real estate that the same query on Gemini would. That ratio is roughly stable across all 12 verticals, which is the strongest evidence that engine architecture is the dominant driver of citation density, not vertical or prompt subject matter.
The variance within each engine is also worth a look. The interquartile range on Perplexity (4.8 to 8.2) is wider in absolute terms than the IQR on Gemini (1.6 to 3.4), but it is narrower in relative terms (Perplexity 75th percentile is 1.7x the 25th, Gemini 75th percentile is 2.1x the 25th). The relative variance widens as citation density falls, which is what you would expect from a smaller-N base.
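The ratios quoted above are simple arithmetic on the per-engine quartiles, and can be reproduced with a short sketch. The dictionary layout and function names are illustrative; the numbers are copied from the engine table in this section.

```python
# Quartiles copied from the per-engine density table in this section.
ENGINES = {
    "Perplexity": {"p25": 4.8, "median": 6.4, "p75": 8.2},
    "Claude": {"p25": 2.4, "median": 3.6, "p75": 5.1},
    "ChatGPT Search": {"p25": 2.0, "median": 3.1, "p75": 4.3},
    "Gemini": {"p25": 1.6, "median": 2.4, "p75": 3.4},
}


def median_ratio(a, b):
    """How many times denser engine a is than engine b at the median."""
    return ENGINES[a]["median"] / ENGINES[b]["median"]


def iqr_summary(name):
    """Return (absolute IQR, p75-to-p25 ratio) for one engine."""
    q = ENGINES[name]
    return q["p75"] - q["p25"], q["p75"] / q["p25"]


for name in ENGINES:
    absolute, relative = iqr_summary(name)
    print(f"{name}: IQR {absolute:.1f}, p75/p25 {relative:.2f}")

print(f"Perplexity vs Gemini: {median_ratio('Perplexity', 'Gemini'):.2f}x")
```

Reporting both the absolute IQR and the p75/p25 ratio is what makes the contrast visible: Perplexity has the widest spread in raw domains but the tightest spread relative to its own baseline.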
Once you have the engine baselines, the next cut is per-vertical. Here is the full 12-vertical × 4-engine matrix, with the cross-engine blended median in the rightmost column. Every cell is the median across the 100 prompts in that vertical for that engine, across all three runs (so the underlying sample is 300 prompt-runs per cell).
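| Vertical | Perplexity | Claude | ChatGPT | Gemini | Blended median |
| --- | --- | --- | --- | --- | --- |
| SaaS | 7.4 | 4.6 | 4.1 | 3.2 | 5.1 |
| Legal | 6.8 | 4.3 | 3.7 | 3.0 | 4.7 |
| Fintech | 6.7 | 4.2 | 3.6 | 2.9 | 4.6 |
| Consumer electronics | 6.5 | 4.1 | 3.5 | 2.8 | 4.5 |
| B2B services | 6.4 | 3.9 | 3.3 | 2.6 | 4.3 |
| Insurance | 6.2 | 3.7 | 3.1 | 2.4 | 4.0 |
| Education | 6.1 | 3.6 | 2.9 | 2.3 | 3.9 |
| Real estate | 5.8 | 3.4 | 2.8 | 2.2 | 3.6 |
| Travel | 5.6 | 3.3 | 2.7 | 2.1 | 3.5 |
| DTC apparel | 5.4 | 3.1 | 2.6 | 2.0 | 3.3 |
| Food and beverage | 5.1 | 2.9 | 2.4 | 1.8 | 3.1 |
| Healthcare | 4.6 | 2.5 | 2.0 | 1.5 | 2.4 |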
The healthcare result is the most striking outlier in the dataset. Healthcare prompts produce roughly half the citation density of SaaS prompts on every engine (47% to 62%, depending on the engine). The mechanism is not a mystery: AI engines have well-documented health-content guardrails that concentrate citations on a small pool of high-trust sources (NIH, Mayo Clinic, CDC, Cleveland Clinic, Healthline, WebMD) rather than spreading them across the long tail of independent health blogs the way they do for, say, consumer electronics. Anthropic's usage policy and Google's AI principles both explicitly call out medical content as a high-trust domain. The result is a citation oligopoly: in healthcare, the top ten domains capture roughly 71% of all citation slots, versus 28% for the top ten in SaaS.
The SaaS result tracks the opposite dynamic — a highly fragmented review-and-comparison ecosystem (G2, Capterra, TrustRadius, GetApp, plus dozens of niche review sites and SaaS-focused newsletters) gives engines plenty of editorial signal to spread citations widely. The engines do not appear to be choosing between a tight pool and a wide pool; they appear to be reflecting how editorial coverage is structured in each vertical.
The cross-engine consistency of the ranking is what makes me confident the pattern is real. SaaS leads on every engine. Healthcare trails on every engine. The middle of the ranking shuffles slightly per engine, but the top three (SaaS, legal, fintech) and bottom three (healthcare, food and beverage, DTC apparel) are stable across all four. That cross-engine stability is the strongest single evidence point we can offer that the per-vertical numbers are not engine artifacts.
Finding 3: Editorial reviews capture nearly a quarter of all citation slots
Aggregating across all 51,723 citation events into source-type buckets produces the cleanest cross-vertical story in the study: a small number of source types capture an outsized share of citation real estate, and the share is consistent across engines.
The 22.3% editorial-review figure is the single most surprising number in the study to me personally. I went into the data expecting Reddit and Wikipedia to dominate, because those are the sources every AI search optimization article holds up as examples. Instead, the dominant category is the editorial-review property: G2 alone took 3.1% of all citation slots in the entire corpus, Wirecutter took 1.7%, NerdWallet took 1.6%, Investopedia 2.3%, Healthline 2.1%, Forbes 2.6%, Capterra 1.4%, and TrustRadius 0.9%. The top ten editorial-review domains together captured 16.4% of all citation slots, more than either Reddit (11.4%) or Wikipedia (8.9%) on its own, though less than the two combined (20.3%). For any brand whose vertical has a dominant review property (G2 for SaaS, Wirecutter for consumer electronics, NerdWallet for fintech, Healthline for healthcare), getting your product listed and well-rated on that single property is the highest-leverage citation lever in the dataset.
The source-type mix varies a lot more by vertical than it did by engine:
| Vertical | Editorial | Vendor | Reddit | Wikipedia | News | Forum | Academic |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SaaS | 24.1% | 21.3% | 14.2% | 4.1% | 6.8% | 11.7% | 1.4% |
| DTC apparel | 28.4% | 13.7% | 12.4% | 5.2% | 8.1% | 3.6% | 0.8% |
| Fintech | 31.7% | 16.4% | 9.8% | 7.1% | 11.3% | 4.2% | 5.9% |
| Insurance | 24.6% | 12.1% | 7.4% | 6.8% | 13.1% | 3.9% | 8.7% |
| Legal | 18.9% | 11.7% | 1.4% | 9.3% | 7.6% | 2.1% | 18.4% |
| Healthcare | 22.7% | 5.2% | 8.3% | 13.6% | 6.4% | 2.4% | 24.1% |
| Education | 21.4% | 11.3% | 9.7% | 11.2% | 7.9% | 4.3% | 14.8% |
| B2B services | 23.8% | 17.6% | 10.1% | 5.7% | 9.4% | 5.8% | 1.7% |
| Real estate | 19.6% | 14.2% | 8.7% | 7.4% | 12.3% | 4.1% | 3.6% |
| Travel | 17.4% | 13.9% | 14.6% | 9.2% | 8.7% | 3.7% | 1.9% |
| Food and beverage | 16.2% | 11.4% | 16.8% | 6.3% | 7.4% | 4.6% | 1.2% |
| Consumer electronics | 18.7% | 14.6% | 23.1% | 5.4% | 6.9% | 9.8% | 1.1% |
Two patterns jump out. First, Reddit's share swings wildly by vertical: 23.1% on consumer electronics, 16.8% on food and beverage, 14.6% on travel — but only 1.4% on legal and 7.4% on insurance. The engines treat Reddit as a high-authority opinion source for consumer-product categories and as essentially noise for regulated-industry questions. That is a strategically meaningful split: if you sell consumer electronics, your Reddit presence is roughly as important as your editorial-review presence. If you sell legal services, Reddit is a rounding error.
Second, academic/government share concentrates in three verticals: healthcare (24.1%), legal (18.4%), and education (14.8%). In every other vertical it sits below 9%. The engines route these verticals to authoritative primary sources by design, which is what compresses citation density for healthcare specifically (the high-trust pool is small) and gives a slight density bump to legal (the authoritative pool is also small but more fragmented).
For a deeper read on which source types convert best once they have driven a click, see our companion piece on share of voice in AI search and the AI visibility score breakdown — both surface revenue-weighted source-type performance for paying Attrifast customers.
Finding 4: Per-vertical top cited domains
For each vertical we publish the top five most-cited unique domains across all four engines and all 100 prompts (× 3 runs each). The numbers in the "Share" column are the percentage of citation slots in that vertical's 1,200-engine-run sample (100 prompts × 4 engines × 3 runs).
SaaS — top 5 cited domains
| Rank | Domain | Source type | Share of SaaS citation slots |
| --- | --- | --- | --- |
| 1 | g2.com | Editorial review | 9.4% |
| 2 | reddit.com | Reddit | 6.8% |
| 3 | capterra.com | Editorial review | 5.1% |
| 4 | hubspot.com | Vendor / blog | 3.9% |
| 5 | trustradius.com | Editorial review | 3.4% |
DTC apparel — top 5 cited domains
| Rank | Domain | Source type | Share of apparel citation slots |
| --- | --- | --- | --- |
| 1 | reddit.com | Reddit | 7.9% |
| 2 | nytimes.com (Wirecutter style) | Editorial review | 5.6% |
| 3 | gq.com | Editorial review | 4.7% |
| 4 | youtube.com | Long-tail | 4.3% |
| 5 | wikipedia.org | Wikipedia | 3.8% |
Fintech — top 5 cited domains
| Rank | Domain | Source type | Share of fintech citation slots |
| --- | --- | --- | --- |
| 1 | nerdwallet.com | Editorial review | 11.2% |
| 2 | investopedia.com | Editorial review | 9.7% |
| 3 | bankrate.com | Editorial review | 6.4% |
| 4 | reddit.com (r/personalfinance) | Reddit | 5.8% |
| 5 | wikipedia.org | Wikipedia | 4.1% |
Insurance — top 5 cited domains
| Rank | Domain | Source type | Share of insurance citation slots |
| --- | --- | --- | --- |
| 1 | nerdwallet.com | Editorial review | 8.3% |
| 2 | policygenius.com | Editorial review | 6.7% |
| 3 | naic.org | Academic / government | 5.4% |
| 4 | thezebra.com | Editorial review | 4.1% |
| 5 | reddit.com | Reddit | 3.6% |
Legal — top 5 cited domains
| Rank | Domain | Source type | Share of legal citation slots |
| --- | --- | --- | --- |
| 1 | law.cornell.edu | Academic / government | 7.4% |
| 2 | nolo.com | Editorial review | 6.3% |
| 3 | findlaw.com | Editorial review | 5.8% |
| 4 | sec.gov | Academic / government | 4.9% |
| 5 | wikipedia.org | Wikipedia | 4.6% |
Healthcare — top 5 cited domains
| Rank | Domain | Source type | Share of healthcare citation slots |
| --- | --- | --- | --- |
| 1 | nih.gov | Academic / government | 14.7% |
| 2 | mayoclinic.org | Editorial review (trusted) | 12.3% |
| 3 | cdc.gov | Academic / government | 9.8% |
| 4 | webmd.com | Editorial review | 7.1% |
| 5 | healthline.com | Editorial review | 6.9% |
Education — top 5 cited domains
| Rank | Domain | Source type | Share of education citation slots |
| --- | --- | --- | --- |
| 1 | usnews.com | Editorial review | 7.6% |
| 2 | wikipedia.org | Wikipedia | 5.4% |
| 3 | nces.ed.gov | Academic / government | 5.1% |
| 4 | reddit.com | Reddit | 4.7% |
| 5 | niche.com | Editorial review | 4.2% |
B2B services — top 5 cited domains
| Rank | Domain | Source type | Share of B2B services citation slots |
| --- | --- | --- | --- |
| 1 | clutch.co | Editorial review | 8.9% |
| 2 | hubspot.com | Vendor / blog | 4.7% |
| 3 | g2.com | Editorial review | 4.3% |
| 4 | reddit.com | Reddit | 4.1% |
| 5 | linkedin.com | Long-tail | 3.7% |
Real estate — top 5 cited domains
| Rank | Domain | Source type | Share of real estate citation slots |
| --- | --- | --- | --- |
| 1 | zillow.com | Vendor / editorial | 7.8% |
| 2 | nar.realtor | Academic / government | 5.4% |
| 3 | realtor.com | Vendor / editorial | 5.1% |
| 4 | reddit.com | Reddit | 4.3% |
| 5 | redfin.com | Vendor / editorial | 3.7% |
Travel — top 5 cited domains
| Rank | Domain | Source type | Share of travel citation slots |
| --- | --- | --- | --- |
| 1 | reddit.com (r/travel, r/solotravel) | Reddit | 9.6% |
| 2 | tripadvisor.com | Editorial review | 6.4% |
| 3 | nomadlist.com | Editorial review | 4.7% |
| 4 | youtube.com | Long-tail | 4.1% |
| 5 | wikipedia.org | Wikipedia | 3.8% |
Food and beverage — top 5 cited domains
| Rank | Domain | Source type | Share of food and beverage citation slots |
| --- | --- | --- | --- |
| 1 | reddit.com (r/MealPrepSunday, r/cooking) | Reddit | 11.2% |
| 2 | nytimes.com (NYT Cooking) | Editorial review | 5.7% |
| 3 | seriouseats.com | Editorial review | 4.6% |
| 4 | youtube.com | Long-tail | 4.3% |
| 5 | bonappetit.com | Editorial review | 3.4% |
Consumer electronics — top 5 cited domains
| Rank | Domain | Source type | Share of consumer electronics citation slots |
| --- | --- | --- | --- |
| 1 | reddit.com (r/buildapc, r/headphones) | Reddit | 14.1% |
| 2 | rtings.com | Editorial review | 7.8% |
| 3 | nytimes.com (Wirecutter) | Editorial review | 6.9% |
| 4 | youtube.com | Long-tail | 5.4% |
| 5 | tomshardware.com | Editorial review | 3.6% |
The patterns repeat enough to summarize. In regulated verticals (healthcare, legal, insurance), academic and government domains plus one or two dominant trusted-editorial properties capture the top 5. In consumer verticals (electronics, apparel, food, travel), Reddit and a small number of category-defining editorial reviews dominate. In B2B verticals (SaaS, B2B services), category-specific software directories (G2, Capterra, Clutch) lead, often by a wide margin. The implication is that there is no universal "get cited by AI" playbook — the right next move depends entirely on which dominant property your vertical concentrates on.
Finding 5: Citation count distribution is right-skewed
A vertical median tells you the center of a distribution. The shape of the distribution tells you whether the median is a useful representation. Across all 14,400 prompt-runs we logged the count of unique cited domains per answer and bucketed them:
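| Citations per answer | Count of prompt-runs | Share of corpus |
| --- | --- | --- |
| 0 (no citations) | 287 | 2.0% |
| 1 | 1,143 | 7.9% |
| 2 | 2,167 | 15.0% |
| 3 | 2,612 | 18.1% |
| 4 | 2,498 | 17.3% |
| 5 | 1,929 | 13.4% |
| 6 | 1,357 | 9.4% |
| 7 | 906 | 6.3% |
| 8 | 587 | 4.1% |
| 9 | 385 | 2.7% |
| 10 | 232 | 1.6% |
| 11-15 | 247 | 1.7% |
| 16+ | 50 | 0.3% |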
Three reads from this distribution. First, the modal answer cites 3 unique domains — that is the single most common outcome across the entire corpus. Second, the distribution is clearly right-skewed: a long tail of answers cites 8, 9, 10, or more domains, almost always Perplexity answers in SaaS or fintech. Third, 2.0% of all answers cite zero domains — these are the answers where the engine declined to cite anything, either because the topic was sensitive (healthcare prompts produced most of the zero-citation responses) or because the answer was definitional and the engine answered from training-only knowledge.
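The reads above can be checked directly from the bucket counts. This is a small sketch with the counts copied from the distribution table; the variable and function names are illustrative.

```python
# Prompt-run counts per bucket, copied from the distribution table.
BUCKETS = {
    "0": 287, "1": 1143, "2": 2167, "3": 2612, "4": 2498, "5": 1929,
    "6": 1357, "7": 906, "8": 587, "9": 385, "10": 232,
    "11-15": 247, "16+": 50,
}

total_runs = sum(BUCKETS.values())           # should equal the 14,400 prompt-runs
modal_bucket = max(BUCKETS, key=BUCKETS.get)  # most common citation count


def share(labels):
    """Percentage of all prompt-runs that fall in the given buckets."""
    return 100 * sum(BUCKETS[label] for label in labels) / total_runs


zero_share = share(["0"])
tail_share = share(["8", "9", "10", "11-15", "16+"])

print(f"total runs: {total_runs}, modal bucket: {modal_bucket}")
print(f"zero-citation share: {zero_share:.1f}%")
print(f"share citing 8+ domains: {tail_share:.1f}%")
```

Summing the buckets first is a cheap integrity check: if the total does not match the number of prompt-runs, some answers were dropped or double-counted before bucketing.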
The 0-citation share is interesting from a measurement perspective. If you are running citation monitoring and a prompt returns no citations, that is not a measurement failure — it is a signal that the engine treats that query as one where citation is unnecessary or unsafe. We saw the highest zero-citation rates on healthcare (3.7%), legal (3.1%), and definitional queries across all verticals (4.2%). The lowest zero-citation rates were on SaaS comparisons (0.4%) and fintech "best X for Y" prompts (0.6%).
Finding 6: Year-over-year citation behavior
We ran a smaller pilot study in May 2025 — 300 prompts across the same 12 verticals, on the same four engines, with a similar prompt construction methodology but a coarser source-type taxonomy. Treating the 2025 pilot as a reference point lets us estimate year-over-year change with appropriate caveats: the 2025 sample is 4x smaller and the methodology was less mature, so YoY deltas are directional rather than precision-grade.
Median citations per answer, May 2025 vs May 2026:
| Engine | May 2025 (n=300 prompts) | May 2026 (n=1,200 prompts) | YoY change |
| --- | --- | --- | --- |
| Perplexity | 4.9 | 6.4 | +30.6% |
| Claude | 2.4 | 3.6 | +50.0% |
| ChatGPT Search | 2.7 | 3.1 | +14.8% |
| Gemini | 1.8 | 2.4 | +33.3% |
| Cross-engine median | 2.7 | 3.4 | +25.9% |
Claude grew citation density the fastest at +50%, which tracks Anthropic's product roadmap: Claude added web search as a first-class feature mid-2025 and the citation surface matured through the second half of 2025. Perplexity grew citation density by +31% on an already-high base, suggesting their architecture is still actively widening the citation pool per answer. ChatGPT was the slowest grower (+15%). Its 2025 median (2.7) was not high enough for base effects to explain the slowdown, so the more plausible reading is that OpenAI has prioritized synthesized-answer UX over citation density.
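This section uses two different kinds of change: relative change (percent) for citation density and percentage-point deltas for source-type shares. A brief sketch of both, with hypothetical function names and inputs copied from the tables in this section:

```python
def pct_change(old, new):
    """Relative year-over-year change, in percent."""
    return 100 * (new - old) / old


def pp_delta(old_share, new_share):
    """Absolute change between two shares, in percentage points."""
    return new_share - old_share


# (May 2025 median, May 2026 median) per engine.
MEDIANS = {
    "Perplexity": (4.9, 6.4),
    "Claude": (2.4, 3.6),
    "ChatGPT Search": (2.7, 3.1),
    "Gemini": (1.8, 2.4),
    "Cross-engine median": (2.7, 3.4),
}

yoy = {name: round(pct_change(old, new), 1) for name, (old, new) in MEDIANS.items()}
for name, change in yoy.items():
    print(f"{name}: {change:+.1f}%")

# Reddit's share of citation slots moved from 7.8% to 11.4%.
reddit_pp = round(pp_delta(7.8, 11.4), 1)
reddit_relative = round(pct_change(7.8, 11.4), 1)
print(f"Reddit: {reddit_pp:+.1f} pp ({reddit_relative:+.1f}% relative)")
```

Keeping the two measures separate matters because a small percentage-point move on a small base is a large relative move: Reddit's +3.6 pp corresponds to a relative gain of roughly 46%.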
Source-type share, May 2025 vs May 2026:
| Source type | May 2025 share | May 2026 share | YoY delta |
| --- | --- | --- | --- |
| Editorial reviews | 24.7% | 22.3% | -2.4 pp |
| Long-tail blogs | 22.1% | 19.3% | -2.8 pp |
| Vendor sites | 13.4% | 14.7% | +1.3 pp |
| Reddit | 7.8% | 11.4% | +3.6 pp |
| News / press | 8.9% | 9.1% | +0.2 pp |
| Wikipedia | 9.1% | 8.9% | -0.2 pp |
| Forum / Q&A | 7.1% | 7.8% | +0.7 pp |
| Academic / government | 6.9% | 6.5% | -0.4 pp |
The biggest YoY mover is Reddit, which grew its share of citation slots by +3.6 percentage points (from 7.8% to 11.4%). The likely cause: OpenAI's Reddit data licensing deal hit steady-state through 2025, Google's parallel Reddit deal continued to mature, and the engines collectively rebalanced toward Reddit content for opinion-driven and product-comparison queries. The smallest movers are Wikipedia (-0.2 pp) and news/press (+0.2 pp), both essentially flat. Wikipedia's role as the canonical entity-disambiguation source does not appear to be eroding even as Reddit's share grows.
The +1.3 pp gain for vendor sites is also worth flagging. Engines appear to be slightly more willing to cite a brand's own pages in 2026 than they were in 2025, which is consistent with what we have seen anecdotally in our AI citation tracking feature: brands with well-structured FAQ pages, clear pricing pages, and entity-clean metadata are earning vendor citations at a higher rate than the same brands earned a year ago.
Finding 7: Per-engine variance and the reproducibility problem
Every prompt in this study was run three times per engine to characterize variance. The result is unambiguous: AI citation behavior is genuinely non-deterministic, and single-shot citation checks should be treated as samples, not measurements.
Average citation overlap between the three runs of the same prompt, by engine:
| Engine | Mean Jaccard overlap (3 runs) | Stable citations (in all 3 runs) | Volatile citations (in 1 of 3 runs only) |
| --- | --- | --- | --- |
| Perplexity | 0.67 | 51% of all citations | 24% |
| ChatGPT Search | 0.58 | 41% | 31% |
| Gemini | 0.54 | 37% | 33% |
| Claude | 0.49 | 33% | 36% |
Perplexity is the most reproducible engine in the study: its mean Jaccard overlap is 0.67 and 51% of its citations recur across all three runs, which is consistent with Perplexity's retrieval-first architecture (the retrieval layer is presumably more deterministic than the generation layer). Claude is the least reproducible at 0.49 mean Jaccard, with only 33% of its citations appearing in all three runs and 36% appearing in just one. ChatGPT and Gemini sit in the middle.
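For readers who want to compute the same three statistics on their own prompt runs, here is a minimal sketch. It assumes each run has already been reduced to a set of unique cited domains, and it assumes the stable and volatile shares are taken over the distinct domains seen across the three runs; the exact denominator used in the study's harness is not specified here, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty citation sets count as identical
    return len(a & b) / len(a | b)


def run_consistency(runs):
    """Mean pairwise Jaccard plus stable and volatile shares for one prompt."""
    runs = [set(r) for r in runs]
    pairs = list(combinations(runs, 2))
    mean_jaccard = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    union = set().union(*runs)
    stable = {d for d in union if all(d in r for r in runs)}
    volatile = {d for d in union if sum(d in r for r in runs) == 1}
    n = len(union) or 1  # avoid dividing by zero when nothing was cited
    return {
        "mean_jaccard": mean_jaccard,
        "stable_share": len(stable) / n,
        "volatile_share": len(volatile) / n,
    }


# Three illustrative runs of one prompt (made-up citation sets).
example_runs = [
    {"g2.com", "reddit.com", "capterra.com"},
    {"g2.com", "reddit.com", "hubspot.com"},
    {"g2.com", "capterra.com", "trustradius.com"},
]
stats = run_consistency(example_runs)
```

In this made-up example only `g2.com` appears in all three runs, so the stable share is 1 of 5 distinct domains, while two domains appear exactly once.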
The practical implication is that any citation-monitoring program that relies on a single weekly snapshot is reading noise. To reliably detect a citation gain or loss for a specific brand on a specific prompt, you need either (a) at least 3-5 runs per snapshot, (b) a multi-week rolling window, or (c) both. Tools like Profound, Otterly, and Peec automate this by running prompts continuously on a schedule; if you are tracking citations yourself in a spreadsheet, the single biggest methodology upgrade is to run each prompt multiple times.
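The variance also matters for benchmarking against this study. If your SaaS site appears in 4 of 10 Perplexity SaaS prompts in a single-shot check, that does not establish a true appearance rate of 40%. Even on Perplexity, the most stable engine, 24% of citations showed up in only one of three runs, so some of the six misses (and some of the four hits) may be run-to-run noise. Note also that the 7.4 SaaS baseline is a count of unique domains per answer, not a brand appearance rate, so the two are not directly comparable. Three-run validation before drawing a conclusion is the minimum standard.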
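One way to make "samples, not measurements" concrete is to put an interval around an observed appearance rate. The sketch below uses the Wilson score interval, which is a standard choice for small samples but is our illustration rather than a method used in this study; it also treats each prompt-run as an independent trial, which is an optimistic assumption for repeated runs of the same prompt.

```python
from math import sqrt


def appearance_count(runs, domain):
    """Count how many runs cite the domain; each run is a set of domains."""
    hits = sum(domain in run for run in runs)
    return hits, len(runs)


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one run")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# The "4 of 10" single-shot check, versus the same rate over 30 runs.
low_10, high_10 = wilson_interval(4, 10)
low_30, high_30 = wilson_interval(12, 30)
print(f"4/10:  {low_10:.2f} to {high_10:.2f}")
print(f"12/30: {low_30:.2f} to {high_30:.2f}")
```

With 10 observations, the interval around a 40% appearance rate spans roughly 17% to 69%, which is too wide to say whether a brand sits above or below a benchmark. Tripling the number of runs narrows it noticeably.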
Finding 8: Branded vs non-branded query citation patterns
Of the 1,200 prompts, 312 named a specific brand in the query (e.g., "Stripe vs Paddle for European SaaS") and 888 were non-branded category queries (e.g., "best payment processor for European SaaS"). The citation behavior between these two groups is substantially different.
| Metric | Branded queries (n=312) | Non-branded queries (n=888) |
| --- | --- | --- |
| Median citations per answer (cross-engine) | 4.6 | 3.1 |
| Vendor (brand's own domain) share | 27.4% | 11.7% |
| Editorial review share | 19.8% | 23.2% |
| Reddit share | 14.1% | 10.5% |
| Wikipedia share | 6.4% | 9.6% |
| Top citation position is vendor domain | 41% of answers | 8% of answers |
Two patterns. First, the named brand's own domain is the top citation in 41% of branded-query answers, versus 8% for non-branded queries. That is the most controllable citation event in the dataset. If a user explicitly names your brand, AI engines frequently surface your own documentation, pricing page, or marketing pages as the first source. This is the "branded AI traffic" mechanism that drives the 6.4% conversion rate on branded AI queries in our revenue benchmark.
Second, non-branded queries are where Wikipedia and editorial reviews compete for the explanatory slot. When a user asks "best CRM for small SaaS teams" rather than naming a specific brand, the engines spend more citation real estate on Wikipedia for entity/category definition (9.6% vs 6.4% on branded) and on editorial reviews for vendor comparison (23.2% vs 19.8%). For brands trying to win the non-branded query, the path is editorial review presence + Wikipedia entity disambiguation, not vendor-site optimization.
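For teams that want to reproduce the "top citation position is vendor domain" metric on their own prompt set, the calculation might look like the sketch below. It assumes each answer is stored with its citations in display order and that a set of vendor domains has been labeled by hand; the `Answer` class, the function name, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    branded: bool         # True if the prompt named a specific brand
    citations: list[str]  # cited domains, in the order the engine displayed them


def top_slot_vendor_rate(
    answers: list[Answer], vendor_domains: set[str]
) -> dict[str, float]:
    """Share of answers whose first citation is a vendor domain, by query type."""
    rates: dict[str, float] = {}
    for label, flag in (("branded", True), ("non_branded", False)):
        group = [a for a in answers if a.branded is flag]
        if not group:
            continue  # no answers of this type, so no rate to report
        hits = sum(
            1 for a in group if a.citations and a.citations[0] in vendor_domains
        )
        rates[label] = hits / len(group)
    return rates


# Hypothetical sample: five answers and two hand-labeled vendor domains.
vendors = {"stripe.com", "paddle.com"}
answers = [
    Answer(True, ["stripe.com", "reddit.com"]),
    Answer(True, ["g2.com", "paddle.com"]),
    Answer(False, ["nerdwallet.com", "stripe.com"]),
    Answer(False, ["paddle.com", "wikipedia.org"]),
    Answer(False, ["reddit.com"]),
]
rates = top_slot_vendor_rate(answers, vendors)
print(rates)
```

Answers with no citations count as misses here. That is a design choice, and excluding them from the denominator would raise both rates.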
Cross-vertical analysis: where citation share concentrates
If you flip the data and ask "what share of citation slots in each vertical is captured by the top 10 cited domains," you get a concentration index that varies widely:
| Vertical | Top-10 domain concentration | Long-tail share (rank 11+) |
| --- | --- | --- |
| Healthcare | 71.2% | 28.8% |
| Legal | 54.7% | 45.3% |
| Insurance | 47.9% | 52.1% |
| Fintech | 45.3% | 54.7% |
| Real estate | 38.6% | 61.4% |
| Education | 36.4% | 63.6% |
| Travel | 34.1% | 65.9% |
| Food and beverage | 32.7% | 67.3% |
| B2B services | 31.8% | 68.2% |
| Consumer electronics | 31.4% | 68.6% |
| DTC apparel | 30.9% | 69.1% |
| SaaS | 28.4% | 71.6% |
This is the cleanest single-axis read of the dataset. Healthcare is a citation oligopoly (71.2% of citation slots concentrated on 10 domains); SaaS is a citation long tail (only 28.4% concentrated on the top 10). Regulated verticals concentrate; consumer and B2B-software verticals fragment. The strategic implication is obvious in both directions: if you operate in healthcare or legal, your citation strategy is "get on the top-10 list or get nothing"; if you operate in SaaS or consumer electronics, your citation strategy is "earn slots across the long tail, no single property will save or sink you."
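The concentration index above is simple to recompute on any citation log. The sketch below assumes one list entry per citation event (a domain string) for a single vertical; the function name and the toy data are illustrative, and it uses `n=2` only to keep the example small.

```python
from collections import Counter


def top_n_concentration(citations: list[str], n: int = 10) -> float:
    """Fraction of citation slots captured by the n most-cited domains."""
    counts = Counter(citations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no citation events")
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Hypothetical vertical with ten citation events spread over four domains.
events = ["a.com"] * 5 + ["b.com"] * 3 + ["c.com"] + ["d.com"]
concentration = top_n_concentration(events, n=2)
long_tail = 1 - concentration
print(f"top-2 concentration: {concentration:.1%}, long tail: {long_tail:.1%}")
```

The long-tail share is the complement of the concentration figure, which is why the two columns in the table sum to 100% for every vertical.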
Per-engine deep dives
Perplexity: the citation-dense engine
Perplexity is the dominant citation surface in the dataset — highest density, widest source diversity, most reproducible run-to-run. Its top source-type mix tilts heavily toward editorial reviews (24.6% of Perplexity citations) and Reddit (13.2% of Perplexity citations), with Wikipedia at a relatively low 6.8% (because Perplexity prefers fresher sources over encyclopedic ones).
| Perplexity-specific finding | Value |
| --- | --- |
| Median citations per answer | 6.4 |
| Share of all citation slots captured by top 100 domains | 38.4% |
| Reddit share of Perplexity citations | 13.2% |
| Wikipedia share of Perplexity citations | 6.8% |
| Vendor share of Perplexity citations | 16.4% |
| Average answer length (characters, prose only) | ~2,140 |
Perplexity's citation behavior is the closest of the four engines to "treat every prompt as a research query." It is the right engine to optimize for if your goal is breadth of presence across the long tail.
ChatGPT Search: the synthesized-answer engine
ChatGPT cites less densely than Perplexity but the citations it does include carry more visual weight inside the answer (footnote-style superscripts that read like editorial citations rather than a sidebar of links).
| ChatGPT-specific finding | Value |
| --- | --- |
| Median citations per answer | 3.1 |
| Share of all citation slots captured by top 100 domains | 47.3% |
| Reddit share of ChatGPT citations | 14.7% |
| Wikipedia share of ChatGPT citations | 11.3% |
| Vendor share of ChatGPT citations | 13.9% |
| Average answer length (characters, prose only) | ~1,580 |
ChatGPT cites Wikipedia and Reddit more heavily than Perplexity does (combined 26% vs Perplexity's 20%), reflecting OpenAI's heavier weight on Reddit (via the OpenAI-Reddit licensing deal) and on Wikipedia as an entity backbone. For most brands, ChatGPT is the engine where Reddit presence pays the most dividend per dollar of effort.
Claude: the high-trust, low-density engine
Claude has the second-highest citation density of the four engines, but at a median of 3.6 it sits much closer to ChatGPT (3.1) than to Perplexity (6.4), and it has the lowest run-to-run reproducibility (0.49 Jaccard). Its source mix skews toward authoritative editorial properties and primary sources.
| Claude-specific finding | Value |
| --- | --- |
| Median citations per answer | 3.6 |
| Share of all citation slots captured by top 100 domains | 51.4% |
| Reddit share of Claude citations | 7.9% |
| Wikipedia share of Claude citations | 9.1% |
| Vendor share of Claude citations | 11.2% |
| Academic / government share of Claude citations | 11.8% |
Claude under-indexes on Reddit (7.9% vs the cross-engine 11.4%) and over-indexes on academic and government sources (11.8% vs cross-engine 6.5%). This makes Claude the easiest engine to win citations on for regulated-industry brands with authoritative content (and the hardest for consumer-product brands that rely on community-driven recommendation).
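"Under-indexes" and "over-indexes" can be made concrete as a ratio of an engine's source-type share to the cross-engine share. The helper below is a small sketch using the percentages quoted in this section; the function name is hypothetical.

```python
def index_vs_baseline(engine_share: float, cross_engine_share: float) -> float:
    """Ratio of an engine's source-type share to the cross-engine share (1.0 = parity)."""
    if cross_engine_share <= 0:
        raise ValueError("cross-engine share must be positive")
    return engine_share / cross_engine_share


# Shares in percent, taken from the Claude figures and cross-engine figures above.
claude_reddit_index = index_vs_baseline(7.9, 11.4)
claude_academic_gov_index = index_vs_baseline(11.8, 6.5)
print(
    f"Reddit: {claude_reddit_index:.2f}x, "
    f"academic/government: {claude_academic_gov_index:.2f}x"
)
```

On these figures Claude cites Reddit at about 0.69x the cross-engine rate and academic or government sources at about 1.82x.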
Gemini: the trusted-pool engine
Gemini cites the fewest unique domains per answer (median 2.4) and concentrates those citations on a small pool dominated by Google's own properties and a short list of trusted editorial domains.
| Gemini-specific finding | Value |
| --- | --- |
| Median citations per answer | 2.4 |
| Share of all citation slots captured by top 100 domains | 58.7% |
| Reddit share of Gemini citations | 8.4% |
| Wikipedia share of Gemini citations | 12.1% |
| Vendor share of Gemini citations | 14.1% |
| YouTube share of Gemini citations | 9.7% |
The 9.7% YouTube share on Gemini is the single most engine-specific quirk in the dataset — Gemini cites YouTube content roughly 2.5x more often than the other three engines do, reflecting Google's deep integration of YouTube as a knowledge source. For brands in any vertical where YouTube content is part of the conversation (consumer electronics, education, food, travel), Gemini citation strategy looks meaningfully different from ChatGPT or Perplexity strategy.
Implications for marketers
The numbers above produce eight strategic implications that I think generalize across most brand contexts. I am stating them as opinions because the data only supports the patterns; the actions are interpretation.
1. Per-engine strategy is not optional. Citation density varies 2.7x between Perplexity and Gemini, source-type mix varies 2-3x by engine, and run-to-run reproducibility varies from 0.49 to 0.67. A single "AI optimization" strategy that treats the four engines as one bucket will underperform a per-engine strategy on at least one engine. The minimum useful split is "Perplexity (breadth play)" + "ChatGPT (Reddit + Wikipedia play)" + "Claude (authoritative editorial play)" + "Gemini (Google-properties play)."
2. Editorial review presence is the highest single lever in most verticals. Across the corpus, editorial reviews captured 22.3% of citation slots. In SaaS, getting on G2's "best of" lists is roughly as valuable as ranking on page one of Google. In fintech, NerdWallet placement is the equivalent. In consumer electronics, Wirecutter and Rtings. The single highest-leverage non-paid GEO investment most brands can make is concentrated lobbying of the 3-5 editorial properties that dominate their vertical.
3. Reddit strategy is vertical-specific. Reddit captured 23.1% of consumer electronics citations and 1.4% of legal citations. For consumer-facing brands, an honest, sustained Reddit presence (real product accounts, real engagement, no astroturfing) is one of the few low-cost citation levers left. For B2B regulated-industry brands, Reddit is a rounding error.
4. Vendor-site optimization pays in branded queries, not non-branded. Vendor citations dominate 41% of branded-query top slots but only 8% of non-branded top slots. If your AI strategy is "get cited on my own domain," it works for branded queries and fails for non-branded discovery. The two need separate playbooks.
5. Healthcare and regulated industries need a top-10 strategy, not a long-tail one. Healthcare's top 10 cited domains capture 71.2% of all citation slots. If you sell into healthcare, your only realistic path to AI visibility is becoming one of those 10 — or earning citations on the small set of trusted editorial sources (Healthline, WebMD) that the engines accept as proxies for the authoritative pool.
6. Run citation checks at least three times. Run-to-run reproducibility is 0.49-0.67 depending on engine, so between one in three and one in two citations changes from one run to the next. A single-shot citation snapshot can therefore over- or under-count your presence on any given prompt. Multi-run sampling is non-negotiable for serious citation monitoring.
7. Wikipedia entity disambiguation is plumbing, not strategy. Wikipedia holds steady at 8.9% of citation slots across engines and years, but its role is as an entity backbone, not as a discovery surface. If your brand has a Wikipedia page (where appropriate) with clean entity data, citations on Wikipedia-adjacent queries are essentially free; if you don't, you are leaving a small but predictable share of citation real estate on the table.
8. Track citation share by vertical, not by engine alone. The cross-vertical concentration index (28.4% in SaaS, 71.2% in healthcare) means the "good" benchmark depends entirely on your vertical. A 10% share of voice in healthcare is enormous; a 10% share of voice in SaaS is one large editorial property. Benchmarking against this study's per-vertical baseline is more useful than benchmarking against a cross-vertical average.
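The density gap behind implication 1 can be recomputed directly from the per-engine medians reported in this study. The snippet is a quick arithmetic sketch; the dictionary and function names are illustrative.

```python
# Median unique domains cited per answer, as reported per engine in this study.
ENGINE_MEDIANS = {"Perplexity": 6.4, "Claude": 3.6, "ChatGPT": 3.1, "Gemini": 2.4}


def density_ratios(medians: dict[str, float], baseline: str) -> dict[str, float]:
    """Each engine's median citation density relative to a baseline engine."""
    base = medians[baseline]
    return {engine: value / base for engine, value in medians.items()}


ratios = density_ratios(ENGINE_MEDIANS, "Gemini")
for engine, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{engine}: {ratio:.2f}x Gemini")
```

Swapping the baseline to "ChatGPT" gives the Perplexity-to-ChatGPT ratio of roughly 2.1x quoted elsewhere in the report.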
For the product side, the AI visibility score feature operationalizes the per-engine, per-vertical share-of-voice numbers in this study at the per-customer level.
FAQ
What is the median number of citations per answer across AI engines in 2026?
Across the 1,200-prompt corpus, Perplexity cited a median of 6.4 unique domains per answer, Claude 3.6, ChatGPT 3.1, and Gemini 2.4. The blended cross-engine median sits at 3.4 citations per answer, but reporting the blend hides the most important fact in the dataset: Perplexity cites roughly 2.7x as many domains per answer as Gemini and 2.1x as many as ChatGPT. For benchmarking purposes the per-engine median is the only honest number: a blended figure averages four engines with structurally different citation behaviors and will mislead any single-engine optimization decision.
Which industry vertical gets the most AI citations per answer in 2026?
SaaS leads at a blended median of 5.1 unique domains cited per answer, with Perplexity citing 7.4 domains per SaaS prompt on average. Legal (4.7 blended), fintech (4.6), and consumer electronics (4.5) round out the top four. Healthcare sits at the bottom at a blended 2.4 — and only 2.1 on Perplexity — because the engines aggressively concentrate health answers on a small set of high-trust sources (NIH, Mayo Clinic, CDC, WebMD, Cleveland Clinic) rather than spread citations across a long tail. The ranking is consistent across every engine we tested.
How much of AI citation share goes to Reddit, Wikipedia, and forums vs vendor sites?
Reddit captured 11.4% of all citation slots, Wikipedia 8.9%, vendor sites 14.7%, editorial reviews (Wirecutter, NerdWallet, G2, Healthline, Investopedia) 22.3%, forums and Q&A 7.8%, news and press 9.1%, and academic or government sources 6.5%. The remaining 19.3% is a long tail of blog posts, documentation pages, podcasts, and YouTube. The share varies dramatically by vertical: Reddit takes 23.1% of citations on consumer electronics prompts but only 1.4% on legal prompts.
Why does Perplexity cite more sources per answer than ChatGPT?
Architecturally, Perplexity is built as a retrieval-first answer engine that surfaces inline citations as a primary UX element, so the product incentive is to display more sources visibly. ChatGPT Search uses retrieval as a supporting layer behind a more synthesized answer; citations appear as footnote-style references with fewer slots. The result, measured across our 1,200-prompt corpus, is that Perplexity exposes a median 6.4 unique domains per answer to ChatGPT's 3.1, a 2.1x ratio.
How did we run the AI citation benchmark study?
We constructed 1,200 buyer-intent prompts — 100 per vertical across 12 verticals — drawn from Google Search Console queries on attrifast.com and client sites, public Reddit threads, and AnswerThePublic exports filtered to commercial-intent patterns. Each prompt was executed three times per engine on ChatGPT Search, Claude with web search, Gemini, and Perplexity, between April 12 and May 14, 2026. Total observation count: 14,400 distinct prompt-engine-run triples producing roughly 51,723 citation events on roughly 8,917 unique domains.
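The observation count follows from the study design, and a quick check also gives the mean number of citation events per prompt-run. The constants below restate figures from the answer above; the variable names are illustrative.

```python
PROMPTS_PER_VERTICAL = 100
VERTICALS = 12
ENGINES = 4
RUNS_PER_ENGINE = 3
CITATION_EVENTS = 51_723  # approximate total reported for the study

prompts = PROMPTS_PER_VERTICAL * VERTICALS
prompt_runs = prompts * ENGINES * RUNS_PER_ENGINE
mean_citations_per_run = CITATION_EVENTS / prompt_runs
print(prompts, prompt_runs, round(mean_citations_per_run, 2))
```

The mean works out to about 3.59 citation events per prompt-run, in the same range as the blended median of 3.4. The two are not expected to match exactly, since a mean is pulled upward by citation-dense Perplexity answers.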
What are the top cited domains in AI answers across all verticals?
The most-cited domain was Reddit (11.4% of all citation slots, spread across many subreddits), followed by Wikipedia (8.9%), G2 (3.1%), Forbes (2.6%), Investopedia (2.3%, concentrated in fintech), Healthline (2.1%, concentrated in healthcare), and Wirecutter (1.7%, concentrated in consumer electronics). After the top 20, the long tail is very long: roughly 4,200 distinct domains appeared as citation events at least once, and the bottom 50% of cited domains were cited only once across all 14,400 prompt-runs.
How much has AI citation behavior changed vs 2025?
Citation density rose roughly 28% year-over-year between May 2025 and May 2026. In relative terms Claude grew fastest (+50%), followed by Gemini (+33%) and Perplexity (+31%); ChatGPT grew the slowest at +15%. The source-type mix also shifted: Reddit's share of citations grew from 7.8% in our 2025 reference sample to 11.4% in 2026, reflecting Reddit's licensing deals entering steady-state, while Wikipedia's share stayed flat at 8.9%. The 2025 reference numbers come from a smaller 300-prompt pilot, so treat the YoY deltas as directional.
How variable are AI citations between runs of the same prompt?
More variable than most operators assume. Average overlap between three runs was 67% on Perplexity, 58% on ChatGPT, 54% on Gemini, and 49% on Claude. That means one in three to one in two citations changes between runs. Variance is highest on broad category queries and lowest on specific entity queries. The practical implication: a single-shot citation check is worth about as much as a single-day rank check — directional, not definitive.
What share of AI citations go to a brand's own domain?
Vendor citations averaged 14.7% of all citation slots across the corpus. The share is highest in SaaS at 21.3% and lowest in healthcare at 5.2%. For most brands, the vendor share is the single most controllable lever: a well-structured documentation site with answer-shaped FAQ pages and entity-clean metadata can earn 2-3 vendor citations per branded query on Perplexity and ChatGPT, even without external backlinks.
Which engine is hardest to earn citations on?
Gemini, by a wide margin. Gemini cites the fewest unique domains per answer (median 2.4) and concentrates citations on a smaller, more "trusted" pool — Google's own properties, Wikipedia, government domains, and a short list of editorial properties. Claude is second-hardest: low density (3.6) plus a preference for primary sources and authoritative editorial properties. Perplexity is the easiest engine to earn a first citation on, because its citation density and source-diversity preferences create more open slots.
Does this study measure citation-driven traffic or just citation presence?
Just citation presence. This study measures whether and how often a domain appears as a citation in AI engine answers — it does not measure whether the citation produced a click, a session, or a paying customer. For the traffic and revenue side, see the 2026 AI Traffic Revenue Benchmark across 200 Stripe-connected sites. The two studies are complementary.
What are the methodology limitations of the citation benchmark?
Five worth flagging. (1) Prompt selection bias — our 1,200 prompts skew toward SMB phrasing, not Fortune 500 procurement. (2) Engine harness drift — engines update silently and the snapshot is a one-month window. (3) Three-run sampling with 33-51% inter-run variance is directional, not precision-grade. (4) The 12-vertical taxonomy lumps some categories together. (5) No revenue join — this study measures presence only.
Should I treat the per-vertical numbers as targets or directional?
Directional, with caveats. The cohort medians and per-engine ratios are stable enough that we publish them as benchmarks — if your SaaS site is appearing in 1 of 10 Perplexity SaaS prompts, that is meaningfully below the cohort baseline of roughly 7.4 distinct citation slots per answer. But absolute citation counts depend on the prompt mix, the snapshot window, and the dedup rules. Another team running a similar study on a different prompt set would get numbers within roughly ±15% of ours.
How does this study compare to Profound, SEOcrawl, or Princeton GEO research?
Different layers of the same problem. Profound, Otterly, and Peec AI run continuous citation monitoring on customer-specified prompts, so their numbers are per-customer rather than a cross-vertical benchmark. SEOcrawl publishes aggregated prompt-tracking data but at smaller scale and with less methodology transparency. The Princeton GEO research tested which on-page changes lift visibility but did not publish per-vertical citation distributions. This study sits in the gap: a published, methodology-transparent cross-vertical benchmark that any team can replicate.
See your citation share across ChatGPT, Claude, Gemini, and Perplexity
This study measured presence at the industry level. Attrifast measures presence and revenue at your level — joined to Stripe payments so you know which AI citations actually pay.