The honest formula for share of voice in AI search, with a worked example across 3 competitor brands and 30 prompts. Why the classic impression-share definition breaks and what replaces it.
I get asked the same question by founders about once a week now: how do I compute share of voice for ChatGPT and Perplexity the way my paid-search consultant computes impression share for Google Ads? The honest answer is that the analogue exists, the formula is one line, and the discipline around the inputs takes more pages to explain than the formula itself. This article is the long version of that conversation.
The reason it deserves more than one page is that the classic share of voice math breaks for AI in three structural ways, and a lot of vendors are reporting a clean-looking number that papers over the breaks. I will show what the classic definition was, why it breaks, what replaces it, and then walk a worked example with three competitor brands and 30 prompts that you can replicate in a spreadsheet. The example is the part most articles in this category skip, which is exactly why so many teams end up with a share of voice number they cannot defend in an executive review.
The frame I keep coming back to: share of voice in AI search is the cleanest single number for competitive position, but only when the inputs are honest. Honest inputs are a stable prompt set, an explicit competitor set, a defined window, a deterministic parser, and a re-baseline discipline tied to model updates. Skip any of those and you are reporting a vibe, not a metric.
Quick facts
| Spec | Value | Source |
| --- | --- | --- |
| Year classic share-of-voice formula stabilized | 1960s-70s | Marketing literature, Ehrenberg-Bass [1] |
| Year AI share-of-voice formula started forming | 2024 | Profound, Otterly, Peec product convergence |
| Minimum prompt set for reliable SOV | ~30 prompts | Sampling math + operator practice |
| Minimum runs per prompt for reliable SOV | 5-10 | Engine non-determinism dampening |
| Recommended weekly cadence | Weekly | Engines update on weeks, not hours |
| Princeton GEO lift from citations/stats | Up to 40% | Aggarwal et al. [2] |
| US English AIO appearance rate (Q1 2026) | 13-15% | Search Engine Land [3] |
| Median GA4 Direct that is actually AI-referred | 34% | Attrifast cohort [4] |
| Per-engine RPV cohort ranking | Perplexity > Claude > ChatGPT > Gemini > AIO | Attrifast benchmark [4] |
| Loamly audited brands scoring under 20 visibility | 85.7% | Loamly research [5] |
| Recommended SOV reporting view | Per-engine primary + revenue-weighted blend | Attrifast practice |
| Re-baseline trigger | Model launches + prompt-set changes >10% + quarterly | Attrifast practice |
What the classic share of voice meant, and why it breaks
Share of voice originated in advertising literature as the share of total category advertising spend or impressions a brand owned in a window. The Ehrenberg-Bass tradition extended it into a predictor of market share growth (excess share of voice over market share predicts future market-share gain). In paid search, Google operationalized it as impression share: the eligible auctions in which your ad appeared, divided by total eligible auctions, computed from Google's auction log.
| Channel | Numerator | Denominator | Data source |
| --- | --- | --- | --- |
| Classic advertising | Your spend or GRPs | Total category spend or GRPs | Industry trackers (Nielsen, Kantar) |
| Paid search | Your impressions | Total eligible auctions | Google Ads platform |
| SEO (informal) | Your top-10 keywords | Total category keywords in top-10 | Ahrefs, Semrush share-of-search |
| Social listening | Your brand mentions | Total brand mentions in category | Brandwatch, Sprout Social |
| AI search | Your citations or mentions | Total citations or mentions in category | Prompt-tracker sampling |
The first four rows share a common property: there is a well-defined platform-side log of the events being counted, or at least a reasonably stable corpus you can crawl. The fifth row does not. AI search has no auction log, no impression event, no fixed inventory of "queries the engines could have shown you on." Every answer is generated per-query and the citation list is part of the generation, not a retrieval against a stable index of slots.
That structural difference is why classic impression-share math breaks. There is no stable total to put in the denominator. The reconstruction is sampling-based: define a prompt set (the queries you treat as known), define a competitor set (the sources you treat as relevant), and compute citation share within those bounds. The result is real and useful, but it is a sampled estimate, not a platform log.
The cleaner way to see the break is to enumerate the assumptions that hold for paid search and fail for AI:
| Assumption | Paid search impression share | AI share of voice |
| --- | --- | --- |
| Stable inventory of events | Yes (eligible auctions) | No (per-query generation) |
| Platform-side event log | Yes (Google Ads) | No (prompt tracker reconstructs) |
| Deterministic outcomes | Mostly yes given bid | No (model sampling adds variance) |
| Unambiguous competitor set | Yes (advertisers in auction) | No (you choose) |
| Discrete event units | Yes (impression) | Mixed (citation, mention, both) |
| Constant denominator | Yes (within budget) | No (changes with prompt set) |
Five of the six assumptions fail outright, and the sixth (discrete event units) holds only partly. The metric still translates conceptually, but the math has to be redesigned around sampling. That redesign is the rest of the article.
The 2026 formula for AI share of voice
The cleanest one-line formula I use is:
Share of Voice (you) = sum of your citations across the prompt set in the window / sum of all competitor-set citations across the same prompt set in the window, where the competitor set is a named list of brand domains that includes your own.
That formula has six implicit choices that have to be made explicit before the number means anything. The choices are: prompt set, window, engine set, competitor set, citation-vs-mention rule, and weighting scheme.
| Choice | Standard option | Notes |
| --- | --- | --- |
| Prompt set | 30-200 buyer-intent prompts | Must be stable across the window |
| Window | 4-12 weeks | Long enough to denoise, short enough to react |
| Engine set | ChatGPT + Perplexity (minimum) | Add Claude, Gemini, AIO based on audience |
| Competitor set | 3-7 named brand domains | Wikipedia, Reddit usually excluded |
| Citation vs mention | Report both separately | Combining produces misleading clean number |
| Weighting (blended) | Revenue-weighted by engine | Equal-weighted over-rewards low-pay engines |
Make those six choices once, write them down, and your share of voice number is reproducible. Skip any of them and you are reporting a number that cannot be defended in review.
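One way to make the six choices hard to skip is to store them in a single frozen record that travels with every reported number. The sketch below is illustrative only: `SovSpec`, `share_of_voice`, and the `.example` domains are hypothetical names, not part of any prompt tracker's API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SovSpec:
    """The six choices behind one share-of-voice number (hypothetical structure)."""
    prompt_set_id: str                   # name or version of the locked prompt set
    window_weeks: int                    # e.g. 4-12
    engines: Tuple[str, ...]             # e.g. ("chatgpt", "perplexity")
    competitor_domains: Tuple[str, ...]  # the bounded, named competitor set
    unit: str                            # "citation" or "mention", never both
    weighting: str                       # "per_engine", "equal", "session", "revenue"


def share_of_voice(counts: Dict[str, int], spec: SovSpec) -> Dict[str, float]:
    """Share of in-set counts per domain, as fractions that sum to 1."""
    # Anything outside the competitor set is dropped from the denominator.
    in_set = {d: counts.get(d, 0) for d in spec.competitor_domains}
    total = sum(in_set.values())
    if total == 0:
        return {d: 0.0 for d in in_set}
    return {d: n / total for d, n in in_set.items()}


spec = SovSpec(
    prompt_set_id="b2b-saas-v1",
    window_weeks=4,
    engines=("chatgpt",),
    competitor_domains=("a.example", "b.example", "c.example"),
    unit="citation",
    weighting="per_engine",
)
counts = {"a.example": 188, "b.example": 124, "c.example": 96, "wikipedia.org": 300}
print(share_of_voice(counts, spec))
```

Because the record is frozen, changing any of the six choices means creating a new spec, which is a natural place to hang a baseline event.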
The formula for the per-engine view is the same, with the observations restricted to that engine's responses. The formula for the revenue-weighted blend is:
Blended SOV = sum over engines of (per-engine SOV × per-engine revenue share of AI traffic to you).
Revenue share here is your business's revenue from each engine as a fraction of total AI-attributed revenue. The Attrifast 200-site cohort blended weights (Perplexity, Claude, ChatGPT, Gemini, AIO) sit roughly at 18% / 12% / 53% / 9% / 8%, but every business has its own weights and using the cohort numbers blindly is wrong.
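For readers who prefer symbols, the two formulas can be written as follows. The notation is an assumption of this sketch: c_e(b) is the citation count for brand b on engine e over the prompt set and window, S is the competitor set, E is the set of tracked engines, and w_e is engine e's share of your AI-attributed revenue.

```latex
\mathrm{SOV}_e(b) = \frac{c_e(b)}{\sum_{b' \in S} c_e(b')},
\qquad
\mathrm{SOV}_{\mathrm{blend}}(b) = \sum_{e \in E} w_e \, \mathrm{SOV}_e(b),
\qquad
\sum_{e \in E} w_e = 1 .
```

If only some engines are tracked, the weights need to be re-normalized over the tracked subset so that they still sum to 1.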
A worked example: three brands, 30 prompts, four weeks
The example: a B2B SaaS category with three named competitor brands (call them Brand A, Brand B, Brand C). The prompt set is 30 buyer-intent prompts split as 10 TOFU ("what is X"), 15 MOFU ("X vs Y," "best X for Y use case"), and 5 BOFU ("X pricing," "is X worth it"). Two engines tracked: ChatGPT and Perplexity. Window: 4 weeks. Each prompt run 7 times per week per engine. Competitor set explicitly bounded to A, B, C.
Total observations: 30 prompts × 7 runs × 4 weeks × 2 engines = 1,680 prompt-runs. Within each prompt-run, citations are parsed and matched against the three competitor domains. A single prompt-run might produce zero citations for any of A/B/C, or it might produce citations for one, two, or all three.
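The parse-and-match step is where a deterministic parser matters. A minimal sketch, assuming each prompt-run has already been reduced to a list of cited URLs; the brand-to-domain mapping and the rule of counting a brand at most once per prompt-run are assumptions made for illustration, not a description of how any particular tracker counts.

```python
from collections import Counter
from typing import List, Optional
from urllib.parse import urlsplit

# Hypothetical brand -> domain mapping; substitute the real competitor set.
COMPETITORS = {
    "Brand A": "brand-a.example",
    "Brand B": "brand-b.example",
    "Brand C": "brand-c.example",
}


def match_brand(url: str) -> Optional[str]:
    """Return the brand whose domain (or a subdomain of it) hosts the cited URL."""
    host = (urlsplit(url).hostname or "").lower()
    for brand, domain in COMPETITORS.items():
        # Exact host or a true subdomain; "notbrand-a.example" must not match.
        if host == domain or host.endswith("." + domain):
            return brand
    return None


def count_citations(runs: List[List[str]]) -> Counter:
    """Tally in-set citations over prompt-runs; each run is a list of cited URLs."""
    tally: Counter = Counter()
    for cited_urls in runs:
        # Assumption: a brand counts at most once per prompt-run.
        brands = {b for b in map(match_brand, cited_urls) if b is not None}
        tally.update(brands)
    return tally


runs = [
    ["https://www.brand-a.example/pricing", "https://en.wikipedia.org/wiki/X"],
    ["https://docs.brand-b.example/guide", "https://brand-a.example/blog"],
    [],  # a run with no citations at all
]
print(count_citations(runs))
```

URLs without a scheme have no parsed hostname under `urlsplit` and fall through as unmatched, so normalize inputs before counting.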
Raw citation counts (4-week window)
| Engine | Brand A citations | Brand B citations | Brand C citations | Total in-set citations |
| --- | --- | --- | --- | --- |
| ChatGPT | 188 | 124 | 96 | 408 |
| Perplexity | 243 | 218 | 138 | 599 |
| Combined | 431 | 342 | 234 | 1,007 |
Per-engine share of voice (within the competitor set)
| Engine | Brand A SOV | Brand B SOV | Brand C SOV |
| --- | --- | --- | --- |
| ChatGPT | 46.1% | 30.4% | 23.5% |
| Perplexity | 40.6% | 36.4% | 23.0% |
| Equal-weighted blend | 43.3% | 33.4% | 23.3% |
Reading the per-engine view: Brand A leads on ChatGPT by a wider margin than on Perplexity, where Brand B is closer behind. Brand C is about equally weak on both engines. The variance between engines is meaningful: Brand A's lead on ChatGPT (46.1% vs 30.4%, about 16 points) is structurally different from its lead on Perplexity (40.6% vs 36.4%, about 4 points). An equal-weighted blend hides that.
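The per-engine shares and the equal-weighted blend can be reproduced from the raw citation counts in a few lines. This is a sketch of the arithmetic only; the function names are illustrative.

```python
# Raw in-set citation counts from the 4-week window.
RAW = {
    "ChatGPT": {"Brand A": 188, "Brand B": 124, "Brand C": 96},
    "Perplexity": {"Brand A": 243, "Brand B": 218, "Brand C": 138},
}


def per_engine_sov(raw):
    """Each brand's citations divided by the engine's in-set total."""
    out = {}
    for engine, counts in raw.items():
        total = sum(counts.values())
        out[engine] = {brand: n / total for brand, n in counts.items()}
    return out


def equal_weighted_blend(sov):
    """Simple average of the per-engine shares, one vote per engine."""
    brands = list(next(iter(sov.values())))
    return {b: sum(shares[b] for shares in sov.values()) / len(sov) for b in brands}


sov = per_engine_sov(RAW)
blend = equal_weighted_blend(sov)
for engine, shares in sov.items():
    print(engine, {b: f"{s:.1%}" for b, s in shares.items()})
print("Equal-weighted", {b: f"{s:.1%}" for b, s in blend.items()})
```

Note that the equal-weighted blend averages the two engine shares; it is not the same as pooling the counts, which would weight Perplexity more heavily because it produced more in-set citations.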
Assume revenue weights for this category are 35% ChatGPT, 45% Perplexity, with the remaining 20% from Claude/Gemini/AIO (not tracked in the example). Re-normalize over the two tracked engines: ChatGPT 35/80 ≈ 44%, Perplexity 45/80 ≈ 56%.
| Brand | ChatGPT SOV | Perplexity SOV | Revenue-weighted blend |
| --- | --- | --- | --- |
| Brand A | 46.1% | 40.6% | 43.0% |
| Brand B | 30.4% | 36.4% | 33.8% |
| Brand C | 23.5% | 23.0% | 23.2% |
The revenue-weighted blend favors Perplexity in this example because the category sends more revenue through Perplexity than ChatGPT. Brand B closes part of its gap to Brand A under revenue weighting because Brand B is stronger on the higher-paying engine. The strategic read: Brand B should keep doubling down on Perplexity and try to close the ChatGPT gap; Brand A should defend ChatGPT while investing in Perplexity.
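The revenue-weighted column can be checked with a short sketch. It takes the per-engine SOV figures as printed in the table (already rounded to one decimal) and the assumed 35% / 45% revenue shares, re-normalized over the two tracked engines; the function names are illustrative.

```python
def renormalize(weights):
    """Scale weights so they sum to 1 over the engines actually tracked."""
    total = sum(weights.values())
    return {engine: w / total for engine, w in weights.items()}


def revenue_weighted_blend(sov_by_engine, revenue_share):
    """Weighted average of per-engine SOV using re-normalized revenue shares."""
    w = renormalize({e: revenue_share[e] for e in sov_by_engine})
    brands = list(next(iter(sov_by_engine.values())))
    return {b: sum(w[e] * sov_by_engine[e][b] for e in sov_by_engine) for b in brands}


# Per-engine SOV in percent, as printed in the table above.
SOV = {
    "ChatGPT": {"Brand A": 46.1, "Brand B": 30.4, "Brand C": 23.5},
    "Perplexity": {"Brand A": 40.6, "Brand B": 36.4, "Brand C": 23.0},
}
# Assumed revenue shares; the remaining 0.20 comes from untracked engines.
REVENUE_SHARE = {"ChatGPT": 0.35, "Perplexity": 0.45}

blend = revenue_weighted_blend(SOV, REVENUE_SHARE)
for brand, value in blend.items():
    print(f"{brand}: {value:.1f}%")
```

Printing the weights next to the blended number, as the standing report does, lets a reader redo this calculation by hand.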
Sensitivity analysis: what happens when the competitor set widens
Suppose you widen the competitor set to include Wikipedia, Reddit, and one large reference site that frequently appears in the citations. The total citation pool grows by another 1,150 citations across the window, from 1,007 to 2,157 (these sources are cited heavily for definitional and "what do people think" prompts).
| Brand | SOV (3-brand set) | SOV (3-brand + Wikipedia/Reddit/reference) |
| --- | --- | --- |
| Brand A | 43.3% | 20.0% |
| Brand B | 33.4% | 15.9% |
| Brand C | 23.3% | 10.8% |
| Wikipedia | n/a | 24.0% |
| Reddit | n/a | 18.0% |
| Reference site | n/a | 11.2% |
That second column is a different metric. It measures total citation surface area, not competitive position within the named-brand competitor set. Both are valid, both should be tracked, but the headline "share of voice" number for an executive should be the bounded competitor-set version, with the total-citation-diversity version reported alongside.
The most common reporting error in this category is using the widened denominator to report a low share of voice number and then panicking about it. Brand A's 20% in the widened view is not worse than its 43% in the bounded view — they are different metrics measuring different things.
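A small sketch makes the denominator effect concrete. It works on pooled citation counts (the Combined row of the raw table), so the bounded figure it prints differs slightly from the equal-weighted blend; the 1,150 extra citations are the out-of-set total stated above, and the function name is illustrative.

```python
# Combined citation counts across both engines for the 4-week window.
BRAND_CITATIONS = {"Brand A": 431, "Brand B": 342, "Brand C": 234}
# Wikipedia + Reddit + reference site citations added by the wider set.
EXTRA_OUT_OF_SET = 1150


def pooled_share(counts, extra=0):
    """Share of pooled citations, optionally with out-of-set citations in the denominator."""
    total = sum(counts.values()) + extra
    return {brand: n / total for brand, n in counts.items()}


bounded = pooled_share(BRAND_CITATIONS)
widened = pooled_share(BRAND_CITATIONS, EXTRA_OUT_OF_SET)

for brand in BRAND_CITATIONS:
    print(f"{brand}: bounded {bounded[brand]:.1%}, widened {widened[brand]:.1%}")

# The ratio between any two brands is identical in both views.
print(bounded["Brand A"] / bounded["Brand B"], widened["Brand A"] / widened["Brand B"])
```

Widening the denominator rescales every brand by the same factor, which is why the competitive ordering and the ratios between brands do not move even though the headline percentages roughly halve.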
Citation share versus mention share
Citation share counts cases where your domain appears in the cited sources of an answer. Mention share counts cases where your brand name appears in the answer text whether or not it is linked. These are two different shares and should be reported separately.
The distinction matters because engines behave differently on the two. Claude often mentions brands without citing them (Claude conversation mode without web search rarely returns citations at all; it just narrates with brand names). ChatGPT often cites without mentioning the brand name in the prose ("source: example.com" without writing "according to example.com"). A consolidated "visibility share" that adds the two produces a clean-looking number that obscures the difference.
| Engine | Citation behavior | Mention behavior | Reporting recommendation |
| --- | --- | --- | --- |
| ChatGPT | High citation density, named sources | Variable mention density | Report both, focus on citation |
| Perplexity | High citation density, always links | High mention density | Report both, citation primary |
| Claude (chat mode) | Low to zero citations | High mention density | Mention is the headline metric |
| Claude (with web search) | Moderate citations | Moderate mentions | Report both |
| Gemini | Moderate citations | Moderate mentions | Report both |
| Google AI Overviews | Variable citations | Lower mentions | Citation primary |
A complete dashboard reports both columns. Treating them as the same metric is the most common mistake I see in vendor-supplied dashboards.
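Keeping the two shares separate starts at the parser: each prompt-run should yield a cited flag and a mentioned flag per brand, tallied in different columns. A minimal sketch, assuming the answer text and the cited URLs are already extracted; "Acme" and `acme.example` are placeholder names.

```python
import re
from typing import List, Tuple
from urllib.parse import urlsplit


def observe(answer_text: str, cited_urls: List[str],
            brand_name: str, brand_domain: str) -> Tuple[bool, bool]:
    """Return (cited, mentioned) for one brand in one prompt-run."""
    hosts = [(urlsplit(u).hostname or "").lower() for u in cited_urls]
    cited = any(h == brand_domain or h.endswith("." + brand_domain) for h in hosts)
    # Whole-word, case-insensitive match of the brand name in the prose.
    pattern = rf"\b{re.escape(brand_name)}\b"
    mentioned = re.search(pattern, answer_text, re.IGNORECASE) is not None
    return cited, mentioned


# A chat-mode style answer: brand named in prose, no sources attached.
print(observe("Many teams shortlist Acme for this.", [], "Acme", "acme.example"))
# A cited-source style answer: domain in the source list, brand not named.
print(observe("Pricing varies by seat count.", ["https://acme.example/pricing"],
              "Acme", "acme.example"))
```

The word-boundary match is a simplification: brand names that are common words, or that end in punctuation, need a stricter rule than this.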
Per-engine SOV is the primary view; blended is secondary
The temptation to roll five engines into a single "AI share of voice" number is strong because it produces a clean executive-ready chart. The temptation should be resisted unless the blend is weighted correctly.
The reason: engines differ in how much economic value they send. Claude conversation mode mentions your brand frequently but sends almost no clicks because there is no link to click. Perplexity links out aggressively and sends real referral sessions that convert. An equal-weighted blend treats those two as equivalent, which is structurally wrong. A revenue-weighted blend weights each engine by how much revenue it sends your business and produces a number that is at least directionally tied to dollars.
| Blending method | When to use | Risk |
| --- | --- | --- |
| Per-engine only (no blend) | When you have meaningful per-engine business decisions | Cluttered dashboard, no headline number |
| Equal-weighted blend | Rarely | Over-rewards engines that do not pay |
| Session-share weighted | When revenue data is unavailable | Better than equal, still over-weights low-RPV engines |
| Revenue-share weighted | When you have first-party Stripe join | Best, requires attribution infrastructure |
| Citation-volume weighted | Within an engine across prompts | Useful for prompt-prioritization |
The revenue-weighted version is the one I run for the founders I help, but it requires a server-side attribution layer underneath that most teams do not have yet. The minimum honest version, when revenue data is not yet available, is to report per-engine SOV as the primary view and clearly label any blend as "session-weighted" or "equal-weighted" so the reader can interpret it correctly.
Common pitfalls in share of voice reporting
Five mistakes account for most of the bad share-of-voice numbers I have seen reported in 2025-2026. None of them is rocket science; all of them are easy to make.
| Pitfall | What goes wrong | How to fix |
| --- | --- | --- |
| Drifting prompt set | Trend lines reset whenever a prompt is added or removed | Lock the prompt set at the start of the window |
| Ambiguous competitor set | Number swings 10-20 points on definitional change | Write the competitor set down explicitly |
| Single-run snapshots | Engine non-determinism dominates the result | Sample at least 5-7 runs per prompt per window |
| Combining citations and mentions | Produces clean number that hides engine differences | Report both separately |
| Equal-weighted engine blend | Over-rewards low-pay engines | Use revenue or session weighting |
A sixth and rarer mistake is reporting share of voice without re-baselining after a major model launch. GPT-5 and Claude 4.x both shifted citation patterns meaningfully across the cohort properties I tracked when they launched. A share of voice trend line that runs through a model launch without a baseline event is mixing two different metrics under one chart title.
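Forking the series at a baseline event can be as simple as assigning every weekly observation to a segment. A minimal sketch; the event date and the SOV values are made up for illustration, and the function names are hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Tuple


def assign_baseline(week_start: date, baseline_events: List[date]) -> int:
    """Segment index for a weekly observation (0 = before any baseline event)."""
    # A week that starts on the event date belongs to the new segment.
    return bisect_right(sorted(baseline_events), week_start)


def fork_series(series: List[Tuple[date, float]],
                baseline_events: List[date]) -> Dict[int, List[Tuple[date, float]]]:
    """Split one weekly SOV series into separate per-baseline series."""
    segments: Dict[int, List[Tuple[date, float]]] = {}
    for week_start, sov in series:
        segments.setdefault(assign_baseline(week_start, baseline_events), []).append(
            (week_start, sov)
        )
    return segments


EVENTS = [date(2026, 3, 2)]  # hypothetical model-launch date
series = [
    (date(2026, 2, 23), 0.44),  # illustrative SOV values
    (date(2026, 3, 2), 0.39),
    (date(2026, 3, 9), 0.40),
]
for segment, points in fork_series(series, EVENTS).items():
    print(segment, points)
```

Each segment is then charted and labeled as its own series instead of being joined into one trend line.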
Putting it all together: the standing report I run
The report format I run weekly for the founders I help has four sections.
Bar chart per engine for each competitor in the bounded set
Revenue-weighted blend: Single number per competitor, with the weights printed
Sensitivity + footnotes: Wider-set comparison + any baseline events in the window
That format takes 15 minutes to read and produces a defensible scorecard for an executive review. The pieces that produce the data are a prompt tracker for the upstream half (Profound, Peec, SEOcrawl, Otterly, or Loamly) and a server-side attribution layer for the revenue weights (Attrifast or equivalent). The methodology is what stitches them together.
The two articles that complete the trilogy are "What is prompt tracking" for the data-collection half and "AI visibility metrics and KPIs" for the broader KPI context this metric sits inside. For the per-engine breakdown of ChatGPT, Perplexity, Claude, and Gemini traffic that feeds the revenue weights, the per-engine landing pages walk the attribution side.
FAQ
What is share of voice in AI search?
Share of voice in AI search is your brand's citations as a share of all citations across a defined prompt set, over a defined window, on a defined engine set. It is the AI-search analogue to the classic share of voice metric, but it cannot be computed the classic way. In paid search, share of voice was impression share. In AI search there is no auction and no impression in the classic sense. The substitute is citation share: how often you appear among the cited sources when a defined prompt is asked. Across the 40 properties I have instrumented, share of voice is the single number that best captures competitive position, but only if the underlying prompt set is honest.
Why does the classic share of voice definition break for AI?
Three structural reasons. First, there is no equivalent of total impressions in AI search because every answer is generated per-query, not retrieved against fixed inventory. Second, brand mentions in AI answers can be unlinked, so your share is not just citation share but mention share too. Third, AI engines are non-deterministic: the same prompt produces different citation sets across runs. The classic impression-share formula assumes a stable denominator and a deterministic event log, and both are absent. The replacement is sampling-based citation share with explicit mention treatment and competitor-set bounds.
What is the formula for AI share of voice?
Share of Voice (your brand) = sum of your citations across the prompt set in the window / sum of all competitor-set citations across the same prompt set in the window, where a citation is defined as a domain appearing in the cited-sources footnotes of an engine's answer. For mention share, replace citations with brand-name occurrences in the answer text. For multi-engine, weight each engine by its session share or revenue share to your business. The base formula is simple. The hard part is the denominator.
How do you define the competitor set?
Explicitly, in writing, and with discipline. Define your competitor set as the 3-7 named brands a real buyer would consider, and compute share of voice within that bounded set. Wikipedia, Reddit, and large reference sites typically should not be in the set even if they show up in citations, because they do not compete for your customer. Track total citation diversity separately. The risk of ambiguity is that small definitional changes swing your share by 10-20 points without anything underlying changing.
How big should the prompt set be for share of voice to be reliable?
At least 30 distinct prompts covering the funnel, with each prompt run 5-10 times in the window. That means a minimum of 150-300 prompt-run observations per engine before SOV numbers are trustworthy. Below 30 prompts the sampling error makes single-percentage-point shifts meaningless. Below 5 runs per prompt the variance from engine non-determinism dominates. A set of 50-200 prompts run weekly with 7 samples each is the sweet spot: large enough to denoise, small enough to manage.
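A rough way to see the sampling error is a normal-approximation interval around a share. The sketch below treats every in-set citation as an independent draw, which is an assumption that does not really hold (citations cluster by prompt and by run), so the true interval is likely wider than what it prints.

```python
import math
from typing import Tuple


def sov_interval(brand_citations: int, total_citations: int,
                 z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval for a share, assuming independent citations."""
    p = brand_citations / total_citations
    standard_error = math.sqrt(p * (1 - p) / total_citations)
    return p - z * standard_error, p + z * standard_error


# Brand A on ChatGPT in the worked example: 188 of 408 in-set citations.
low, high = sov_interval(188, 408)
print(f"{low:.1%} to {high:.1%}")
```

Even with 408 in-set citations the interval spans roughly five points either side of the 46.1% estimate, which is why a one-point weekly move on a small prompt set should not be read as a trend.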
Should share of voice be measured per engine or blended?
Both, but report per-engine as primary and the blend as a secondary, weighted average. Per-engine tells you where you are strong and where you are losing; blended tells you total competitive position. The blend should be weighted by something economically meaningful: session share by engine, or per-engine RPV if you have a Stripe join. Equal-weighted blends treat Claude (low click-through) the same as Perplexity (high click-through), which over-rewards engines that do not pay.
How is AI share of voice different from impression share in Google Ads?
Impression share is your ad shown / total eligible auctions, computed by Google against its own auction log. AI share of voice has no auction, no eligibility flag, and no platform-side log. It must be reconstructed by sampling. Impression share is deterministic given a bid; AI SOV is probabilistic given a prompt and engine state. The concept transfers; the data source does not. Anyone reporting AI SOV as if it were a platform-supplied number is overstating precision. Honest reports include prompt count, runs per prompt, and competitor set.
How do brand mentions versus citations factor into share of voice?
They are two different shares and should be reported separately. Citation share counts cases where your domain appears in cited sources. Mention share counts cases where your brand name appears in the answer text whether or not linked. Across engines, the two diverge: Claude often mentions without citing, ChatGPT often cites without naming the brand in prose. A consolidated visibility share that adds them is misleadingly clean. Track both. Citation share is closer to the click funnel; mention share is closer to brand association.
How do you handle ties when multiple sources are cited equally?
Most engines do not rank citations within an answer beyond display order, and display order does not always correspond to weight. The cleanest practice is to treat all sources equally — each gets one count regardless of position. For a position-weighted variant, weight by inverse rank (1.0, 0.5, 0.33) but report it as a separate position-weighted share, not the headline. Engines sometimes return zero citations and only a mention; those count toward mention share but not citation share.
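The position-weighted variant is a small change to the counting rule: each citation contributes 1/rank instead of 1. A minimal sketch, assuming each prompt-run is a list of cited domains in display order and that rank is the position in the full citation list, out-of-set sources included; the domains are placeholders.

```python
from typing import Dict, Iterable, List


def position_weighted_share(runs: List[List[str]],
                            competitor_set: Iterable[str]) -> Dict[str, float]:
    """Share of inverse-rank weight (1, 1/2, 1/3, ...) within the competitor set."""
    weights = {domain: 0.0 for domain in competitor_set}
    for ranked_domains in runs:
        for rank, domain in enumerate(ranked_domains, start=1):
            if domain in weights:
                weights[domain] += 1.0 / rank
    total = sum(weights.values())
    return {d: (w / total if total else 0.0) for d, w in weights.items()}


runs = [
    ["a.example", "wikipedia.org", "b.example"],  # a at rank 1, b at rank 3
    ["b.example", "a.example"],                   # b at rank 1, a at rank 2
]
print(position_weighted_share(runs, ["a.example", "b.example"]))
```

In this toy input both brands have two citations, so the unweighted share is 50/50, while the position-weighted share tilts toward the brand that appears higher in the list.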
How often does AI share of voice need re-baselining?
After every major model launch (GPT-5, Claude 4.x, Gemini 3.x reshuffled citation patterns), after any prompt-set change above 10% addition or removal, and quarterly even without an obvious trigger. Engines push retrieval updates without notice; a January baseline will not be comparable to a May baseline. The discipline is to fork the dataset at each baseline event and report "share of voice (Q2 2026 baseline)" rather than running one time series through reshuffles.
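The three triggers can be checked mechanically before each weekly report. In the sketch below, measuring prompt-set change as added plus removed prompts over the old set size is one reading of the 10% rule, and 13 weeks stands in for "quarterly"; both are assumptions, as are the function names.

```python
from typing import Set


def prompt_set_change(old: Set[str], new: Set[str]) -> float:
    """Added plus removed prompts as a fraction of the old set size."""
    if not old:
        return 1.0 if new else 0.0
    return (len(new - old) + len(old - new)) / len(old)


def needs_rebaseline(old: Set[str], new: Set[str], model_launch: bool = False,
                     weeks_since_baseline: int = 0, threshold: float = 0.10,
                     max_weeks: int = 13) -> bool:
    """True if any trigger fires: model launch, prompt-set drift, or quarterly age."""
    return (
        model_launch
        or prompt_set_change(old, new) > threshold
        or weeks_since_baseline >= max_weeks
    )


old_prompts = {f"prompt-{i}" for i in range(30)}
# Swap two prompts: 2 removed + 2 added out of 30 is about 13% change.
new_prompts = (old_prompts - {"prompt-0", "prompt-1"}) | {"prompt-30", "prompt-31"}
print(prompt_set_change(old_prompts, new_prompts))
print(needs_rebaseline(old_prompts, new_prompts))
```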
What is the worked example in this article based on?
A simplified but realistic three-brand category, 30 buyer-intent prompts, weekly runs across ChatGPT and Perplexity for four weeks, with citation parsing and the formula applied row by row. Brand names are illustrative but the cohort-level distribution mirrors what I see across the 200-site Attrifast benchmark for B2B SaaS with three to five visible competitors. The example produces per-engine and blended SOV, with a sensitivity analysis showing how the number shifts when the competitor set widens.
Can Attrifast measure share of voice?
Attrifast does not run prompt tracking, so it does not measure citation share or share of voice directly. It handles the downstream half: server-side AI traffic attribution and Stripe-joined revenue per engine. The proper stack is a prompt tracker (Profound, Peec, SEOcrawl, Otterly, or Loamly) for the SOV numerator and denominator plus Attrifast for the revenue-weighting on the blended-across-engines version. Joining the two layers lets you compute revenue-weighted SOV, the most decision-useful version of the metric.
Related reading
This article is the methodology walkthrough: the formula, the worked example, the standing report. For the conceptual layer covering why classic SOV is a vanity number and how Revenue Share of Voice fixes it (the metric that adds Stripe-joined revenue weighting to the mention count), see "AI Share of Voice in 2026: How to Measure It, and Why Revenue Share of Voice Matters More."