A step-by-step method to analyze why ChatGPT, Perplexity, Claude and Gemini recommend your competitors over you: build a buying-query prompt set, tally per-competitor share of voice, tear down their citation sources, then close the gaps that actually drive your revenue.
A founder emailed me in April with a screenshot. He had asked ChatGPT "what's the best privacy-first analytics tool for a small SaaS," and the answer named three competitors and not him. His message was two words: "fix this." My reply was a question: "fix what, exactly?" Because "ChatGPT recommends my competitor" is not a diagnosis. It is a symptom, and the symptom has a specific, findable cause — usually a source the competitor sits in that you do not. Until you know which source, "fix this" has no target.
This is the offensive, competitive-intelligence companion to two pieces I have already written. "ChatGPT isn't recommending your product" is the inward-facing diagnostic: the eight reasons the model ignores you. "AI share of voice" is the metrics piece: what the resulting mention-count number means and why revenue share of voice matters more. This article is the third leg: how to look outward, measure why the engines recommend your competitors, reverse-engineer the structural reason, and decide which gap to close based on whether it moves your revenue. It is a method, not a pep talk. By the end you will have run a real audit on your own category and have a ranked list of gaps with a fast/slow tag and a revenue read on each.
A note on honesty up front, because this topic attracts snake oil. There is no button that drops your competitor from ChatGPT's answer, no algorithm to reverse-engineer, and no guarantee that out-citing a rival pays. What there is: a repeatable way to see the gap precisely, a way to find the source that causes it, and a way to measure whether closing it shipped you money. That is the whole article.
Quick Facts
| Metric | Value | Source |
| --- | --- | --- |
| ChatGPT weekly active users (Q4 2025) | ~400 million | OpenAI investor update [1] |
| Brands named per ChatGPT "best of" answer (typical) | 3-5 | OpenAI search docs [2] |
| Brands cited per Perplexity answer (typical) | 4-7 | Perplexity docs [3] |
| Brands cited per Claude web-search answer (typical) | 1-3 | Anthropic docs [4] |
| Minimum prompts for a stable per-competitor SOV ranking | ~30 per engine | Attrifast methodology |
| Recommended samples per prompt | 3-5 | Stochastic-sampling correction |
| Reddit's share of cited domains in AI answers (2025) | Among the most-cited domains | Search Engine Land / citation studies [5] |
| Reddit content-licensing deal that boosted its AI presence | Google ~$60M/yr; OpenAI partnership | Reuters [6] |
| GEO methods lifting citation share in controlled tests | Up to ~40% visibility lift | Princeton GEO paper [7] |
| GA4 default attribution for ChatGPT clicks | Direct/(none); no AI rule | Google Analytics docs [8] |
| Trackers that compute competitor share of voice | Profound, Otterly, Peec, SE Ranking | Vendor docs [9][10][11] |
| Trackers that join competitor SOV to your Stripe revenue | 0 of the major ones | Vendor docs |
| ChatGPT RPV vs Google organic (B2B SaaS) | 1.4-2.1x, n=24 | Attrifast aggregate, Q1 2026 |
Two of those rows frame the whole piece. The Princeton GEO finding [7] — that specific structural moves lift citation share by up to ~40% in controlled tests — is why competitive gaps are closable at all: citation presence is engineered, not innate. And the "0 trackers that join competitor SOV to your Stripe revenue" row is the gap this article keeps circling back to. The trackers can tell you a competitor out-cites you. None of them can tell you whether closing that gap put money in your account.
What competitive AI visibility analysis actually is
Competitive AI visibility analysis is measuring how often your competitors appear inside AI-generated answers, reverse-engineering the sources that feed those citations, and isolating the structural gaps that — if closed — would win you revenue, not just mentions. It is the AI-era version of the competitor SEO audit, but the unit of analysis moved from "who ranks for the keyword" to "who gets named in the synthesized answer," and the answer names three to five brands, not ten.
It has three distinct layers, and the value rises sharply as you go down them. Most teams do the first and stop, which is exactly why most competitive AI analysis is useless.
| Layer | Question it answers | Output | Most teams |
| --- | --- | --- | --- |
| 1. Measurement | Who gets cited, how often, on which engine, for which prompts? | Per-competitor share-of-voice table | Do this and stop |
| 2. Forensics | For a cited competitor, what source feeds the citation? | Citation-source teardown per competitor | Skip |
| 3. Wedge | Which gaps, if closed, would move my revenue? | Ranked, revenue-weighted gap list | Almost nobody does |
Layer one tells you the score. It feels like progress — you now have a chart showing your competitor at 38% and you at 11%. But a score with no cause is not actionable; you cannot "fix 11%." Layer two finds the cause: the competitor is at 38% because they appear in seven listicles, have a Wikipedia entity, and own three Reddit threads, while you have two listicles and no entity. Now you have targets. Layer three is the one that separates useful analysis from busywork: of those targets, which one is realistic to close, feeds a prompt that actually converts, and would plausibly move your revenue? Closing a gap that wins you a definitional citation nobody clicks is motion without progress.
The discipline of this article is to push all the way to layer three. A competitor out-citing you is a fact. Whether it costs you money, and whether closing it earns you money, is the only question that survives a board meeting. I detail the metrics half of this argument in "AI share of voice vs revenue share of voice"; here I am applying it to the competitive case specifically.
Why ChatGPT recommends your competitor: the mechanic
ChatGPT recommends your competitor because the competitor sits in more of the third-party sources the model grounds its answer in — the model recommends what the web has already recommended, and your competitor has been recommended more. It is rarely about product quality, which the model cannot assess directly. It is about the density and authority of the citation graph around each brand.
When a user asks for a recommendation, the engine is doing one of two things. Without browsing it recalls from its training corpus — a frozen snapshot weighted toward brands mentioned often and authoritatively before the cutoff. With browsing it runs a live search, retrieves a handful of pages, and synthesizes an answer with inline citations. Your competitor can win in either mode, and the source of their advantage differs:
| Mode | Why the competitor wins | What feeds it | Your lever speed |
| --- | --- | --- | --- |
| Corpus (no browsing) | Mentioned more, more authoritatively, before the cutoff | Accumulated press, Wikipedia, Reddit, listicles | Slow (next retrain) |
| Retrieval (browsing on) | More crawlable, better-structured, higher-ranked pages now | Live listicles, comparison pages, organic rank, schema | Fast (days to weeks) |
This matters for your analysis because you must test both modes per competitor or you will misread the gap. A competitor who beats you only in corpus mode has an accumulated-authority advantage you close slowly; one who beats you in retrieval mode has a structural-content advantage you can often match this quarter. I unpack the corpus-versus-retrieval split in detail in "ChatGPT isn't recommending your product". For the competitive case, the key is that the two modes point you at two different sets of gaps with two different clocks.
One more mechanic that prevents the most common analysis error: a mention is not a citation is not a click is not a sale. Your competitor being named in an answer (a mention) is different from their URL being cited (a clickable path), which is different from a user clicking it, which is different from that click converting. Keep these four states separate or your competitive read will blur exactly where it matters.
| State | What it means for a competitor | How you detect it |
| --- | --- | --- |
| Mentioned | Model named the brand, no link | Read the answer text |
| Cited | Their URL appears as a footnote | Read the answer's sources |
| Clicked | A user followed the citation to them | You cannot see a competitor's clicks |
| Converted | That click paid them | You cannot see a competitor's revenue |
The last two rows are why competitive analysis has a hard ceiling: you can measure a competitor's presence precisely, but you can never see their revenue. You can only measure the revenue impact on your own side after you close a gap. That asymmetry shapes the entire method.
The step-by-step competitive AI-visibility audit
Here is the full audit, the one I run for my own properties and walk customers through. It is seven steps, roughly two to three hours of manual work for a 30-prompt set across four engines, or an afternoon with a prompt-replay tool. Do it manually the first time even if you will automate later, because the manual pass teaches you what the automated dashboards are abstracting away.
Step 1: define the competitive set
Write down the explicit list of brands you compete with for the prompts you care about. Five to twelve is typical. This is your denominator, and it silently determines every share-of-voice number downstream. Include the brands you actually lose deals to and the ones that show up when you ask the engines your category question — not just the ones you think about. If you omit a strong competitor, your relative position looks better than it is; if you pad the list with brands nobody considers, the numbers get diluted.
| Competitive-set sizing | What happens to the analysis |
| --- | --- |
| 2-3 brands | Share swings wildly; treat as directional only |
| 4-8 brands | Stable, the usual sweet spot |
| 9-15 brands | Realistic for crowded categories; shares run smaller |
| 16+ brands | Denominator so large signal washes out; segment the category instead |
A useful trick: run one "who competes with [your category leader]" prompt across the engines before you finalize the set. The brands the model itself clusters together are the brands it will choose between in recommendation answers — that cluster is your real competitive set in the model's eyes, which is sometimes broader or narrower than your sales team's list.
Step 2: build the buying-query prompt set
This is where most competitive audits quietly break. Keyword tools return Google-shaped phrases ("best CRM small business"); AI users type conversational, buying-stage questions ("what's a good CRM for a 5-person team that's already on Stripe"). Build the set from the queries a buyer actually asks an AI when they are choosing, blended from three sources:
| Prompt source | Contributes | Share of set |
| --- | --- | --- |
| Keyword research (Ahrefs, Semrush) | Breadth of category topics | 35% |
| Conversational question expansion | Real AI phrasing | 30% |
| Observed buyer language (sales calls, support, Reddit) | Highest-intent prompts | 35% |
Then weight the set itself toward the buying-stage intent classes, because those are the answers that produce buyers. A 30-prompt audit set I would defend:
| Intent class | Example | Count | Commercial value |
| --- | --- | --- | --- |
| Best-of / category | "best [category] tools for SMB SaaS" | 8 | Very high |
| Alternatives | "[incumbent] alternatives" | 6 | Highest (your query) |
| Versus | "[you] vs [competitor]" | 4 | Highest |
| For-segment | "[category] tool for [niche use case]" | 6 | High (winnable) |
| Recommendation | "what should I use to [job to be done]" | 4 | High |
| Definitional | "what is [category]" | 2 | Low (sanity check only) |
Notice definitional prompts are capped at two. They are easy to win and rarely convert; their only job here is to confirm the engines know your category exists. The audit's weight goes where the buyers are. Tag each prompt with its intent class now — you will need the tag in step seven to separate gaps that feed converting prompts from gaps that feed vanity ones.
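A quick way to keep the set honest is to check the tagged prompts against the target mix before running anything. The sketch below assumes each prompt is stored as a (text, intent_class) pair; the tag names and the `check_prompt_set` helper are illustrative, not part of any tool mentioned here.

```python
from collections import Counter

# Target mix taken from the 30-prompt table above; tag names are illustrative.
TARGET_MIX = {
    "best_of": 8,
    "alternatives": 6,
    "versus": 4,
    "for_segment": 6,
    "recommendation": 4,
    "definitional": 2,
}


def check_prompt_set(prompts: list[tuple[str, str]]) -> dict[str, int]:
    """Return actual-minus-target counts per intent class (all zeros = on target)."""
    counts = Counter(intent for _, intent in prompts)
    unknown = set(counts) - set(TARGET_MIX)
    if unknown:
        raise ValueError(f"unknown intent classes: {sorted(unknown)}")
    return {intent: counts.get(intent, 0) - target
            for intent, target in TARGET_MIX.items()}
```

A positive number for `definitional` means the set has drifted toward easy, low-converting prompts; a negative number for `alternatives` or `versus` means the buying-stage prompts are under-sampled.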
Step 3: run, sample, record
Run each prompt against each engine, browsing on, three to five times, because the engines sample stochastically and a single run is a roll of the dice, not a reading. For the corpus check, also run your best-of and versus prompts once with browsing off to see who the model recommends from memory alone. Record per the schema below — a spreadsheet is fine for 30 prompts.
| Field | Type | Used for |
| --- | --- | --- |
| prompt | text | Grouping |
| intent_class | enum | Gap-to-revenue mapping |
| engine | enum | Per-engine SOV |
| browsing | bool | Corpus vs retrieval split |
| run_index | int | Stochastic dedup |
| brands_named | text[] | Mention SOV numerator |
| brands_cited_url | text[] | Citation SOV numerator |
| cited_urls | text[] | The teardown raw material |
| your_position | int | Where you placed, if at all |
The cited_urls column is the most valuable thing you will collect and the one manual auditors skip because it is tedious. Those URLs are the raw material for the entire layer-two teardown. Do not just record that a competitor was cited — record what page was cited. The difference between "Competitor X appeared" and "Competitor X appeared via a G2 listicle and a Reddit thread" is the difference between a score and a strategy.
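If you would rather keep the records in code than in a spreadsheet, the schema maps directly onto a small record type. This is a minimal sketch: the field names come from the schema above, while the class name, the example values, and the choice of `None` for "did not place" are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    prompt: str
    intent_class: str          # e.g. "alternatives", "versus"
    engine: str                # e.g. "chatgpt", "perplexity"
    browsing: bool             # True = retrieval mode, False = corpus check
    run_index: int             # which of the 3-5 repeated runs this is
    brands_named: list[str] = field(default_factory=list)
    brands_cited_url: list[str] = field(default_factory=list)
    cited_urls: list[str] = field(default_factory=list)
    your_position: Optional[int] = None  # None when you did not place


# Hypothetical example row; the brand and URL are placeholders.
example = RunRecord(
    prompt="[incumbent] alternatives",
    intent_class="alternatives",
    engine="perplexity",
    browsing=True,
    run_index=1,
    brands_named=["Competitor A", "You"],
    brands_cited_url=["Competitor A"],
    cited_urls=["https://competitor-a.example/alternatives"],
    your_position=2,
)
```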
Steps 4 and 5: tally per-competitor share of voice
For each engine, count how many answer slots each brand won (named, or cited with a URL — track both), divide by the total brand slots, and you have per-competitor share of voice. Then weight a blended number by where your buyers actually are, not by the engines' aggregate user counts.
| SOV variant | Numerator | Tells you |
| --- | --- | --- |
| Mention SOV | Times the brand was named | Awareness footprint |
| Citation SOV | Times the brand's URL was cited | Click-capable visibility |
| Position-weighted SOV | Citations weighted by their position in the source list | Approximate click share |
| Win rate | Prompts where the brand placed first | Recommendation dominance |
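The tally itself is a few lines once the runs are recorded. The sketch below assumes each run is a dict with `prompt`, `brands_named`, and `brands_cited_url` keys for a single engine. Treating the first-named brand as the one that "placed first", and settling repeated runs by majority, are assumptions; the text does not specify how ties across runs are resolved.

```python
from collections import Counter


def share_of_voice(runs: list[dict], key: str) -> dict[str, float]:
    """Share of brand slots per brand for one engine.

    key is "brands_named" (mention SOV) or "brands_cited_url" (citation SOV).
    """
    slots = Counter(brand for run in runs for brand in run[key])
    total = sum(slots.values())
    return {brand: n / total for brand, n in slots.items()} if total else {}


def win_counts(runs: list[dict]) -> Counter:
    """Number of prompts each brand wins (most often named first across runs)."""
    firsts: dict[str, Counter] = {}
    for run in runs:
        if run["brands_named"]:
            firsts.setdefault(run["prompt"], Counter())[run["brands_named"][0]] += 1
    return Counter(c.most_common(1)[0][0] for c in firsts.values())


# Tiny illustrative data set, not the audit data.
runs = [
    {"prompt": "p1", "brands_named": ["A", "B"], "brands_cited_url": ["A"]},
    {"prompt": "p1", "brands_named": ["A", "You"], "brands_cited_url": ["A", "You"]},
    {"prompt": "p2", "brands_named": ["B", "A"], "brands_cited_url": []},
]
mention_sov = share_of_voice(runs, "brands_named")
citation_sov = share_of_voice(runs, "brands_cited_url")
wins = win_counts(runs)
```

Run it once per engine and keep mention and citation SOV side by side; a brand that is often named but rarely linked has awareness without a click path.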
A worked tally from a real audit (anonymized category, my own 30-prompt run across four engines, four runs each):
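| Brand | ChatGPT SOV | Perplexity SOV | Claude SOV | Gemini SOV | Blended | Win rate |
| --- | --- | --- | --- | --- | --- | --- |
| Competitor A (the winner) | 41% | 37% | 22% | 33% | 35% | 19/30 |
| Competitor B | 22% | 29% | 11% | 24% | 23% | 6/30 |
| Competitor C | 18% | 14% | 8% | 19% | 15% | 3/30 |
| You | 9% | 16% | 4% | 11% | 11% | 2/30 |
| Competitor D | 6% | 4% | 2% | 8% | 5% | 0/30 |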
Two things jump out before any forensics. First, Competitor A wins 19 of 30 prompts — they are the dominant recommendation, not just a frequent mention. Second, your own best engine is Perplexity (16%) and your worst is Claude (4%), which already tells you Perplexity is where a gap is most closable and Claude is where the corpus barrier is highest. The blended single number (11%) would have hidden both facts. This is exactly the per-engine divergence I argue against flattening in the share of voice piece.
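Blending by buyer location rather than by raw engine size is a weighted average. In the sketch below the per-engine figures are the "You" row from the table; the buyer weights are hypothetical and are not the weights behind the Blended column.

```python
def blended_sov(per_engine: dict[str, float], buyer_weights: dict[str, float]) -> float:
    """Weighted average of per-engine SOV, normalised by the weights used."""
    total_weight = sum(buyer_weights[engine] for engine in per_engine)
    return sum(sov * buyer_weights[engine]
               for engine, sov in per_engine.items()) / total_weight


you = {"chatgpt": 9, "perplexity": 16, "claude": 4, "gemini": 11}

# Hypothetical: where your own buyers say they asked, e.g. from a signup survey.
buyer_weights = {"chatgpt": 0.5, "perplexity": 0.3, "claude": 0.1, "gemini": 0.1}

blended = blended_sov(you, buyer_weights)
best_engine = max(you, key=you.get)    # engine where your gap is smallest
worst_engine = min(you, key=you.get)   # engine where your gap is largest
```

Report the blended number next to the best and worst engine rather than instead of them, since the single figure hides the spread.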
The competitor citation-source teardown
This is layer two, and it is where competitive AI analysis stops being a scoreboard and becomes a plan. A citation-source teardown takes one cited competitor and identifies which source class is actually feeding their citations — their own pages, third-party listicles, Reddit, Wikipedia, G2, or news — so you know the one structural thing to match rather than guessing. The raw material is the cited_urls column you collected in step three plus a few targeted searches.
Six source classes account for nearly all AI citations. For each, here is what it is, how to find whether a competitor has it, and roughly how much weight it carries:
| Source class | What it is | How to find it for a competitor | Citation weight |
| --- | --- | --- | --- |
| Own structured pages | Their comparison / best-of / docs pages the model lifts | Read the cited URLs; note their own-domain pages | Medium-high (retrieval) |
| Third-party listicles | "Best [category] tools" roundups naming them | Google the buying queries; read top-10 listicles for their name | Very high |
| Reddit / forums | Recommendation threads naming them | Search "Competitor" reddit and best [category] reddit | High (over-weighted) |
| Wikipedia / Wikidata | Their entity card | Search Wikipedia and Wikidata for the brand | Very high (over-weighted) |
| G2 / Capterra | Review profile and review count | Visit their G2 / Capterra page; note review count | Medium |
| News / press | TechCrunch, trade press, funding coverage | "Competitor" -site:competitor.com news search | Medium-high |
Run that against your winning competitor and you get a teardown like this — the actual shape from the Competitor A above:
| Source class | Competitor A has it? | Evidence | You have it? |
| --- | --- | --- | --- |
| Own structured pages | Yes: 6 comparison/alternatives pages | Cited 11x across the audit | Partial: 1 page |
| Third-party listicles | Yes: named in 7 of top-10 roundups | Cited 14x | Named in 2 |
| Reddit / forums | Yes: recommended in 4 threads | Cited 5x | Named in 0 |
| Wikipedia / Wikidata | Yes: full Wikipedia entity | Grounds corpus-mode recall | No entity at all |
| G2 / Capterra | Yes: 340 reviews | Cited in G2's own listicle | 12 reviews |
| News / press | Yes: funding + product coverage | Several cited | Minimal |
Now the analysis has teeth. Competitor A is not winning because of one thing; they are winning because they sit in every source class while you sit in two. But — and this is the layer-three move — you do not have to match all six. You have to find the subset that is realistic to close and feeds the prompts that convert.
A useful framing is to compute, per source class, how many of the competitor's citations it accounts for, because that ranks the sources by leverage rather than by your gut. In the teardown above, third-party listicles (14 citations) and own comparison pages (11) are doing the heavy lifting; Reddit (5) and news matter less to the live-retrieval answers, even though Reddit punches above its weight in the corpus. That ranking tells you where to aim first: the listicles and your own comparison content, both of which are faster to influence than a Wikipedia entity or 300 G2 reviews.
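Ranking source classes by citation count can be done straight from the cited_urls column. This sketch buckets each URL by hostname; the domain-to-class map is a small illustrative list, and treating every unrecognised domain as a third-party listicle is an assumption, so review that bucket by hand.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative mapping; extend it with the trade press and forums in your category.
SOURCE_CLASS_BY_DOMAIN = {
    "reddit.com": "reddit_forums",
    "wikipedia.org": "wikipedia_wikidata",
    "wikidata.org": "wikipedia_wikidata",
    "g2.com": "g2_capterra",
    "capterra.com": "g2_capterra",
    "techcrunch.com": "news_press",
}


def classify(url: str, own_domain: str) -> str:
    """Bucket one cited URL into a source class by its hostname."""
    host = (urlparse(url).hostname or "").lower()
    if host == own_domain or host.endswith("." + own_domain):
        return "own_pages"
    for domain, source_class in SOURCE_CLASS_BY_DOMAIN.items():
        if host == domain or host.endswith("." + domain):
            return source_class
    return "third_party_listicle"  # catch-all assumption; check manually


def rank_source_classes(cited_urls: list[str], own_domain: str) -> list[tuple[str, int]]:
    """Source classes ordered by how many citations each accounts for."""
    return Counter(classify(u, own_domain) for u in cited_urls).most_common()
```

Pass the competitor's domain as `own_domain` and feed in every URL cited alongside their name; the top of the returned list is where to aim first.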
A worked example: 30 "best [category]" prompts, mapped
Let me make the whole method concrete with the run I keep referencing, because an abstract method is easy to nod along to and hard to actually execute. I ran 30 buying-stage prompts in my own category — privacy-first / cookieless analytics and attribution for small SaaS — across ChatGPT, Perplexity, Claude and Gemini, four runs each, browsing on, over a single week in early May 2026. Aggregate patterns are real; I have anonymized the specific competitor names to A through D.
The headline tally was the table two sections up: Competitor A won 19 of 30 prompts at 35% blended SOV; I won 2 at 11%. The interesting part was not the score. It was mapping which competitor won which prompt, because the pattern was not uniform.
| Prompt cluster (count) | Winner | Why (from the teardown) |
| --- | --- | --- |
| Best-of / category (8) | A in 7, B in 1 | A's listicle and G2 dominance; pure authority play |
| Alternatives (6) | Split: A 2, B 2, you 2 | Lower barrier; whoever published the comparison page won |
| Versus (4) | Whoever wrote the page (you 1, A 2, B 1) | The "[X] vs [Y]" page author gets cited for their own comparison |
| For-segment (6) | Mixed; you 1, C 2, A 2, B 1 | Segment specificity beat raw authority |
| Recommendation (4) | A 3, B 1 | Defaulted to the most-cited brand |
| Definitional (2) | A 2 | Wikipedia entity carried the corpus answer |
Reading that table changed my entire plan. Competitor A owned the high-authority prompts (best-of, recommendation, definitional) through accumulated structural advantages I could not replicate quickly — Wikipedia entity, 340 G2 reviews, seven listicle placements. Chasing those was a quarters-long money pit. But the alternatives and versus and for-segment clusters were a different story: those were won by whoever published the relevant comparison page, and the barrier was low. Those 16 prompts were winnable this quarter with content I controlled.
Then I did the thing the title promises: I found the three things the winner had that the losers (including me) did not.
| What Competitor A had | What the losers lacked | Closability |
| --- | --- | --- |
| 1. A Wikipedia + Wikidata entity grounding corpus-mode recall | No entity; model unsure they were a distinct, notable brand | Wikidata fast, Wikipedia slow |
| 2. Comparison pages for every major rival ("A vs B," "A vs C," "B alternatives") | One or zero comparison pages each | Fast (publish in days) |
| 3. Presence in 7 of the top-10 third-party listicles | Presence in 0-2 listicles | Medium (outreach cycle) |
The single most decisive one was the second. Competitor A had a comparison page targeting every alternatives and versus query in the category, so they got cited for their own honest comparison even when a rival was the subject of the query. That is the cheapest, fastest, most replicable advantage on the list — and it directly feeds the highest-intent prompts. So that is where I aimed first. Not at the Wikipedia entity (slow), not at the listicles (medium), but at publishing the comparison pages that win the converting prompts, and only then working the slower entity and listicle gaps in parallel.
The number that mattered most was not the SOV. It was this: the alternatives and versus clusters, where the gap was closable, were also the clusters with the highest commercial intent. The gaps I could close were the gaps worth closing. That alignment does not always hold — sometimes the closable gap feeds a worthless prompt — which is precisely why you map prompt-cluster to winner to closability before you do anything.
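The cluster-to-closability mapping can be written down as a small filter and sort. The prompt counts and intent labels below come from the tables in this article; the `closable_this_quarter` flags restate the judgment made above, and the numeric intent ranking is an assumption used only for ordering.

```python
INTENT_RANK = {"low": 0, "high": 1, "very_high": 2, "highest": 3}

clusters = [
    # (cluster, prompt_count, commercial_intent, closable_this_quarter)
    ("best_of", 8, "very_high", False),
    ("alternatives", 6, "highest", True),
    ("versus", 4, "highest", True),
    ("for_segment", 6, "high", True),
    ("recommendation", 4, "high", False),
    ("definitional", 2, "low", False),
]


def target_clusters(rows):
    """Keep the closable clusters, highest commercial intent first."""
    closable = [row for row in rows if row[3]]
    return sorted(closable, key=lambda row: INTENT_RANK[row[2]], reverse=True)


targets = target_clusters(clusters)
target_names = [name for name, *_ in targets]
winnable_prompts = sum(count for _, count, *_ in targets)
```

If the closable clusters come out at the bottom of the intent ranking in your own category, that is the case where the closable gap feeds a worthless prompt, and the plan should change before any content gets written.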
The gap-closing playbook
Once the teardown names your gaps, each one has a known fix and a known speed. Here is the playbook table — for each gap class, the fix and how fast it can realistically land. The fast/slow split tracks the corpus-versus-retrieval mechanic: anything that flows through live retrieval moves in days, anything that depends on a model retrain or earned authority moves in quarters.
Gap
Fix
Speed
You control timing?
No Wikipedia entity
Earn it via 5+ independent reliable-source citations; do not pay for it
Fill out the comparison-page matrix for every rival and segment
Comparison content
Weeks 2-8
Outreach for listicle placements; systematic review requests
Listicles, G2/Capterra
Ongoing
Organic community presence, press, organic rank
Reddit, news, SEO
Quarterly
Re-test browsing-off recall and pursue Wikipedia once notable
Wikipedia, corpus
The honest caveat that belongs on this playbook: closing a gap moves your citation presence, which is necessary but not sufficient for revenue. A comparison page that wins you a citation slot only pays if the click lands somewhere that converts. That is why the playbook ends not at "gap closed" but at the measurement step — the last section of this article. Do not treat a closed gap as a win until the revenue line confirms it.
There is one gap class worth singling out because it is uniquely high-leverage and uniquely fast: the comparison-page matrix. If a competitor out-cites you because they have a page for every "[rival] alternatives" and "[X] vs [Y]" query and you have one or none, you can close most of that gap in a week of honest writing. The catch is the word honest: a self-serving comparison that ranks you first on every axis fools no one, the model cross-references, and it gets discounted. The comparison content that earns citations is genuinely useful about where competitors win — which is also just better content. I cover the citation-earning structure in depth in how to get cited by AI engines.
Where each source comes from, and how to match it
The gap-closing playbook is the what-to-do. This section is the deeper why behind the two source classes that punch hardest above their weight — Reddit and Wikipedia — because matching them is both the highest-leverage and the most-misunderstood part of competitive GEO.
Reddit is among the most-cited domains in AI answers, and the reason is structural, not coincidental. The major engines have content-licensing and access arrangements with Reddit — Google's reported ~$60M/year deal and OpenAI's partnership being the prominent ones [6] — and beyond the licensing, community recommendation threads are exactly the answer shape the model wants: real users naming real tools for real use cases, with upvote signal standing in for consensus. If your competitor is recommended in four subreddit threads and you are in none, the model has a high-trust, independent source naming them and nothing equivalent for you. I go deep on the revenue side of this in Reddit AI citations and revenue.
The way to match it is the slow, unspoofable way: genuine participation, useful answers, letting satisfied customers mention you organically. The fast version — sockpuppet threads, paid recommendations — is detectable by both the community and increasingly the models, and a public astroturfing bust is a durable brand liability that outlasts any short-term gain. There is no safe shortcut here, which is precisely why Reddit presence is such a strong signal: it is hard to fake.
Wikipedia and Wikidata are disproportionately present in both training corpora and live retrieval, and they function as the entity card the model uses to decide you are a real, distinct, notable thing. A competitor with a Wikipedia entity has a canonical, authoritative source the model trusts for who they are; a brand without one leaves the model unsure, and uncertainty suppresses recommendation. This is why the Wikipedia gap shows up most starkly in corpus-mode (browsing-off) recall — it is the source the model leans on when recalling from memory.
The way to match it splits in two. Wikidata you can create yourself today, legitimately, at a far lower bar than Wikipedia — do that immediately, and build a clean sameAs graph alongside it, which is an afternoon of work and pure upside. Wikipedia you must earn through genuine notability (5+ independent reliable-source citations) and you must never pay for, because paid creation violates the platform's rules, gets flagged and deleted, and a deleted article is worse than none. The full mechanism is in the Wikipedia effect on AI visibility.
Tools: manual prompt-testing vs Profound, Otterly, Peec
You can run the entire audit above by hand, and you should the first time. For ongoing competitive monitoring you will want a prompt-replay tool that automates the runs, parses the answers, and tracks competitor share of voice over time. Here is the honest landscape, with the standard caveat that I build a tool in an adjacent category, so I will be explicit about where mine does and does not belong.
Manual prompt-testing is free and teaches you the most, but it does not scale past about 30 prompts and one engine before the labor gets painful. At roughly 30 seconds per query, 30 prompts times four engines times four runs is 480 queries, or about four hours per pass, and you are doing it manually every week. It is the right starting point and the wrong steady state.
Daily-check AI monitoring, SOV + sentiment, competitor tracking
ChatGPT, Perplexity, AIO
$29+/mo
No
Peec AI
Competitor-focused AI visibility tracking, share-of-voice over time
ChatGPT, Perplexity, Gemini
~$120+/mo
No
SE Ranking AI Tracker
SERP-adjacent AI visibility, competitor visibility score
Gemini, AIO, ChatGPT
$52+/mo add-on
No
Attrifast
First-party AI-engine revenue attribution (the wedge layer)
All, via referral + Stripe
$29/mo
Yes (your side only)
The categorical split is the thing to internalize, because vendor demos blur it constantly:
| Job to be done | Tool category | Example |
| --- | --- | --- |
| "Who out-cites me, on which engine, for which prompts?" | Citation / prompt tracker | Profound, Otterly, Peec, SE Ranking |
| "What source feeds my competitor's citations?" | The teardown (mostly manual) | Any tracker + your reading of cited URLs |
| "Did closing the gap drive MY revenue?" | First-party revenue attribution | Attrifast |
The trackers are good at layer one. Profound is the most thorough for enterprise; Otterly and Peec are the SMB-friendly options with honest competitor SOV at a fraction of the price; SE Ranking is sensible if you already pay for it. I am not knocking any of them — they solve the measurement problem cleanly. The point is narrower and it is the recurring theme of this article: every one of these tools stops at the competitor's citation count. None of them join that count to your Stripe revenue, because they scrape AI answers from the outside and have no view into your billing system. They can tell you a competitor out-cites you 35% to 11%. They cannot tell you whether closing that gap put a dollar in your account.
That is the layer I built Attrifast for, and I want to be precise about the boundary: Attrifast does not replace a citation tracker. It does not compute a competitor's share of voice — it cannot, because it does not scrape AI answers. What it does is the half the trackers structurally cannot reach: detect AI-sourced sessions arriving on your site server-side, persist them first-party without a cookie or consent banner, and join each to revenue on the Stripe webhook. You bring the competitor SOV from your tracker; Attrifast tells you whether the prompts you won back actually converted. The two together close the loop. Neither alone does.
The revenue wedge: is closing the gap worth it?
This is layer three, and it is the part that separates competitive AI analysis that ships money from competitive AI analysis that ships charts. Knowing a competitor out-cites you is half the battle; the other half is whether winning the slot back drives your revenue — and the only way to know is to measure cited-to-clicked-to-paid for each query you win. A competitor losing a citation slot to you only matters if the users who would have clicked them now click you, land on a page that fits, and pay.
The trap is declaring victory at the share-of-voice line. You close a gap, your tracker shows your SOV climbing from 11% to 18%, and the temptation is to call it a win. But SOV is an input, not an outcome. The link from SOV to revenue is severed by everything between a citation and a sale — citation position, click-through, landing-page fit, buyer intent — none of which the SOV number can see. I make this argument in full in the share of voice piece; here is the competitive-specific version.
The per-query measurement that closes the loop:
| Step | What you measure | How |
| --- | --- | --- |
| 1. Win the slot | Your SOV rises for the prompt you targeted | Tracker, before vs after |
| 2. Cited | Your URL now appears in the answer | Read the answer's sources |
| 3. Clicked | AI-attributed sessions to that landing page rise | First-party AI-source detection |
| 4. Converted | Those sessions paid | Stripe webhook join |
| 5. Worth it | Revenue won back > cost of closing the gap | The actual decision |
The reason steps three and four require special machinery is the one I document at length in the ChatGPT referral analytics guide: ChatGPT strips the referer, so the click lands in GA4's Direct/(none) bucket, and you cannot see the AI-attributed session at all in default analytics. You need first-party server-side detection plus a Stripe join: detect the AI source at the edge, persist a first-party session row, and join it to revenue on checkout.session.completed. No third-party cookie, no consent banner, no dependence on the engine passing a referer. That is the only way the cited-to-clicked-to-paid chain becomes visible.
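A minimal sketch of that detect, persist, join chain, under stated assumptions. How the AI source is detected is not specified here, so the sketch assumes a `utm_source` parameter on the landing URL with the Referer header as a fallback when one is present; which signal each engine actually sends is an assumption. The hostname list, the in-memory `sessions` dict, and the helper names are illustrative. The event shape follows Stripe's `checkout.session.completed` payload, using the Checkout Session fields `client_reference_id` and `amount_total`; webhook signature verification is omitted for brevity.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Illustrative hostname list; extend as needed.
AI_HOSTS = {
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}

sessions: dict[str, str] = {}  # session_id -> engine; stands in for a first-party table


def _match_host(host: str) -> Optional[str]:
    host = host.lower()
    for domain, engine in AI_HOSTS.items():
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def detect_ai_source(landing_url: str, referer: Optional[str] = None) -> Optional[str]:
    """Return the engine name for an AI-sourced hit, else None."""
    utm = parse_qs(urlparse(landing_url).query).get("utm_source", [""])[0]
    engine = _match_host(utm) if utm else None
    if engine is None and referer:
        engine = _match_host(urlparse(referer).hostname or "")
    return engine


def record_session(session_id: str, landing_url: str, referer: Optional[str] = None) -> None:
    """Persist the session only when it came from an AI engine."""
    engine = detect_ai_source(landing_url, referer)
    if engine:
        sessions[session_id] = engine


def revenue_by_engine(stripe_events: list[dict]) -> dict[str, float]:
    """Join completed checkouts to stored sessions via client_reference_id."""
    totals: dict[str, float] = {}
    for event in stripe_events:
        if event.get("type") != "checkout.session.completed":
            continue
        checkout = event["data"]["object"]
        engine = sessions.get(checkout.get("client_reference_id"))
        if engine:
            # amount_total is in the smallest currency unit (cents for USD).
            totals[engine] = totals.get(engine, 0.0) + checkout["amount_total"] / 100
    return totals
```

The join only works if the checkout is created with the same session identifier passed as `client_reference_id`, so that value has to be carried from the landing hit through to checkout creation.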
A worked revenue read on the alternatives cluster I targeted in the worked example. After publishing the comparison pages, I tracked the four "[competitor] alternatives" prompts I was newly winning:
| Prompt won back | SOV before → after | AI sessions/mo to page | Converted | Revenue/mo |
| --- | --- | --- | --- | --- |
| "Competitor A alternatives" | 0% → 22% | 31 | 2 | $58 |
| "Competitor B alternatives" | 0% → 18% | 19 | 1 | $29 |
| "Competitor C alternatives" | 0% → 14% | 11 | 0 | $0 |
| "cheaper [category] tool than [A]" | 0% → 27% | 24 | 2 | $58 |
The read is honest and useful. Three of the four pages won the slot and shipped revenue; the "Competitor C alternatives" page won the slot, drew clicks, and converted nobody — which told me either the page did not fit those buyers or Competitor C's users are not in-market for a switch. The cost of all four pages was a week of writing. The revenue was modest in absolute terms (first-year SMB SaaS, small numbers) but the structure is the point: I now know which gap classes pay and which do not, and I can reweight the next quarter's effort toward the ones that do. Without the Stripe join, all four pages would have looked identically "successful" on the SOV chart, and I would have happily written more Competitor-C-style pages that win citations and convert no one.
The general principle, stated cleanly: a competitive gap is worth closing only if the prompt it feeds converts for you specifically. The same closed gap can be a win for one brand and a waste for another, depending on landing-page fit and buyer intent. Share of voice cannot tell the two apart. Cited-to-clicked-to-paid can, and it is the only number that survives the question "did beating them on this query make us money."
Common mistakes in competitive AI analysis
Eight failure modes I see often enough to name, each with the correction.
Mistake 1: Stopping at "they're cited more." The most common one, and the whole reason this article exists. Knowing a competitor out-cites you is a score, not a plan. Fix: always run the teardown to find the source of their advantage, then map it to revenue.
Mistake 2: Testing one engine. A competitor can dominate ChatGPT and be beatable on Perplexity for the identical queries. Fix: run all four engines and read the per-engine spread — the engine where your gap is most closable is often not the one with the most users.
Mistake 3: Testing once. A single run is a roll of the dice given the engines' stochastic sampling. Fix: 3-5 runs per prompt, averaged or unioned.
Mistake 4: A definitional-heavy prompt set. Stuffing the set with "what is [category]" prompts inflates everyone's apparent presence and measures nothing about buying behavior. Fix: weight toward best-of, alternatives, versus, and for-segment prompts where buyers actually decide.
Mistake 5: Trying to match the incumbent's entire footprint. A category leader's Wikipedia entity, 2,000 reviews, and decade of press cannot be replicated this year, and chasing all of it is a money pit. Fix: isolate the one or two closable gaps that feed converting prompts.
Mistake 6: Going head-on in "best [category] tools." Where a category-defining incumbent is strongest, displacement through GEO alone usually is not winnable. Fix: own the alternatives, versus, and segment queries where the incumbent's dominance is weaker and intent is higher.
Mistake 7: Astroturfing the fast sources. Paying for Reddit recommendations or fake reviews to match a competitor's community presence is detectable and self-defeating. Fix: earn the slow sources genuinely; their difficulty is exactly what makes them strong signals.
Mistake 8: Measuring the win in share of voice, not revenue. The biggest one. You close a gap, SOV rises, you declare victory, but GA4 buckets the AI clicks as Direct and you never see whether they paid. Fix: measure cited-to-clicked-to-paid with first-party attribution joined to Stripe, as covered on the revenue attribution feature page and the track ChatGPT traffic page.
| Mistake | Symptom | Correction |
| --- | --- | --- |
| Stop at "they're cited more" | A score with no action | Run the source teardown |
| One engine | Misread the gap | Test all four, read the spread |
| One run | False volatility | 3-5 runs per prompt |
| Definitional-heavy set | Meaningless presence numbers | Weight toward buying prompts |
| Match entire footprint | Quarters-long money pit | Isolate closable, revenue-relevant gaps |
| Head-on with incumbent | Wasted effort | Own alternatives + segment queries |
| Astroturf fast sources | Detection + brand liability | Earn slow sources genuinely |
| Measure in SOV not revenue | Vanity wins | Cited→clicked→paid via Stripe join |
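For Mistake 3, the difference between averaging and unioning repeated runs is easiest to see on a toy log. The brand names and run data below are hypothetical; the two functions are one plausible reading of "averaged or unioned", not a prescribed formula.

```python
from collections import Counter

# Hypothetical log: brands named in each of four runs of one prompt on one engine.
RUNS = [
    ["Competitor A", "Competitor B", "YourBrand"],
    ["Competitor A", "Competitor B"],
    ["Competitor A", "Competitor C", "YourBrand"],
    ["Competitor A", "Competitor B"],
]


def appearance_rate(runs: list[list[str]]) -> dict[str, float]:
    """Averaged view: fraction of runs in which each brand was named."""
    counts = Counter(brand for run in runs for brand in set(run))
    return {brand: n / len(runs) for brand, n in counts.items()}


def union_presence(runs: list[list[str]]) -> set[str]:
    """Unioned view: every brand named in at least one run."""
    return set().union(*runs)


print(appearance_rate(RUNS))
print(sorted(union_presence(RUNS)))
```

The averaged view shows how reliably a brand appears; the unioned view shows who is in contention at all. A brand that shows up in the union but at a low rate is exactly the kind of result a single run would misreport.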
How competitive GEO analysis fits the broader picture
Competitive AI analysis is one input in a stack, and it is most powerful next to its neighbors. Here is where it sits relative to the inward-facing and metrics work.
The diagnostic piece and this competitive piece are two sides of one coin: the diagnostic asks why you are absent, the competitive asks why your rival is present, and the answers overlap heavily — both come down to source-graph density. The difference in method is the direction you point the analysis. Run the diagnostic on yourself, run the competitive analysis on your category, and the union of the two gives you a complete map of where you stand and exactly which gaps to close.
The thread tying all of them together is the revenue layer. Every one of these analyses produces a presence number — a citation, a mention, a share of voice — and presence is an input, not an outcome. The piece that converts presence into a defensible business decision is the first-party-to-Stripe join, which is the layer the citation trackers structurally cannot ship. That is the consistent argument across this whole series, and it is the one I staked a product on.
What this looks like inside Attrifast
A short, honest note on the product, because the article should not pretend the author is disinterested. Attrifast does not run prompt-replay audits, does not compute your competitor's share of voice, and does not scrape AI answers — those are jobs for Profound, Otterly, Peec, and the manual method above, and I will happily point you to them. What Attrifast does is the wedge layer this whole article builds toward: it detects AI-sourced sessions server-side (ChatGPT, Perplexity, Claude, Gemini, Copilot), persists them first-party without a cookie or consent banner, and joins each session to revenue on the Stripe checkout.session.completed webhook.
The practical value for competitive analysis specifically is steps three through five of the revenue wedge. You close a gap against a competitor, your tracker shows your SOV rising for the target prompt, and then Attrifast tells you the part the tracker cannot: did AI-attributed sessions to that landing page actually rise, and did they convert in Stripe? That turns "we won the citation slot" into "we won the citation slot and it shipped $X," which is the only version of the sentence that survives a budget review. Cost is $29/mo, the tracking script is 4 KB and cookieless, and the Stripe connection is OAuth, not an API key. The detection mechanic is documented end to end on the track ChatGPT traffic page.
The first-person reason I built it: I was the founder who could see a competitor out-citing me, did the work to close the gap, watched my Direct bucket climb, and had no idea whether any of it converted — because GA4 hid the AI clicks in Direct and I had no revenue signal per query. The product is the signal I wished I had when I was guessing.
Limitations
Five things this article does not claim, so you do not over-extrapolate.
You can never see a competitor's revenue. The entire method measures a competitor's presence precisely and infers their advantage, but the cited-to-clicked-to-paid chain is only observable on your own side after you close a gap. Treat competitor "wins" as presence wins, not revenue wins.
Share of voice is an input, not an outcome. Out-citing a competitor 18% to 11% is a leading indicator at best. The only outcome metric is your AI-attributed revenue, which requires the first-party-to-Stripe join GA4 cannot produce.
The corpus-versus-retrieval split is a useful model, not the literal internal architecture. The engines do not publish how recall, recommendation, and live retrieval interact. The two-mode framing holds up in testing but is an approximation, not a leaked spec.
GEO does not let you displace a category-defining incumbent head-on. Where a brand owns the category in the corpus, the realistic win is to make the list and own the long-tail and alternatives queries, not to dethrone them in "best [category] tools." Anyone promising head-on displacement is overselling.
The numbers are aggregates and snapshots. The per-competitor SOV figures, the worked revenue read, and the 1.4-2.1x RPV multiplier are from a specific set of mostly-SaaS properties in early-to-mid 2026 and will drift as the engines' user bases broaden. Treat them as directional, and measure your own.
FAQ
How do I find out which competitors ChatGPT recommends over me?
Build a prompt set of 30-100 buying-stage queries your category actually gets asked — "best [category] tools," "[incumbent] alternatives," "[category] for [segment]" — and run each one 3-5 times across ChatGPT, Perplexity, Claude and Gemini with browsing on. Record every brand named and every URL cited per answer, then tally how many of the answer slots each competitor wins. The competitor who appears in the most answers, in the highest positions, with the most cited supporting URLs is the one out-citing you. The whole audit is about 30 prompts times four engines times four runs, which is roughly two to three hours of manual work or an afternoon with a prompt-replay tool. The output is a per-competitor share-of-voice table, not a vibe.
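As a rough sketch of the tally, assuming you log one row per answer, the per-engine share-of-voice table can be computed like this. The log rows and brand names are hypothetical, and share of voice is defined here as the fraction of logged answers that name a brand, which is one reasonable definition among several.

```python
from collections import defaultdict

# Hypothetical audit log rows: (engine, prompt, run number, brands named in the answer).
LOG = [
    ("chatgpt", "best [category] tools", 1, ["Competitor A", "Competitor B"]),
    ("chatgpt", "best [category] tools", 2, ["Competitor A", "YourBrand"]),
    ("perplexity", "best [category] tools", 1, ["Competitor A", "YourBrand"]),
    ("perplexity", "[incumbent] alternatives", 1, ["YourBrand", "Competitor B"]),
]


def audit_size(prompts: int, engines: int, runs: int) -> int:
    """Total number of answers to collect for one audit pass."""
    return prompts * engines * runs


def share_of_voice(log) -> dict[str, dict[str, float]]:
    """Per engine, the fraction of logged answers that name each brand."""
    answers = defaultdict(int)
    named = defaultdict(lambda: defaultdict(int))
    for engine, _prompt, _run, brands in log:
        answers[engine] += 1
        for brand in set(brands):  # count a brand once per answer
            named[engine][brand] += 1
    return {
        engine: {brand: n / answers[engine] for brand, n in brands.items()}
        for engine, brands in named.items()
    }


print(audit_size(prompts=30, engines=4, runs=4))
print(share_of_voice(LOG))
```

Keeping the engine as the outer key preserves the per-engine spread, which a single blended number would hide.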
Why does ChatGPT recommend my competitor instead of me?
Almost always because the competitor sits in more of the third-party sources the model grounds its answer in, not because their product is better. The model recommends what the web has already recommended. If your competitor appears in eight "best [category]" listicles, has a Wikipedia entity, gets named in Reddit recommendation threads, holds 200+ G2 reviews, and ranks in the organic top 5 for the buying query — and you appear in two listicles with no Wikipedia entity and a thin review profile — the model has far more corroborating signal to recommend them. The useful analysis is not "they win"; it is identifying the specific structural source they have and you do not, then deciding whether closing that one gap is worth the revenue it would win back.
What is competitive GEO analysis?
Competitive GEO (generative engine optimization) analysis is the practice of measuring how visible your competitors are inside AI-generated answers, reverse-engineering the sources that feed their citations, and finding the structural gaps between their footprint and yours. It has three layers. Layer one is measurement: who gets cited, how often, on which engine, for which prompts (per-competitor share of voice). Layer two is forensics: for a cited competitor, what feeds the citation — their own pages, Reddit, Wikipedia, G2, third-party listicles, news. Layer three is the wedge: which gaps, if closed, would actually move your revenue rather than just your mention count. Most teams do layer one and stop. Layers two and three are where the money is.
Can I see my competitor's AI share of voice for free?
Partially, and it is the most useful free analysis in GEO. You can manually run a 20-30 prompt set across the four engines weekly, log every brand mention and cited URL in a spreadsheet, and compute each competitor's share of voice with a pivot table. That costs roughly two to three hours a week of labor and breaks down past about 30 prompts. Paid tools (Profound, Otterly, Peec, SE Ranking) automate the prompt replay and competitor tracking from $29 to $499+ per month, which is worth it once you are tracking 50+ prompts on a daily or weekly cadence. None of those tools tell you whether out-citing a competitor actually drove your revenue — that requires first-party attribution joined to Stripe, which is a separate layer.
How many prompts do I need to analyze competitor AI visibility?
Thirty buying-stage prompts per engine is the minimum that produces a stable per-competitor ranking; 50-100 is materially better and lets you segment by intent class. Below about 30 prompts the model's stochastic sampling plus its continuously-updating retrieval index produce enough run-to-run variance that a single competitor can swing 10-15 percentage points of apparent share between two measurement passes, and you will chase noise. Sample each prompt 3-5 times per pass and average or union the results rather than trusting one roll. Weight the set toward the high-intent prompts ("best," "alternatives," "vs," "for [segment]") because those are the answers that actually drive buyers, not the definitional prompts that are easy to win but rarely convert.
What sources feed my competitor's AI citations?
Six source classes account for nearly all of them, in rough order of how much weight they carry: their own well-structured pages (comparison and "best of" content the model can lift directly), third-party "best [category]" listicles, Reddit and forum recommendation threads, their Wikipedia and Wikidata entity, G2 and Capterra review profiles, and news or press coverage. To find each, read the cited URLs in the AI answers directly, then run a brand-name-minus-own-domain search ("Competitor" -site:competitor.com) to map their third-party footprint, check Wikipedia and Wikidata, search Reddit for their name plus "reddit," and look at their G2 review count. The teardown tells you which source class is doing the heavy lifting, which is the one you have to match.
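If you run the teardown for several competitors, a small helper keeps the search strings consistent. This is only a convenience sketch: the function name and dictionary keys are my own, and the two query shapes are the ones described above.

```python
def teardown_queries(brand: str, own_domain: str) -> dict[str, str]:
    """Build the search strings used to map a competitor's third-party footprint."""
    quoted = f'"{brand}"'
    return {
        # Brand name minus the competitor's own domain.
        "third_party_footprint": f"{quoted} -site:{own_domain}",
        # Brand name plus "reddit" to surface recommendation threads.
        "reddit_threads": f"{quoted} reddit",
    }


for label, query in teardown_queries("Competitor", "competitor.com").items():
    print(f"{label}: {query}")
```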
Should I copy whatever my top competitor is doing for AI visibility?
No — copy only the structural gaps that map to revenue-driving prompts, not their whole footprint. A category incumbent may have a Wikipedia entity, 2,000 G2 reviews, and a decade of press you cannot replicate, and chasing all of it is a quarters-long money pit with diminishing returns. The useful move is to isolate the one or two gaps that (a) are realistic for you to close, (b) feed the high-intent prompts you actually want to win, and (c) would plausibly move your revenue if closed. Often that is a single honest comparison page targeting an "alternatives" query, or a Wikidata entity plus a clean sameAs graph — fast, cheap, and aimed at the prompts that convert. Beating an incumbent head-on is usually not the game; owning the alternatives and segment queries is.
Does out-citing a competitor in ChatGPT actually drive my revenue?
Not automatically, and this is the half of competitive GEO analysis nobody measures. Winning a citation slot a competitor used to own only matters if the users who would have clicked them now click you, land on a page that converts, and pay. The link from citation to revenue is mediated by citation position, click-through, landing-page fit, and buyer intent — all of which a share-of-voice tally ignores. The discipline is to measure cited-to-clicked-to-paid per query you win back: did AI-attributed sessions for that prompt's landing page rise after you closed the gap, and did those sessions convert in Stripe? Because ChatGPT strips the referer and GA4 buckets the clicks as Direct, you need first-party server-side attribution joined to Stripe to see it. Otherwise you are optimizing a vanity number against a competitor.
How is competitor AI analysis different from a normal SEO competitor analysis?
A traditional SEO competitor analysis maps keyword overlap, backlink profiles, and SERP rankings — the inputs to a ranked list of ten blue links. Competitor AI analysis maps citation presence inside a synthesized answer that names only three to five brands, has no stable click-through curve, and differs across four non-overlapping engines. The overlap is real (organic rank still feeds AI citations, so Ahrefs and Semrush competitive research are still useful inputs), but the unit of analysis changed from "who ranks for the keyword" to "who gets named in the answer," and the sources that drive it expanded beyond backlinks to include Reddit, Wikipedia, review sites and listicles. You still run the SEO analysis; you layer the AI-citation analysis on top of it.
Which engine should I prioritize when analyzing competitors?
Prioritize the engine where your buyers actually are, then the engine where your competitive gap is most closable, not the engine with the most aggregate users. The four engines diverge hard: Perplexity is citation-forward and lower-barrier for topical specificity, ChatGPT prefers high domain authority and a slower corpus, Claude cites sparingly and prefers primary sources, and Gemini rides existing Google organic rankings. A competitor can dominate ChatGPT and be beatable on Perplexity for the identical queries. Run the audit across all four, but weight your gap-closing effort toward the engine that both reaches your buyers and shows a gap you can realistically close — and confirm the win in revenue, since per-engine conversion varies enormously.
How often should I re-run a competitive AI visibility analysis?
Monthly for the retrieval-driven layer (live citations, listicle presence, organic rank) and quarterly for the corpus-driven layer (training-baked recommendations that only change when an engine ships a new model). The retrieval layer moves within weeks of a competitor publishing new comparison content or earning a listicle placement, so a monthly re-run catches their moves and yours. The corpus layer only shifts on the engines' model-release schedule, so a quarterly browsing-off re-test is enough. Pair every re-run with your first-party AI-revenue line so you are measuring whether the gaps you closed moved money, not just whether your share-of-voice chart moved. A 30-prompt monthly pass is about two to three hours.
What's the single fastest gap I can close to compete in AI answers?
The honest comparison-page matrix. If a competitor out-cites you because they have a page for every "[rival] alternatives" and "[X] vs [Y]" query and you have one or none, you can close most of that gap in a week of genuine writing, and those pages feed the highest-intent prompts in the category. The catch is the word honest — a self-serving comparison that ranks you first on every axis gets cross-referenced and discounted by the model, so the page has to be genuinely useful about where competitors win, which is also just better content. Pair it with a same-day Wikidata entity and a clean sameAs graph, and you have closed the two fastest, highest-leverage gaps in days rather than quarters.
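For the sameAs half of that pairing, here is a minimal sketch, assuming "sameAs graph" means schema.org `sameAs` links published as JSON-LD on your site. Every value is a placeholder, including the Wikidata entity ID; substitute your real profiles.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Serialize a schema.org Organization with sameAs links as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)


# Placeholder values for illustration only.
markup = organization_jsonld(
    name="YourBrand",
    url="https://www.example.com",
    same_as=[
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder entity ID
        "https://www.linkedin.com/company/example",
        "https://github.com/example",
    ],
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The point of the list is consistency: each profile should point at the same entity, so the links corroborate one another.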
How do I know if my competitor's lead is in the corpus or in live retrieval?
Run your best-of and versus prompts in both browsing-off and browsing-on modes and compare. If the competitor wins in browsing-off mode, their lead is corpus-baked — accumulated press, Wikipedia, Reddit, listicle authority — which you close slowly over quarters as the engines retrain. If they win only in browsing-on mode, their lead is in live retrieval — better-structured pages, more listicle placements, higher organic rank — which you can often match this quarter with content you control. The two modes point at two different gap sets with two different clocks, so testing both is what tells you whether to expect a fast win or a slow grind.
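The two-mode comparison reduces to a small decision rule. The sketch below encodes it under the article's framing, which is a working model rather than a description of how the engines work internally; the prompts and observations are hypothetical.

```python
from typing import Optional


def classify_lead(wins_browsing_off: bool, wins_browsing_on: bool) -> Optional[str]:
    """Label where a competitor's lead appears to sit for one prompt."""
    if wins_browsing_off:
        return "corpus"      # slow clock: closes over quarters as engines retrain
    if wins_browsing_on:
        return "retrieval"   # fast clock: often matchable with content you control
    return None              # the competitor does not lead on this prompt


# Hypothetical observations: prompt -> (wins with browsing off, wins with browsing on).
OBSERVED = {
    "best [category] tools": (True, True),
    "[X] vs [Y]": (False, True),
    "[category] for [segment]": (False, False),
}

LEADS = {prompt: classify_lead(off, on) for prompt, (off, on) in OBSERVED.items()}
for prompt, lead in LEADS.items():
    print(f"{prompt}: {lead}")
```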
Can a small brand realistically beat a bigger competitor in AI recommendations?
On the right queries, yes; head-on, usually not. You will probably not displace a category-defining incumbent in "best [category] tools" through GEO alone, because their corpus authority is too dense to overcome quickly. But the alternatives, versus, and segment-specific queries are a different game — there the barrier is whoever published the relevant comparison page and matched the buyer's segment language, and a small brand can win those this quarter. The realistic strategy is to concede the high-authority prompts, own the high-intent long-tail ones where the incumbent is weak, and measure whether winning them ships you revenue. Modest share of voice on converting prompts beats large share of voice on prompts nobody clicks.