Analytics
Does Google Know Everything About Your Website?
Google sees almost every visit, click, and conversion on your site, but GA4 hands you a degraded slice. Here is what Google knows that you do not.
Google sees almost every visit, click, and conversion on your website. GA4 shows you a fraction. The gap is not a glitch; it is the product design. Search Console knows every query your URLs ranked for, Chrome knows the load behaviour of roughly two-thirds of your visitors per StatCounter Browser Market Share, Google Ads stitches logged-in conversions through its own identity graph, and the GA4 stream you actually get, rationed by quotas and sampling, is the thinnest of the five surfaces. The real question is not whether Google tracks everything (it largely does), but whether you can see what Google sees. For most operators the honest answer is no, and the fix is cookieless first-party server-side measurement that you own end-to-end.
| Spec | Value |
|---|---|
| Chrome global market share (2026) | ~65% (StatCounter) |
| Safari global market share | ~18% (StatCounter) |
| Safari ITP 2.3 cookie expiry (document.cookie) | 7 days |
| EU consent banner refusal rate | 30-60% (industry surveys) |
| Consent Mode v2 mandatory in EEA since | March 6, 2024 |
| GA4 free user-level data retention | 14 months |
| GA4 free sampling threshold | 10 million events per exploration query |
| Google Ads enhanced conversion hashing | SHA-256 first-party data |
| AI-engine referrals in GA4 default grouping | Lumped as Direct or Referral |
| GA4 360 license (industry reported) | ~$150,000 per year starting |
| Stripe Sigma retention | Indefinite within account |
I have spent the last two years stitching first-party attribution into roughly 40 marketing channels across attrifast.com and a handful of client SaaS apps. The pattern that shows up in every audit is identical. The Google Ads dashboard says one number, GA4 says a smaller one, and Stripe says a third that almost always matches Ads better than it matches GA4. The first time this surprised me, I burned a weekend rebuilding the GA4 conversion config, certain I had broken something. I had not. GA4 is built to under-report to you while Ads keeps its own books.

Google touches your site through at least five distinct surfaces, and you should treat them as separate intelligence services that report to the same parent.
Search Console. Every time Googlebot crawls a URL, Google can add that URL and its content fingerprint to its index. Every time someone searches a query that surfaces your URL, the impression is logged. Clicks, position, and the search query itself (when anonymization thresholds permit) all flow into the Search Console export. The data is yours: you can pull 16 months of it, and it is one of the few surfaces where Google's view roughly equals the operator's view. The catch: queries below a certain volume are hidden as "anonymized" to prevent personally identifiable leakage.
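Warehousing this data daily is the only way past the 16-month window. A minimal sketch, assuming the Search Console API's `searchAnalytics.query` endpoint and that you already hold an OAuth token; it only builds one page of one day's request and does not send it or handle auth. The helper name `daily_query` is hypothetical.

```python
import json
from datetime import date
from urllib.parse import quote

# Search Console API v3 root; the site URL must be percent-encoded in the path.
API_ROOT = "https://searchconsole.googleapis.com/webmasters/v3/sites"


def daily_query(site_url: str, day: date, start_row: int = 0, row_limit: int = 25000):
    """Build (endpoint, JSON body) for one page of one day's searchAnalytics.query rows.

    Sending the POST with an OAuth bearer token, and looping on start_row
    until a page comes back short, are left to the caller.
    """
    endpoint = f"{API_ROOT}/{quote(site_url, safe='')}/searchAnalytics/query"
    body = {
        "startDate": day.isoformat(),
        "endDate": day.isoformat(),
        "dimensions": ["date", "query", "page"],
        "rowLimit": row_limit,   # per-request page size
        "startRow": start_row,   # pagination offset
    }
    return endpoint, json.dumps(body)


endpoint, body = daily_query("https://example.com/", date(2025, 1, 15))
print(endpoint)
print(body)
```

Pulling one day per job keeps each warehouse load idempotent: rerunning a day overwrites it instead of duplicating rows.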
Chrome. Roughly 65% of your global visitors run Chrome per StatCounter. For users who opt in, Chrome reports real-user Core Web Vitals to Google through the Chrome User Experience Report (CrUX), which feeds PageSpeed Insights and Search ranking signals. Logged-in Chrome users contribute additional signal to Google's identity graph. None of this surfaces in GA4. It surfaces in Search Console's Core Web Vitals report, but only at aggregate URL level, and the underlying raw stream stays inside Google.
Google Ads. If you spend a dollar on Search, Display, YouTube, or Demand Gen, every click is logged: your landing URL gets the gclid click ID as a query parameter, and Google retains the full click record server-side. Enhanced conversions bind those clicks to SHA-256 hashed email addresses or phone numbers you send, then match them against Google's own logged-in graph. That match rate is meaningfully higher than anything GA4 can do on its own. Google Ads will happily tell you "14 conversions" while your GA4 reports 9 from the same campaign over the same window.
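Enhanced conversions only match if an identifier is normalized the same way on both sides before hashing. A minimal sketch for email, assuming trim-and-lowercase normalization plus the dot-stripping rule Google's enhanced conversions docs describe for gmail.com and googlemail.com addresses; confirm the current rules (and the E.164 format for phone numbers) in Google's docs before relying on it. The function names are illustrative.

```python
import hashlib

GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(email: str) -> str:
    """Trim and lowercase; drop dots in the local part for Gmail addresses (assumed rule)."""
    email = email.strip().lower()
    local, sep, domain = email.partition("@")
    if sep and domain in GMAIL_DOMAINS:
        local = local.replace(".", "")
    return local + sep + domain


def hash_email(email: str) -> str:
    """Hex-encoded SHA-256 digest of the normalized address."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


print(hash_email("  Jane.Doe@Example.com "))
```

Hash on your server, from the address stored in your own database, so the raw email never has to pass through a browser tag.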
GA4. The stream you embed via gtag or GTM. Capped, sampled, consent-gated, and ITP-truncated for Safari. This is what most operators think of as "what Google knows about my site." It is actually the smallest pane.
AdSense, reCAPTCHA, YouTube embeds, Fonts, Maps. If you embed any Google asset on a page, that asset's request can carry the user's Google identity into Google's ad-side graph via the third-party cookie where it is still set, and where Chrome's Privacy Sandbox Topics API is in use, Google's ad scripts can read the user's interest categories as well. You do not see this data. Google does.
Five surfaces feed Google, one (degraded) surface feeds you. Anyone who has compared a Google Ads conversion column to a GA4 conversion column over a 30-day window has already lived this gap.

GA4 is not lying to you. It is filtering, and the filters are documented, just not in places non-analytics people read.
Consent loss. Consent Mode v2 became mandatory for EEA traffic on March 6, 2024. If a visitor declines the cookie banner, GA4 gets either cookieless pings (advanced mode) or nothing at all (basic mode), and fills the gap with modeled estimates instead of the real event stream. Industry surveys including the HubSpot State of Marketing report show banner refusal rates of 30-60%, with the higher end appearing on news and content sites and the lower end on commerce. Even at a conservative 35%, more than a third of your EEA event volume is replaced by a modeled estimate that no analyst can audit at session level.
ITP truncation. Safari (about 18% global share per StatCounter) caps the lifespan of cookies set through document.cookie at 7 days under Intelligent Tracking Prevention 2.3. GA4 stores its client ID in a first-party cookie called _ga, which gtag sets from JavaScript, so the cap applies to it. Any Safari user who returns more than seven days after their last visit is treated by GA4 as a new user. Returning visitor metrics, attribution windows longer than a week, and any cohort analysis that assumes user continuity all break for that share of your audience.
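The seven-day reset is easy to model. A minimal sketch, assuming the _ga cookie's cap is counted from its last write and that gtag rewrites the cookie on every visit (its default cookie_update behaviour, as an assumption here); `count_client_ids` is a hypothetical helper, not part of any GA4 API.

```python
def count_client_ids(visit_days, cap_days=7.0):
    """Count the distinct GA4 client IDs one Safari visitor would generate.

    Assumes the _ga cookie expires cap_days after its last write and is
    rewritten on every visit, so only a gap of cap_days or more resets it.
    """
    client_ids = 0
    last_visit = None
    for day in sorted(visit_days):
        if last_visit is None or day - last_visit >= cap_days:
            client_ids += 1  # cookie missing or expired: GA4 mints a new client ID
        last_visit = day
    return client_ids


# One person, four visits: the 11-day gap before day 20 splits them into two "users".
print(count_client_ids([0, 3, 9, 20]))  # 2
```

Weekly visitors keep one ID under this model; monthly visitors show up as a new user every time, which is exactly the cohort a retention report cares about.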
AI-engine referral collapse. When ChatGPT, Perplexity, Claude, or Gemini cites your site, the referrer string arrives as chatgpt.com (or the older chat.openai.com), perplexity.ai, or claude.ai, or it is stripped entirely. GA4's default channel grouping has no rule that catches these. They land in Direct or Referral. For sites where AI search is a meaningful traffic source, the operator looks at their GA4 channel breakdown and concludes "AI traffic does not convert," when really the channel has been silently merged into Direct. We covered the mechanics for one specific Google surface in Google AI Overviews 2026 (citation patterns and the GA4 Direct/(none) bucket), and the same logic applies to every external AI engine.
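A custom channel rule only needs the referrer host. A minimal sketch of server-side referrer inspection, assuming the hostnames listed below are the ones these engines send today; the list is an assumption you would need to keep current, and stripped referrers still land in Direct.

```python
from urllib.parse import urlparse

# Assumed referrer hosts per engine; review periodically, these change.
AI_ENGINES = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
}


def classify_referrer(referrer: str) -> str:
    """Map a raw Referer header to a channel, breaking AI engines out of Referral."""
    if not referrer:
        return "Direct"  # stripped referrer: indistinguishable from a typed URL
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in AI_ENGINES.items():
        # Exact host or any subdomain of it (www.perplexity.ai), but not lookalikes.
        if host == domain or host.endswith("." + domain):
            return f"AI: {engine}"
    return "Referral"


print(classify_referrer("https://www.perplexity.ai/search?q=attribution"))  # AI: Perplexity
```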
Data retention cap. GA4 free tier caps user-level and event-level data retention at 14 months per Google's official retention docs. Anything older drops out of explorations and can no longer be joined. For a SaaS founder trying to understand a 24-month customer journey, this is a structural blocker. GA4 360 (industry-reported at around $150,000 per year) lifts the cap; on the free tier, the only workaround is exporting events to BigQuery or your own store before they age out.
Sampling. GA4 free applies sampling to exploration queries that touch more than 10 million events in the requested date range; standard reports stay unsampled. Sampling is rarely catastrophic for top-line metrics, but it compounds with the consent and ITP losses to produce a dashboard that is materially smaller than reality.
I ran into a small case of this last quarter on a client site. GA4 showed clean conversions from Google Ads for a month, then went silent on the same channel for two weeks while paid clicks kept arriving. The problem was not the GA4 property, which was fine. The team had swapped one cookie banner for another in a Friday deploy, and the new banner did not pass wait_for_update to gtag correctly under Consent Mode v2. GA4 dropped roughly 40% of the EEA events for those two weeks. Ads kept counting, because Ads uses its own click ID, not the GA4 event stream.
A rough breakdown of where the events go, for a typical SaaS site with 65% Chrome / 18% Safari / mixed EEA traffic:
| Layer | Share of events Google records | Notes |
|---|---|---|
| Google records | 100% | Baseline |
| Google Ads sees | ~83% | |
| GA4 (free) shows | ~65% | |
| Lost to consent | ~15% | EEA, modeled only |
| Lost to ITP | ~8% | Safari, day 8+ |
| Lost to sampling | ~5% | Queries above 10M events |
| AI referrals | ~7% | Lumped as Direct |
The exact percentages move with your traffic mix, and the four losses overlap somewhat (a Safari user in the EEA is counted in two rows). Treat the breakdown as a directional sketch of why the dashboard never matches the ad platform.
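A quick check on why the overlap matters: if the four losses hit independent slices of traffic, the share GA4 keeps compounds multiplicatively instead of simply subtracting. A minimal sketch using the approximate percentages above, under that independence assumption (real traffic will only approximate it):

```python
# Approximate loss shares from the breakdown above (illustrative, not measured).
losses = {"consent": 0.15, "itp": 0.08, "sampling": 0.05, "ai_referrals": 0.07}


def retained_share(loss_rates):
    """Share of events surviving when each loss applies independently."""
    share = 1.0
    for rate in loss_rates:
        share *= 1.0 - rate
    return share


additive = 1.0 - sum(losses.values())
compounded = retained_share(losses.values())
print(f"additive: {additive:.1%}, compounded: {compounded:.1%}")
# additive: 65.0%, compounded: 69.1%
```

Summing the losses understates what GA4 keeps by about four points here, which is one more reason to read the figures as directional.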

The numbers above are easier to absorb side by side. The checklist below is the one I keep on a sticky note in my editor when I am rebuilding attribution for someone.
| # | What Google sees | What GA4 shows you | Why the gap exists | How to close it | Source |
|---|---|---|---|---|---|
| 1 | Every crawl, query, impression, click for your URLs | Aggregated Search Console export, 16-month cap | Search Console is the surface Google chooses to share | Pull Search Console API daily and warehouse it | Search Console API |
| 2 | Core Web Vitals from real Chrome users | Aggregate CrUX in Search Console only | Raw Chrome telemetry is Google's proprietary stream | Add a first-party RUM script you own | CrUX docs |
| 3 | Every gclid click with full metadata | gclid only if user consents and Safari has not expired the cookie | GA4 depends on consent + cookie persistence | Capture gclid server-side at first request | How Search Works |
| 4 | Enhanced conversion match via SHA-256 hashed email | A subset of conversions where the GA4 stream survived | Ads has access to the Google logged-in graph; GA4 does not | Send SHA-256 hashed conversions server-side from your DB | Enhanced conversions |
| 5 | YouTube view-through path before site visit | Nothing | YouTube view-through is Ads-only | Treat YouTube as an Ads-side conversion source | Google Ads Help |
| 6 | Logged-in cross-device journey | Single-device cookie session | GA4 cross-device requires User-ID setup most sites skip | Pass a stable user_id server-side after auth | GA4 User-ID |
| 7 | EEA traffic during consent-denied sessions (modeled) | Modeled aggregates only, no session detail | Consent Mode v2 design choice | Run cookieless server-side capture in parallel | Consent Mode v2 |
| 8 | Safari users on day 8+ as the same user | Treated as new user after 7 days | ITP 2.3 cookie expiry | First-party server-side session ID, not document.cookie | WebKit ITP policy |
| 9 | All events without sampling | Sampled above 10M events per query | Free-tier query economics | Export raw events to your own store before sampling | GA4 sampling |
| 10 | Indefinite data retention for ads use | 14-month user-level retention cap | Free-tier policy; 360 lifts it at cost | Warehouse events yourself daily | GA4 retention |
| 11 | Privacy Sandbox Topics for browsing interest | Nothing | Topics API feeds Google's ad graph, not the publisher | Capture first-party intent signals (page, scroll, query) | Topics API |
| 12 | AI-engine referrer strings from Bard/Gemini | Lumped as Direct/Referral | GA4 default channel grouping has no AI rule | Custom channel grouping + server-side referrer inspection | GA4 channel groupings |
Twelve rows, twelve distinct losses, one common cause: the data exists on Google's side and either is not shared with you, is shared in aggregate only, or arrives degraded through GA4's gates.
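Row 6's fix, passing a stable user_id server-side after auth, can also feed GA4 itself through its Measurement Protocol. A minimal sketch, assuming a web data stream's measurement ID and an API secret created in the GA4 admin; it builds the request without sending it, and `build_purchase_hit` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_purchase_hit(measurement_id, api_secret, client_id, user_id,
                       transaction_id, value, currency):
    """Build (but do not send) a GA4 Measurement Protocol purchase event."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    payload = {
        "client_id": client_id,  # browser ID derived from the _ga cookie
        "user_id": user_id,      # your stable post-login ID, enables cross-device joins
        "events": [{
            "name": "purchase",
            "params": {"transaction_id": transaction_id, "value": value, "currency": currency},
        }],
    }
    return Request(
        f"{MP_ENDPOINT}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_purchase_hit("G-XXXXXXX", "your-api-secret", "123456.7890",
                         "user-42", "order-1001", 29.0, "USD")
print(req.get_method(), req.full_url)
```

Passing the returned request to `urllib.request.urlopen` would send it; keeping construction separate makes the payload easy to log and test.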
Four design choices, all rational from Google's perspective and all costly to operators.
Modeled conversions favour Ads, not the publisher. When Consent Mode v2 strips identifiers, GA4 fills the gap with modeled conversions for use inside Google Ads' bid auction. The model is good enough to keep Smart Bidding happy. It is not good enough for an operator to audit which campaign produced which Stripe payout. That asymmetry is fine for Google (the auction still functions) and bad for you (the books do not match).
Consent Mode v2 is a regulatory shield. GDPR Article 6 plus the ePrivacy Directive's cookie consent rule put the legal risk for cross-site tracking onto the publisher. Consent Mode v2 still lets Google receive cookieless pings when the user declines (in advanced mode) and turn them into modeled signal, which keeps Google's auction running on EEA traffic. The publisher carries the consent banner, the consent log, and any regulatory fine. Google receives the modeled stream regardless.
Privacy Sandbox keeps the data inside Chrome. The Topics API lets Chrome compute browsing interest categories on-device, then expose them to ad networks at request time. The categorization happens in Chrome (which Google owns), and the categories surface in the auction (which Google also owns). The publisher's only access is "did this user click my ad," which is what Ads already showed before Sandbox existed.
The 360 tier monetizes the gap. GA4 free is a degraded view. GA4 360, industry-reported at around $150,000 per year starting, lifts the retention cap, raises the BigQuery raw-event export limit well above the free tier's 1 million events per day, and raises the sampling thresholds. The free-vs-paid gap is wide on purpose. Free is the funnel; 360 is the revenue product. If you want to see what Google sees, you pay enterprise pricing or you build first-party measurement yourself.
I do not blame Google for any of this. I would design it the same way if I owned the surfaces. The point is to be honest about who the customer is: the advertiser using Smart Bidding, not the operator reading the GA4 dashboard. Once you accept that, the fix becomes obvious.
The fix is not "leave GA4." Keep GA4 if you run Google Ads and bid on GA4-imported conversions, because Smart Bidding then sits downstream of the GA4 conversion stream. The fix is to put a first-party server-side layer next to GA4 that captures what GA4 quietly drops.
Four moving parts, in order of how much they recover.
1. First-party tracking script on your own domain. A small script (the Attrifast one is about 4 KB minified) served from your own root domain, backed by a first-party session ID. ITP caps cookies set from JavaScript via document.cookie at 7 days, but a cookie your own server sets through the Set-Cookie HTTP header is not subject to the same cap, as long as the response comes from your first-party server rather than a third-party or CNAME-cloaked endpoint. That is the loophole that keeps first-party server-side capture viable. The script captures UTM parameters, referrer, landing page, and a session start timestamp on first request. No third-party calls. Plays nicely with GDPR cookieless analytics patterns when configured without identifiers.
2. Server-side event capture. Every meaningful event (sign-up, trial start, paid conversion) is fired from your backend, not from the browser. Server-side firing survives ad blockers, ITP, and consent-banner refusal. PostHog's server-side capture guide is a clean technical reference for the pattern even if you do not use their tool. The event includes the first-party session ID from step 1, so the backend can join the Stripe payment to the UTM that started the session.
3. Stripe webhook join. Stripe sends webhooks for checkout.session.completed, customer.subscription.created, and friends. Your backend matches the Stripe event's customer email or session ID to the first-party session ID, then writes one row to your attribution table: timestamp, channel, campaign, revenue, currency. This is the only step where the Stripe payout actually meets the marketing channel, and it lives entirely on your infrastructure. Stripe Sigma retains the underlying data indefinitely within your account, so the join is durable past GA4's 14-month wall.
4. Cookieless by default. No third-party cookies, no fingerprinting, no cross-site identifiers. The session ID is first-party only and tied to the visit, not the person. Plausible documented the pattern in their cookieless tracking docs and the regulatory rationale in GDPR.eu's cookie guidance. The trade-off is real: you cannot do retargeting against this data, and cross-device matching requires explicit auth (a user logging in). For most bootstrapped SaaS, that trade is correct.
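Steps 1 and 3 carry most of the logic. A minimal sketch, assuming an in-memory dict in place of your database, a hypothetical cookie name `fp_sid` with an arbitrary 180-day lifetime, UTM and gclid capture done server-side at first request (as row 3 of the checklist suggests), and that your checkout passes the session ID to Stripe as the Checkout Session's `client_reference_id`. Webhook signature verification, which Stripe's own libraries handle, is omitted.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse

COOKIE_NAME = "fp_sid"               # hypothetical cookie name
COOKIE_MAX_AGE = 60 * 60 * 24 * 180  # 180 days; an arbitrary policy choice
TRACKED_PARAMS = ("utm_source", "utm_medium", "utm_campaign", "gclid")

SESSIONS: Dict[str, dict] = {}  # stand-in for a sessions table
ATTRIBUTION: list = []          # stand-in for the attribution table


def handle_first_request(url: str, referrer: str,
                         cookie_header: str = "") -> Tuple[str, Optional[str]]:
    """Step 1: reuse or mint a first-party session ID.

    Returns (session_id, Set-Cookie value), with None when the session already exists.
    """
    jar = SimpleCookie(cookie_header)
    if COOKIE_NAME in jar and jar[COOKIE_NAME].value in SESSIONS:
        return jar[COOKIE_NAME].value, None
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    session_id = secrets.token_urlsafe(16)
    SESSIONS[session_id] = {
        "landing_page": parsed.path,
        "referrer": referrer,
        "params": {k: query[k][0] for k in TRACKED_PARAMS if k in query},
    }
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = session_id
    morsel = cookie[COOKIE_NAME]
    morsel["path"] = "/"
    morsel["max-age"] = COOKIE_MAX_AGE
    morsel["secure"] = True
    morsel["httponly"] = True
    morsel["samesite"] = "Lax"
    return session_id, morsel.OutputString()


def channel_for(session: dict) -> str:
    """UTM source wins, then a gclid, then the referrer host, else direct."""
    params = session["params"]
    if "utm_source" in params:
        return params["utm_source"]
    if "gclid" in params:
        return "google_ads"
    return urlparse(session["referrer"]).hostname or "direct"


def handle_stripe_event(event: dict) -> Optional[dict]:
    """Step 3: join a checkout.session.completed webhook to the session that started it."""
    if event.get("type") != "checkout.session.completed":
        return None
    checkout = event["data"]["object"]
    session = SESSIONS.get(checkout.get("client_reference_id"))
    if session is None:
        return None  # no first-party session; handle as unattributed elsewhere
    row = {
        "timestamp": event["created"],
        "channel": channel_for(session),
        "campaign": session["params"].get("utm_campaign"),
        "revenue_minor": checkout["amount_total"],  # smallest currency unit, e.g. cents
        "currency": checkout["currency"],
    }
    ATTRIBUTION.append(row)
    return row
```

Your web framework would send the second value returned by `handle_first_request` as a `Set-Cookie` response header. Keeping revenue in Stripe's minor units avoids dividing by 100 for currencies that have no decimal places.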
I have wired this exact loop into roughly 40 marketing channels across my own properties and a handful of client SaaS apps. The honest result: it recovers most of the events GA4 drops on Safari and on consent-denied EEA traffic. It does not magically restore cross-device journeys from before the user logs in (nobody can, including Google). For deeper context on what specifically dies when ITP is involved, my earlier piece on cross-site tracking after ITP walks through the 7-day cookie cap and the Safari attribution evaporation I watched live. For the channel-level break-out, Google Analytics alternative for revenue tracking lays out the joins. And for the GA4-specific limitations this article touches on, GA4 revenue attribution limitations covers them at length.
Attrifast packages those four moving parts as a managed service with privacy-first analytics and a Stripe-native cookieless revenue analytics pipeline. The capability is buildable in-house; Attrifast just removes the boilerplate.
The shift is from "rent a dashboard" to "own a data pipeline." Six dimensions where the ownership model differs.
| Dimension | GA4 free tier | First-party server-side |
|---|---|---|
| Where raw events live | Inside Google's infrastructure | Inside your database |
| Data retention | 14 months user-level | Indefinite, you decide |
| Sampling | Applied above 10M events per query | None, you query your own data |
| Consent loss | 30-60% in EEA replaced by modeled aggregates | Cookieless capture survives consent decline |
| ITP impact (Safari) | 7-day client ID resets | First-party cookie via Set-Cookie persists |
| AI-engine referrals | Lumped as Direct or Referral | Server-side referrer inspection breaks them out |
Three notes on the table. First, the GA4 column is GA4 free; GA4 360 closes some of the gap but adds around $150,000 per year in industry-reported license cost. Second, even with 360 the data still lives in Google's BigQuery instance, not yours, so the ownership question is unresolved. Third, the first-party column is not free either. You pay for storage, the script load, and a backend that fires events. The trade is direct: cash out of pocket vs. cash spent on misallocated ad budget because the dashboard lied.
One honest limit: first-party server-side measurement cannot tell you what a logged-out visitor did on a different domain before they hit yours. Nobody can. That information left the browser when ITP shipped in 2017 and it is not coming back. If you need cross-domain pre-visit signal, you are buying ads against Google's identity graph or Meta's, which is the implicit deal. The first-party stack reclaims everything on your domain. The cross-domain layer is gone, and pretending otherwise is the marketing-attribution version of cold fusion.
Google sees almost every visit to your site. If a visitor uses Chrome (roughly 65% global share per StatCounter), is logged into a Google account, or arrives via Google Search, Google Ads, or YouTube, Google's ad-side and search-side systems see the visit even when GA4 does not. The exceptions are users behind aggressive blockers, Safari with full ITP enabled, and bots that ignore the Google ecosystem. For a typical SaaS site with normal Chrome share, the operator's GA4 dashboard usually shows 50-70% of what Google itself recorded.
GA4 reports fewer conversions than Google Ads for three structural reasons. First, under Consent Mode v2, EEA visitors who refuse the banner (30-60% per industry surveys) reach GA4 only as modeled estimates. Second, Safari ITP 2.3 expires the JavaScript-set _ga client ID cookie after 7 days. Third, GA4 free tier samples exploration queries that touch more than 10 million events. Google Ads uses enhanced conversions with SHA-256 hashed first-party data and its own logged-in graph, so it counts conversions GA4 never sees.
Google collects data about your site through several surfaces. Through Google Search Console: the search queries, impressions, clicks, position, and Core Web Vitals for every URL Google has crawled. Through Chrome (about 65% of your visitors): page load, Largest Contentful Paint, and whatever anonymous usage signals Chrome users opt into sharing. Through Google Ads: every click on every campaign plus logged-in conversion stitching. Through GA4: the events you fire. Through AdSense and reCAPTCHA, if you embed them: additional cross-site signal. Across these surfaces, Google sees materially more about your traffic than your own dashboard does.
AI-engine referrals are the exception: traffic from ChatGPT, Perplexity, Claude, and Gemini is not in Google's ad systems at all. It shows up as Direct or Referral in GA4's default channel grouping, which means most operators never see it as a distinct channel. The traffic exists, the conversions exist, but neither Google Ads nor default GA4 attributes them. First-party server-side tracking that inspects the referrer string and User-Agent is the only reliable way to break them out.
You can get part of Google's view for free. Search Console gives you 16 months of query, impression, click, and position data. Google Ads gives you click and conversion data for your own campaigns. PageSpeed Insights and the Chrome User Experience Report (CrUX) give you Core Web Vitals from real Chrome users at no cost. What you cannot get free is the cross-surface join: which Search query led to which Ads click led to which logged-in conversion. That join lives inside Google. GA4 360 BigQuery export (industry-reported pricing around $150,000 per year) buys you closer to it, but the operator still does not own the underlying graph.