llms.txt Explained: Does It Actually Improve AI Visibility and Revenue in 2026?

A skeptical 2026 deep-dive on llms.txt: what the spec actually is, who reads it, whether it changes AI citations, and how to measure the revenue lift yourself instead of trusting vendor hype.

I shipped an llms.txt on attrifast.com in October 2024, about three weeks after Jeremy Howard floated the idea. I wanted to be early. I also wanted to believe it would do something, because the GEO space was loud with people telling me it was the next robots.txt and I am as susceptible to a clean narrative as anyone. Eighteen months later I have a confession that does not show up in many vendor decks: I still cannot prove the file moved a single citation or a single dollar on my own site. Not because it didn't, but because the honest measurement is genuinely hard, and almost nobody selling you llms.txt tooling has done it.

This article is the piece I wish I had read in late 2024. It explains what llms.txt actually is, where the spec came from, who really reads it versus who claims to, what the format looks like line by line, and why the citation-lift claims you have seen are mostly unfalsifiable marketing. Then it does the thing the rest of the genre skips: it walks through how to measure the revenue impact on your own site with a before/after test, because that is the only number that survives a board meeting. If you want the broader playbook, the GEO tactics playbook for 2026 sits alongside this; this piece is the deep, skeptical drill into one tactic.

[Figure: how AI engines and documentation crawlers consume llms.txt, a curated markdown index at the domain root that is read unevenly across consumer chat surfaces and developer-tool crawlers.]

Quick Facts

| Item | Value | Source |
| --- | --- | --- |
| Proposed by | Jeremy Howard, Answer.AI | Answer.AI / llmstxt.org [1][2] |
| Proposal date | September 2024 | llmstxt.org changelog [1] |
| Spec home | llmstxt.org | llmstxt.org [1] |
| File location | /llms.txt at domain root | Spec [1] |
| File format | Markdown (CommonMark) | Spec [1] |
| Companion file | llms-full.txt (full content inlined) | Spec / Mintlify [1][8] |
| Does Google use it? | No (per John Mueller) | Search Engine Land / Mueller [10][11] |
| Anthropic ships one? | Yes, for its docs | Anthropic docs [4] |
| Mintlify auto-generates it? | Yes, for hosted docs | Mintlify docs [8] |
| Estimated adoption (public SaaS, Q1 2026) | ~7-10% | Attrifast sampling, n=~300 |
| Hard public evidence of citation lift | Minimal / anecdotal | This article's read of the literature |
| Cost to ship | ~30 minutes, one-time | Author estimate |
| Recurring cost if hand-maintained | ~15 min/quarter | Author estimate |
| Standardization status | Informal; not an IETF RFC | IETF AI-preferences WG scope [13] |

Two of those rows matter more than the rest. "Does Google use it? No" is the row that should temper every citation-lift claim aimed at marketers, because Google AI Overviews are the highest-volume AI surface most businesses touch. And "Hard public evidence of citation lift: minimal" is the row this entire article is built around. The rest is detail.

What llms.txt actually is (and what it is not)

llms.txt is a single file, written in markdown, that you place at the root of your domain so that it resolves at https://yoursite.com/llms.txt. Inside it, you write a short description of what your site or product is, then list your most important pages as markdown links, each with a one-line description. That is the whole thing. The specification at llmstxt.org [1] is intentionally tiny, the kind of thing you can read in five minutes.

The proposal came from Jeremy Howard, co-founder of Answer.AI and previously fast.ai and Kaggle, in September 2024 [1][2]. His framing was practical and narrow: large language models have small effective context windows relative to a full website, and most websites are full of navigation, scripts, ads, and boilerplate that waste tokens and confuse extraction. A curated markdown file gives a model a clean, high-signal map of what matters on your site, expressed in the format models parse most cleanly. That is the entire pitch. It is a good pitch. It is also a much narrower pitch than the GEO industry has since wrapped around it.

Here is the distinction that gets lost. There are two kinds of "standard" on the web.

| Type of standard | Examples | Enforcement | Who consumes it |
| --- | --- | --- | --- |
| Enforced, named-crawler standard | robots.txt, sitemap.xml, HTTP caching headers | Crawlers publicly commit to honoring it | Googlebot, Bingbot, GPTBot, named bots |
| Informal convention | llms.txt, humans.txt, security.txt (partly) | Voluntary; no public commitment from major consumers | Whoever chooses to read it |

robots.txt sits in the first row. Google, Bing, OpenAI's GPTBot, and Anthropic's ClaudeBot all publicly document that they honor it [4][14]. sitemap.xml sits in the first row too; it is part of the sitemaps.org protocol [12], which the major search engines have publicly agreed to consume. llms.txt sits squarely in the second row. No major AI engine has published a commitment to consume it at inference time. That single fact reframes everything else in this article.

So what llms.txt is not:

  • It is not a ranking signal. There is no documented mechanism by which adding llms.txt raises your position in any AI answer.
  • It is not a crawl directive. It does not grant or deny access; that is robots.txt's job.
  • It is not an index Google or Gemini reads. John Mueller of Google has said so directly [10][11].
  • It is not required for AI visibility. Plenty of heavily-cited pages have no llms.txt at all.
  • It is not verified. There is no console that tells you "your llms.txt was read and used."

Put as a myth-vs-reality table, since these confusions recur in every GEO thread:

| Common myth | Reality |
| --- | --- |
| "It's robots.txt for AI" | Different job: curation, not access control |
| "Adding it ranks me in ChatGPT" | No documented ranking mechanism |
| "Google reads it for AI Overviews" | Mueller: Google does not use it |
| "Every AI engine consumes it" | Most consumer surfaces undocumented |
| "I need it to be cited at all" | Many cited pages have none |
| "A SaaS must generate it" | It's 30 lines of markdown you can write by hand |
| "A crawl of the file proves it worked" | A fetch is not a citation is not a click is not revenue |

What it is: a low-cost, plausibly-useful curation convention, most reliably consumed inside developer-documentation tooling, that summarizes your best pages in the format models like. That is a real thing worth doing. It is just not the thing the louder vendors are selling.

llms.txt vs robots.txt vs sitemap.xml: the comparison that clears up most confusion

The single most common misconception I field is that llms.txt is "robots.txt for AI." It is not. They solve different problems, are consumed differently, and you should run all three. Here is the full breakdown.

| Dimension | robots.txt | sitemap.xml | llms.txt |
| --- | --- | --- | --- |
| Primary purpose | Allow/deny crawler access | Enumerate all indexable URLs | Curate + explain key pages for LLMs |
| Format | Plain text, directive syntax | XML | Markdown |
| Location | /robots.txt | /sitemap.xml (or linked) | /llms.txt |
| Curated or exhaustive? | Rules, not a list | Exhaustive list | Curated subset |
| Human-readable prose? | No | No | Yes (descriptions encouraged) |
| Enforced by major crawlers? | Yes | Yes | No public commitment |
| Standardized body? | De facto + RFC 9309 | sitemaps.org protocol | Informal (llmstxt.org) |
| Google honors it? | Yes | Yes | No (per Mueller) |
| OpenAI GPTBot honors it? | Yes (robots rules) | Reads sitemaps | Not documented |
| Anthropic ClaudeBot honors it? | Yes (robots rules) | Reads sitemaps | Ships own; consumption undocumented |
| Typical size | <1 KB | KB to MB | 1-5 KB (llms-full.txt larger) |
| Risk of misconfiguration | High (can deindex you) | Low | Near zero |
| Time to author | 10 min | Auto-generated | ~30 min |

The risk row is the practical one. A botched robots.txt can deindex your entire site, which is why people are rightly cautious with it. llms.txt cannot hurt you that way. The worst case for llms.txt is that nothing reads it, which is the same outcome as not having one, minus 30 minutes. That asymmetry is exactly why I recommend shipping it even though I am skeptical of the citation claims. Low downside plus uncertain upside equals "do it, but measure it."

A second framing that helps: think of the three files as restrict, enumerate, curate.

| File | One-word job | What a model gets from it |
| --- | --- | --- |
| robots.txt | Restrict | What it is allowed to fetch |
| sitemap.xml | Enumerate | Every URL that exists |
| llms.txt | Curate | Which URLs matter, and why |

You want all three. They do not compete. The mistake is dropping sitemap.xml because you added llms.txt; an AI crawler that respects sitemaps still wants the exhaustive list for coverage, and the curated file for priority. Belt and suspenders.
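To make the "restrict" job concrete, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content and the example.com URLs are hypothetical, chosen only for illustration. The point is that access is decided entirely by robots.txt rules; nothing in llms.txt enters into it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep GPTBot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(can_fetch("GPTBot", "https://example.com/private/report"))
    print(can_fetch("GPTBot", "https://example.com/llms.txt"))
```

Listing a page in llms.txt does not open it to a crawler that robots.txt blocks, and leaving a page out of llms.txt does not close it.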

Where llms.txt came from: the Answer.AI proposal and the early adopters

The origin story matters because it explains the gap between what llms.txt was designed for and what it is now marketed as. Jeremy Howard's original proposal [2] is explicitly framed around documentation and developer context. The motivating example in the spec is a software library: you want a coding assistant to understand your API quickly, so you give it a clean markdown map of your docs plus, optionally, an llms-full.txt that inlines the full content of those pages so the model never has to crawl at all.

That framing is why the strongest early adoption clustered in developer tooling, not marketing sites.

| Adopter / platform | What they did | Surface | Source |
| --- | --- | --- | --- |
| Answer.AI | Authored the spec, hosts llmstxt.org | Spec home | [1][2] |
| Mintlify | Auto-generates llms.txt + llms-full.txt for hosted docs | Docs hosting | [8] |
| Anthropic | Publishes llms.txt for its documentation | Vendor docs | [4] |
| Cursor / IDE assistants | Fetch docs llms.txt for context | Coding assistants | community reports [9] |
| Various OSS libraries | Ship llms.txt in repos | GitHub | [9] |
| Vercel | Published guidance and patterns for llms.txt | Platform docs/blog | [7] |

Notice the pattern: documentation platforms and coding assistants, where the consumer is an IDE or an agent that is explicitly fetching docs to answer a developer's question in-context. That is a real, working use case. When a coding assistant pulls your llms-full.txt to answer "how do I authenticate with this API," the file did its job, directly and measurably (the model gave a better answer about your product).

The leap that the GEO industry made was to assume the same mechanism applies to consumer chat: that a marketer's blog post about, say, "best CRM for plumbers" gets cited more often in ChatGPT because the site has an llms.txt. That leap is unproven. The consumer chat surfaces (ChatGPT, Perplexity, Claude.ai chat, Google AI Overviews) are powered by training corpora and live-search indexes whose retrieval logic is not documented to consult llms.txt. There is a meaningful difference between "an IDE agent deliberately fetches your docs map" and "a frontier model's retrieval pipeline happens to weight your llms.txt." The first is happening. The second is asserted, not shown.

There is also a community directory aspect worth knowing about. Public directories of sites that have shipped llms.txt have sprung up (you can find lists and the spec's own examples linked from the llms.txt GitHub ecosystem [3]). These are useful for seeing real examples, but a directory of adopters is not evidence of consumption by AI engines. It is evidence that adoption happened, which is a different and much weaker claim.

How AI engines consume (or ignore) llms.txt

Let me trace the actual data flow, because the hand-wavy version ("AI reads your llms.txt and cites you more") obscures where the chain actually breaks. There are two distinct paths, and they behave nothing alike.

The first path, an IDE or coding agent deliberately fetching your docs map, is the one that works and is the use case the spec was written for. The second path, consumer chat retrieval, is the one marketers care about, and it is where the evidence evaporates. Let me be precise about each engine's documented posture as of mid-2026.

| AI surface | Documented llms.txt consumption | Notes |
| --- | --- | --- |
| Google Search / AI Overviews | No (Mueller stated Google does not use it) | Highest-volume AI surface; relies on its normal index [10][11] |
| Google Gemini | Not documented | Training use is controlled by the Google-Extended robots.txt token; no llms.txt commitment [14] |
| OpenAI ChatGPT (chat) | Not documented | GPTBot crawls per robots.txt; llms.txt use undocumented [5][6] |
| OpenAI ChatGPT Search | Not documented | OAI-SearchBot indexes; no llms.txt commitment [5][6] |
| Anthropic Claude (chat) | Not documented for inference | Anthropic ships one for its own docs [4] |
| Perplexity | Not documented | PerplexityBot crawls; no public llms.txt commitment |
| Microsoft Copilot | Not documented | Bing-index-backed |
| IDE coding assistants (Cursor, etc.) | Yes, in practice | Explicitly fetch docs llms.txt [9] |
| Documentation search (Mintlify-hosted) | Yes | Generates and serves it [8] |

Read that table honestly and the conclusion is uncomfortable for the GEO-tooling industry: the engines that drive the most marketing-relevant citations are exactly the ones with no documented llms.txt consumption, and the surfaces that do consume it are developer tooling. If your business is a developer tool with docs, llms.txt is a clear win. If your business is a plumber-CRM whose target citations live in ChatGPT and Google AI Overviews, llms.txt is a speculative bet on an undocumented mechanism.

The one nuance that keeps the bet from being zero: training crawlers such as GPTBot and ClaudeBot do fetch files at your root (Google's training use is governed by the Google-Extended robots.txt token rather than by a separate crawler), and an llms.txt that points them efficiently at your best pages could, plausibly, improve how well those pages are represented in a future training corpus. "Could plausibly" is the operative phrase. Nobody outside the labs can measure the weight, and the labs have not published it. This is the difference between a defensible hypothesis and a marketing claim, and the GEO industry routinely collapses the two.

The llms.txt format spec, line by line

The spec is small enough to cover in full. Here is the structure the llmstxt.org specification [1] defines, with each element's role.

| Element | Required? | Markdown form | Purpose |
| --- | --- | --- | --- |
| H1 title | Required | # Project Name | The single H1; names the site/product |
| Blockquote summary | Recommended | > One-paragraph summary | High-signal description of what the site is |
| Free-form detail | Optional | Plain paragraphs | Extra context, caveats, how to use the site |
| H2 sections | Optional | ## Section | Group links (Docs, Guides, Posts, Optional) |
| Link list items | Required (in sections) | - [name](url): description | The curated pages with one-line descriptions |
| "Optional" section | Special | ## Optional | Pages a model may skip if context is tight |

A few rules that trip people up:

  • There must be exactly one H1, and it must be first. Multiple H1s break the parse expectation.
  • The blockquote immediately after the H1 is the summary, and it carries a lot of the file's value because it is the one place you state plainly what you are.
  • Link list items follow the form - [Page name](absolute-url): one-line description. Use absolute URLs, not relative paths, so a model that lifts a link out of context still resolves it.
  • The ## Optional section is a signal to the model: if you are short on context budget, these pages are safe to drop. Use it for nice-to-haves.
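The rules above are mechanical enough to lint. What follows is a minimal sketch of such a check, not an official validator: the function name, the regular expression, and the wording of the messages are my own assumptions, and it only covers the rules listed here (one H1, H1 first, a summary blockquote, well-formed link items with absolute URLs).

```python
import re

# Matches "- [Page name](url): description"; the description part is optional here.
LINK_ITEM = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines()]
    non_blank = [ln for ln in lines if ln.strip()]

    # Exactly one H1, and it must be the first non-blank line.
    h1s = [ln for ln in lines if ln.startswith("# ")]
    if len(h1s) != 1:
        problems.append(f"expected exactly one H1, found {len(h1s)}")
    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("first non-blank line must be the H1")

    # The summary blockquote is recommended, so treat its absence as a warning.
    if len(non_blank) < 2 or not non_blank[1].startswith("> "):
        problems.append("warning: no blockquote summary directly after the H1")

    # Every list item should be a link with an absolute URL.
    for ln in lines:
        if not ln.startswith("- "):
            continue
        match = LINK_ITEM.match(ln)
        if not match:
            problems.append(f"malformed link item: {ln}")
        elif not match.group(2).startswith(("https://", "http://")):
            problems.append(f"relative URL in link item: {ln}")
    return problems
```

Running something like this in CI before deploy is cheap insurance against a second H1 or a relative link slipping in during an edit.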

There is also the companion llms-full.txt. Where llms.txt is a map (links plus descriptions), llms-full.txt inlines the entire markdown content of those pages into one file, so an agent can ingest your whole relevant corpus in a single fetch with zero crawling. This is the file that does real work in the IDE-assistant use case, and it is what Mintlify auto-generates [8]. It can get large (hundreds of KB for a big docs set), which is fine for a deliberate agent fetch and useless for a casual one.

| File | Contains | Typical consumer | Typical size |
| --- | --- | --- | --- |
| llms.txt | Curated links + descriptions | Any crawler/agent | 1-5 KB |
| llms-full.txt | Full inlined page content | Coding/docs agents | 50 KB-1 MB+ |

A quick reference for writing the link descriptions well, since this is where the file earns or wastes its value:

| Weak description | Stronger description | Why |
| --- | --- | --- |
| "Our pricing page" | "Pricing: $29/mo flat, what each tier includes" | Names the concept a user would ask about |
| "Blog post about AI" | "How AI engines decide what to cite, with limits" | Encodes the actual question the page answers |
| "Features" | "Revenue attribution by channel without cookies" | Specific capability, not a nav label |
| "Docs" | "API auth, quickstart, and SDK reference" | Tells the model what it will find |

My own rule: ship llms.txt for everyone, and only ship llms-full.txt if you are a developer tool or docs-heavy product where an agent ingesting your full docs in one shot is a realistic, valuable event. For a marketing site, llms-full.txt is mostly cargo-culting.

A copy-pasteable llms.txt you can actually use

Here is a complete, working llms.txt for a SaaS, structured per the spec. This is close to what runs on attrifast.com, edited for clarity. Note this lives inside a fenced code block, so any characters are safe here; the format itself does not use angle brackets at all.

```markdown
# Attrifast

> Attrifast is a privacy-first, Stripe-native revenue attribution tool for SMB SaaS and ecommerce. The cookieless 4kb script captures first-party sessions and joins them to Stripe webhook events server-side, so you can see revenue by channel, including AI engines like ChatGPT and Perplexity, without third-party cookies or a consent banner in most jurisdictions.

This file curates the pages most useful for understanding what Attrifast does, how the attribution architecture works, and how we think about measurement. Prices are USD. Founder: Vincent Ruan.

## Core product
- [Homepage](https://attrifast.com/): Product overview, positioning, and pricing ($29/mo).
- [Revenue attribution by channel](https://attrifast.com/features/revenue-attribution): How channel-level revenue attribution works without third-party cookies.
- [Track ChatGPT traffic](https://attrifast.com/track-chatgpt-traffic): Detecting and attributing ChatGPT referral sessions server-side.
- [Track AI Overviews](https://attrifast.com/track-ai-overviews): Measuring Google AI Overviews-driven sessions and revenue.

## Methodology and architecture
- [Return-delay penalty methodology](https://attrifast.com/methodology/return-delay-penalty): How we account for the lag between AI-cited click and paid conversion.
- [Cookieless architecture](https://attrifast.com/features/cookieless-revenue-analytics): The first-party, consent-light privacy design.

## Guides
- [How to get cited by AI engines](https://attrifast.com/blog/how-to-get-cited-by-ai-engines): The 7-step GEO playbook, with honest limits.
- [GEO tactics playbook 2026](https://attrifast.com/blog/geo-tactics-playbook-2026): The broader GEO strategy this site recommends.
- [AEO vs SEO in 2026](https://attrifast.com/blog/aeo-vs-seo-2026): How answer-engine and search-engine optimization diverge.

## Optional
- [About / founder](https://attrifast.com/about): Vincent Ruan's background and the entity page.
- [AI crawler tracking](https://attrifast.com/blog/ai-crawler-tracking-2026): Logging GPTBot, ClaudeBot, and friends.
```

That is the entire file. Roughly 30 lines, one H1, one blockquote summary, sectioned links with descriptions, an ## Optional block. It took me longer to decide which pages to include than to write the markdown. The hard part is editorial (what are my genuinely best pages?), not technical.

A note on the descriptions: write them for a model that has never seen your site and needs to decide, in one line, whether a page is relevant to a user's question. "How channel-level revenue attribution works without third-party cookies" is a better description than "Our revenue feature" because it contains the actual concepts a user might ask about. Treat each description as a tiny piece of retrieval bait, honestly written.
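If you would rather render the file from a page list at build time than edit it by hand, the structure is simple enough to generate. This is a rough sketch under my own assumptions: the `Page` dataclass and `render_llms_txt` function are hypothetical names, not part of any tool mentioned in this article, and the editorial choice of which pages to pass in is still yours.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str          # must be an absolute URL
    description: str  # one line, written for a model that has never seen the site


def render_llms_txt(name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render one H1, a blockquote summary, and H2 sections of link items."""
    out = [f"# {name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        out.append(f"## {heading}")
        for page in pages:
            if not page.url.startswith(("https://", "http://")):
                raise ValueError(f"absolute URL required: {page.url}")
            out.append(f"- [{page.title}]({page.url}): {page.description}")
        out.append("")
    return "\n".join(out).rstrip() + "\n"


if __name__ == "__main__":
    print(render_llms_txt(
        "Example",
        "Demo site.",
        {"Docs": [Page("Quickstart", "https://example.com/quickstart", "Install and first call.")]},
    ))
```

Put an "Optional" key last in `sections` to produce the ## Optional block; dictionaries preserve insertion order, so the sections render in the order you list them.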

Does llms.txt work? The honest evidence audit

Now the section the title promises. I am going to lay out what we actually know, sorted by how strong the evidence is, because the GEO discourse mixes these tiers constantly.

Tier 1 — Documented, confirmable.

| Claim | Status | Source |
| --- | --- | --- |
| IDE/coding agents fetch docs llms.txt to answer in-context | Confirmed in practice | community + Mintlify [8][9] |
| Mintlify auto-generates llms.txt / llms-full.txt | Confirmed | Mintlify docs [8] |
| Anthropic ships an llms.txt for its docs | Confirmed | Anthropic docs [4] |
| Google does not use llms.txt for Search | Confirmed (Mueller) | Search Engine Land [10][11] |
| The spec exists and is stable | Confirmed | llmstxt.org [1] |

Tier 2 — Plausible, unmeasured.

| Claim | Status | Why uncertain |
| --- | --- | --- |
| Training crawlers fetch llms.txt and it helps corpus representation | Plausible | No lab has published weight or effect |
| llms.txt at root makes a site cheaper to crawl efficiently | Plausible | Logical, but no public crawl-budget study for AI bots |
| Curated descriptions improve passage selection | Plausible | No controlled test isolating the file |

Tier 3 — Asserted, unsupported.

| Claim | Status | Reality |
| --- | --- | --- |
| "Adding llms.txt boosts ChatGPT citations by N%" | Unsupported | No public controlled before/after with the file as the only variable |
| "llms.txt is a ranking signal" | Unsupported | No documented ranking mechanism |
| "Every AI engine reads llms.txt" | False | Most consumer surfaces undocumented; Google says no |
| "You need llms.txt to be cited" | False | Many cited pages lack one |

The discourse problem is that Tier 3 claims are presented with Tier 1 confidence, usually in a sales context. When a tool tells you "sites with llms.txt get cited 30% more," ask the only question that matters: was llms.txt the only variable, and was there a holdout? The answer is almost always no. The "30% more" cohort also tends to have better content, better schema, more backlinks, and an active GEO team. llms.txt is correlated with being the kind of site that does everything right, which tells you nothing about llms.txt's marginal effect.

A short checklist for stress-testing any "llms.txt boosted citations" claim before you believe it:

| Question to ask the vendor | Good answer | Red flag |
| --- | --- | --- |
| Was llms.txt the only change? | "Yes, frozen everything else" | "We added it during a broader GEO push" |
| Was there a holdout / control? | "Yes, matched pages without it" | "No, just before/after on the whole site" |
| What metric moved? | "AI-attributed revenue per visitor" | "Citation count" or "visibility score" |
| Over what window? | "8-12 weeks, noise band stated" | "We saw a lift in week one" |
| Did a model update land in the window? | "None we're aware of" | "Not sure" |
| Can I reproduce it on my site? | "Here's the test design" | "Trust our aggregate" |

This is the same trap I see in the broader question of whether GEO actually drives revenue: correlation between "did the GEO things" and "got cited" gets sold as causation for each individual tactic, when the only honest causal claim requires isolating the variable. Almost nobody does.

My own attrifast.com experience, stated as honestly as I can: I shipped llms.txt in October 2024. AI-attributed traffic to the site has grown a lot since then. I cannot attribute any of that growth to the llms.txt specifically, because in the same window I also shipped schema, Direct Answer blocks, entity disambiguation, and a couple dozen posts. The llms.txt is one of at least five things I changed. Isolating its effect would require a holdout I never ran. So when someone asks me "did llms.txt work for you," the truthful answer is "I don't know, and neither does anyone telling you it worked for them."

llms.txt adoption by site type: who should bother

Adoption is uneven by category, and so is the likely payoff. Here is my read, based on sampling roughly 300 public sites across categories in Q1 2026 plus the consumption logic above.

| Site type | Adoption (sampled) | Likely payoff | Why |
| --- | --- | --- | --- |
| Developer tools / API products | ~22% | High | Coding agents actually fetch docs llms.txt |
| Documentation-heavy SaaS | ~18% | High | Same; docs are the cited surface |
| General B2B SaaS (marketing site) | ~9% | Low-medium | Speculative; consumer-chat consumption undocumented |
| Content publishers (tech) | ~11% | Low-medium | Training-crawl plausibility only |
| Ecommerce (catalog) | ~3% | Low | Sitemap + product schema matter more |
| Local services | ~1% | Very low | AI rarely owns the surface; no docs to map |
| Open-source libraries | ~30% | High | The spec's home use case |

The shape is consistent with everything above: the higher the payoff, the more it is a developer-documentation play. For a general B2B SaaS marketing site (the Attrifast reader), the right framing is "cheap insurance plus a plausible training-crawl benefit," not "a citation lever."

A second cut, by what the file should contain for each type:

| Site type | What to list in llms.txt | What to skip |
| --- | --- | --- |
| Developer tool | API docs, quickstart, auth, SDK pages; ship llms-full.txt | Marketing fluff |
| B2B SaaS | Core feature pages, methodology, pricing, top guides | Every blog post |
| Content publisher | Pillar pages, cornerstone explainers | Thin tag/archive pages |
| Ecommerce | Category pages, buying guides, sizing/spec references | Individual SKUs (use sitemap) |
| Local services | Service pages, service-area pages, FAQ | (Often skip llms.txt entirely) |

The ecommerce row is the one people get wrong most. Do not enumerate 4,000 products in llms.txt. That is what sitemap.xml is for. Curate your category and guide pages, and let the catalog flow through the sitemap. The whole point of llms.txt is curation, and a 4,000-line "curated" file is a contradiction.

The measurement gap: why GA4 cannot tell you if llms.txt worked

Suppose you ship llms.txt and want to know if it did anything. Your instinct is to open GA4 and look at AI traffic. This does not work, for the same structural reason it does not work for any AI-engine measurement, and it is worth restating because it is the crux of the whole "does it work" question.

AI engines frequently strip the Referer header on outbound clicks. The ChatGPT clients, Perplexity in some configurations, Claude, and Google AI Overviews variously suppress the referer or open links in contexts where it is not passed. When the referer is empty and there is no UTM tag, GA4 has nothing to match against its channel rules, so the session lands in Direct / (none). I walk through the exact mechanics in the ChatGPT referral analytics guide, but the summary is brutal: in default GA4, the large majority of AI-engine clicks are invisible as AI, lumped into Direct alongside bookmarks and email-app clicks.
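The bucketing logic is easy to see in miniature. The sketch below is a simplified stand-in for channel assignment, not GA4's actual rules, and the host list is my own illustrative assumption rather than an authoritative or complete set of AI-engine referrers. It shows why a click with no referer and no UTM tag has nowhere to go but Direct.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed, illustrative mapping of referrer hosts to engines; not exhaustive.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
}


def classify_session(landing_url: str, referer: Optional[str]) -> str:
    """Assign a channel from the landing URL's utm_source and the Referer header."""
    query = parse_qs(urlparse(landing_url).query)
    utm_source = query.get("utm_source", [""])[0].lower()
    if utm_source in AI_REFERRER_HOSTS:
        return f"AI / {AI_REFERRER_HOSTS[utm_source]}"

    if referer:
        host = (urlparse(referer).hostname or "").lower()
        if host in AI_REFERRER_HOSTS:
            return f"AI / {AI_REFERRER_HOSTS[host]}"
        return f"Referral / {host}"

    # No referer and no UTM tag: nothing left to match against.
    return "Direct / (none)"
```

An AI-engine click that arrives with the referer stripped and no UTM parameter falls through to the last line, which is the mis-bucketing described above.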

| What you want to measure | Can GA4 show it? | Why not |
| --- | --- | --- |
| AI-engine sessions | No | Referer stripped → Direct/(none) |
| AI-engine revenue | No | Channel misattributed; revenue follows |
| Per-engine split (ChatGPT vs Perplexity) | No | No built-in AI channel rules |
| Before/after llms.txt delta | No | Baseline itself is mis-bucketed |
| Crawl behavior (GPTBot fetching llms.txt) | No (GA4 is client-side) | Bots do not run your JS tag |

The last row is important and often missed. GA4 runs in the browser. AI training crawlers and the bot that fetches your llms.txt do not execute your JavaScript tag, so GA4 never sees them at all. The only place a GPTBot fetch of /llms.txt shows up is your server logs. If you want to know whether anything is even reading your llms.txt, the first move is to grep your access logs, not open GA4. I cover the crawler-logging side in AI crawler tracking for 2026.
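Here is roughly what that log check can look like. It is a sketch that assumes your server writes the common "combined" access-log format and that the bots identify themselves with the substrings listed; both are assumptions to verify against your own logs and each vendor's crawler documentation.

```python
import re
from collections import Counter

# Assumed user-agent substrings. Google-Extended is deliberately absent: it is a
# robots.txt control token, not a user-agent string, so it never appears in logs.
BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]
TARGET_PATHS = {"/llms.txt", "/llms-full.txt"}

# Combined log format: ... "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_llms_fetches(log_lines):
    """Count requests for llms.txt files, keyed by (bot, path)."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path not in TARGET_PATHS:
            continue
        ua = match.group("ua")
        bot = next((token for token in BOT_TOKENS if token in ua), "other")
        counts[(bot, path)] += 1
    return counts


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (bot, path), n in count_llms_fetches(handle).most_common():
            print(f"{bot:15} {path:15} {n}")
```

User-agent strings can be spoofed, so treat the counts as a first look rather than proof of who fetched the file.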

So the measurement problem splits into two halves:

  1. Is anything fetching my llms.txt? Answerable from server logs (look for GPTBot, ClaudeBot, PerplexityBot, and IDE-agent user-agents hitting /llms.txt and /llms-full.txt; Google-Extended is a robots.txt control token, not a user-agent string, so it will not appear in your logs).
  2. Did adopting it change my AI-attributed revenue? Not answerable in GA4. Answerable only with first-party server-side attribution joined to Stripe.

Half one tells you about consumption. Half two tells you about impact. People conflate them constantly, declaring victory because GPTBot fetched the file (half one) as if that proved revenue lift (half two). A fetch is not a citation, and a citation is not a click, and a click is not revenue. Four different things, four different measurements:

| Layer | What it means | Where you measure it | Proves revenue? |
| --- | --- | --- | --- |
| Fetch | A bot retrieved /llms.txt | Server access logs | No |
| Citation | Your page appears in an AI answer | Citation monitor / manual query | No |
| Click | A human clicked through to your site | First-party AI-referrer attribution | No |
| Revenue | That click paid via Stripe | Session-to-Stripe webhook join | Yes |

Each layer drops a large fraction of the one above it. Most fetched files are never cited; most citations are never clicked (zero-click answers); most clicks do not convert. Declaring victory at the fetch layer is three leaps of faith stacked on top of each other.

How to actually measure llms.txt revenue impact: the before/after test

This is the part that earns the title. If you want to know whether llms.txt did anything for your revenue, here is the honest test design. It is not a randomized controlled trial. It is a before/after with the file as the intended single variable, and I will name the confounds rather than hide them.

The phases in detail:

| Phase | Duration | What you do | What you record |
| --- | --- | --- | --- |
| 0. Instrument | Before anything | Deploy first-party server-side AI attribution joined to Stripe | Baseline tooling working |
| 1. Baseline | 4-8 weeks | Change nothing else; let AI traffic flow | AI sessions, AI RPV, per-engine split |
| 2. Intervention | 1 day | Ship llms.txt (and llms-full.txt if a dev tool) | Date stamp; nothing else changes |
| 3. Measure | 8-12 weeks | Hold content/schema/links constant | AI sessions, AI RPV, per-engine split |
| 4. Compare | After | Delta vs baseline noise band | Is the change outside normal variance? |

The metric that matters is AI-attributed revenue per visitor (RPV), not raw citations and not raw sessions. Citations are a vanity proxy; sessions can rise from unrelated content; revenue per visitor from AI engines is the thing that pays for the work. If you only have a citation-monitoring tool, you are measuring the wrong layer.

Now the confounds, stated plainly so you do not fool yourself:

| Confound | Effect | Mitigation |
| --- | --- | --- |
| AI citation behavior drifts on its own | RPV moves with no cause from you | Long baseline; watch a control set of pages you did not touch |
| You shipped other changes | Cannot isolate llms.txt | Freeze content/schema/links during the test |
| Seasonality | RPV has a natural cycle | Compare year-over-year if you can; note the season |
| Small sample (bootstrapped sites) | Noise swamps signal | Extend the window; do not over-read a 2-week blip |
| Engine model updates | Step-changes in retrieval | Note any known model releases in the window |

Because of those confounds, the strongest claim this test can support is "after shipping llms.txt, AI-attributed RPV moved outside the baseline noise band, with no other change I am aware of." That is a weak causal signal, honestly labeled. It is also infinitely better than "a vendor's dashboard says sites with llms.txt get cited more." One is your own revenue line under controlled conditions; the other is correlational marketing on someone else's aggregate.
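One way to make "outside the baseline noise band" concrete is sketched below. The definition of the band is my assumption for illustration, not a standard: the mean of weekly baseline RPV plus or minus k sample standard deviations, with k=2 as a default. It is a crude check, not a significance test.

```python
from statistics import mean, stdev


def rpv(revenue: float, visitors: int) -> float:
    """Revenue per visitor; zero visitors yields 0.0 rather than an error."""
    return revenue / visitors if visitors else 0.0


def outside_noise_band(baseline_rpv, post_rpv, k: float = 2.0) -> dict:
    """Compare the post-period mean RPV with mean +/- k*stdev of the baseline weeks.

    baseline_rpv and post_rpv are lists of weekly AI-attributed RPV values;
    the baseline needs at least two weeks for a standard deviation to exist.
    """
    mu = mean(baseline_rpv)
    sigma = stdev(baseline_rpv)
    low, high = mu - k * sigma, mu + k * sigma
    post_mean = mean(post_rpv)
    return {
        "band": (low, high),
        "post_mean": post_mean,
        "outside": not (low <= post_mean <= high),
    }
```

With only four to eight baseline weeks the band will be wide, which is the honest outcome for a small site: most plausible lifts will land inside it, and the right response is a longer window rather than a narrower band.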

The minimal stack to run this test, whether you build it or buy it:

| Component | Job | Build-it-yourself | Buy-it |
| --- | --- | --- | --- |
| AI-referrer detection | Fingerprint ChatGPT/Perplexity/Claude/AIO referrers + behavioral inference | Edge middleware + domain list | Included |
| First-party session store | Persist source server-side, cookieless | Your DB + a session row | Included |
| Stripe webhook join | Tie checkout.session.completed back to the session | Webhook handler + metadata | Included |
| Per-engine reporting | Split AI revenue by engine, before vs after | SQL + a dashboard | Included |

This is the test Attrifast's revenue attribution is built to run, and it is why I keep coming back to revenue rather than citations. The first-party AI-engine attribution gives you a clean baseline that GA4 cannot, the Stripe join turns sessions into dollars, and the before/after delta is something a founder can defend. You can run the same architecture yourself without Attrifast; the requirements are server-side AI-referrer detection, a first-party session store, and a Stripe webhook join. The tool just removes the build time.
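For the build-it-yourself route, the join itself is small. The sketch below is not Attrifast's implementation. It assumes you passed your own session ID as `client_reference_id` when creating the Checkout Session, that amounts are in a two-decimal currency such as USD, and that `SESSIONS` stands in for your real session store. Webhook signature verification is omitted and would be required in production.

```python
from collections import defaultdict

# Hypothetical first-party session store: session_id -> channel recorded at landing.
SESSIONS = {
    "sess_123": "AI / ChatGPT",
    "sess_456": "Direct / (none)",
}


def attribute_revenue(events, sessions=SESSIONS) -> dict:
    """Sum paid amounts per channel from checkout.session.completed events."""
    revenue_by_channel = defaultdict(float)
    for event in events:
        if event.get("type") != "checkout.session.completed":
            continue  # ignore every other webhook event type
        checkout = event["data"]["object"]
        session_id = checkout.get("client_reference_id")
        channel = sessions.get(session_id, "Unattributed")
        # amount_total is in the smallest currency unit (cents for USD).
        revenue_by_channel[channel] += (checkout.get("amount_total") or 0) / 100
    return dict(revenue_by_channel)
```

Keeping an explicit "Unattributed" bucket matters: payments you cannot tie back to a session should stay visible instead of being silently credited to a channel.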

llms.txt tool comparison: what to pay for and what to hand-write

There is a small market of tools around llms.txt. Most of them are solving a problem that does not need a recurring subscription. Here is the honest breakdown.

| Tool / approach | What it does | Cost | Worth it? |
| --- | --- | --- | --- |
| Hand-write the file | You write 30 lines of markdown | Free + 30 min | Yes, for almost everyone |
| Mintlify (if you host docs there) | Auto-generates llms.txt + llms-full.txt | Bundled with docs hosting | Yes, you already have it |
| Static-site generator plugin | Builds llms.txt from your content at build | Free (OSS) | Yes, for docs/large sites |
| "llms.txt automation" SaaS | Generates + monitors the file | $49-99+/mo | Rarely; it is 30 lines |
| Citation-monitoring tools (Profound, etc.) | Track brand mentions in AI answers | $99-499+/mo | For citation tracking, not llms.txt |
| First-party revenue attribution (Attrifast) | Measures AI-attributed revenue before/after | $29/mo | For the impact question, yes |

The category confusion here mirrors the broader GEO market. Some tools generate the file (mostly unnecessary), some monitor whether the file exists or changed (almost entirely unnecessary), some monitor your brand citations (a different and legitimate job), and exactly one category answers "did it move revenue" (first-party attribution joined to payments). Buy for the job:

| Job to be done | Right tool |
| --- | --- |
| "Write my llms.txt" | Your text editor (or Mintlify if hosted there) |
| "Keep llms.txt updated as docs change" | A build-time generator, not a SaaS |
| "Am I cited in AI answers?" | A citation monitor (Profound, Loamly, SEOcrawl) |
| "Did llms.txt move my revenue?" | First-party AI revenue attribution |
| "Is anything even fetching my llms.txt?" | Server logs / log analytics |

I want to be fair to the citation-monitoring vendors here. Profound, SEOcrawl, and Loamly do a real thing: they watch AI answers and tell you when your brand shows up. That is genuinely useful for tracking presence. The critique is narrower: when those tools imply that llms.txt is why your citations rose, they are making a Tier 3 claim with Tier 1 confidence, because they are not running a holdout and they cannot isolate the file. Use them for what they measure (presence), not for causal claims about a single tactic.

What I actually recommend for llms.txt in 2026

Here is the skepticism pulled into a concrete recommendation, because "it's unproven" is not actionable on its own. This is what I do and tell clients to do.

| Decision | Recommendation | Confidence |
| --- | --- | --- |
| Ship an llms.txt at all? | Yes (30 min, near-zero downside) | High |
| Ship llms-full.txt? | Only if a dev tool / docs-heavy product | High |
| Pay a SaaS to generate it? | No | High |
| Expect a citation lift from it? | No, do not bank on it | High |
| Use it as your only GEO move? | No; it is the cheapest, not the strongest | High |
| Measure its revenue impact? | Yes, with a before/after test | High |
| Believe a vendor's "+N% citations" claim? | No, unless they ran a holdout | High |
| Update it when your best pages change? | Yes, quarterly | Medium |

The ranking against other GEO tactics matters. In the GEO tactics playbook and the how-to-get-cited piece, the high-confidence moves are structured data, a Direct Answer block, entity disambiguation, and one canonical URL per concept. llms.txt sits below all of those in expected impact for a marketing site, and above them only in ease. Spend your first GEO hours on schema and entity work. Spend your seventh hour on llms.txt. Do not invert that order because a vendor made llms.txt sound like the headline.

| GEO tactic | Expected impact (marketing site) | Effort | Evidence quality |
| --- | --- | --- | --- |
| FAQ + Article schema | High | Low-medium | Multiple studies |
| Direct Answer block | High | Low | Multiple studies |
| Entity disambiguation (sameAs) | High | Medium | Entity-SEO research |
| One canonical URL per concept | Medium | Medium | SEO fundamentals |
| Inline primary-source citations | Medium | Low | GEO research |
| llms.txt | Low (dev tools: high) | Low | Minimal/anecdotal |
| Measure AI revenue (attribution) | Enables all decisions | Low | Mechanical |

The bottom row is the meta-point. None of the tactics above mean anything if you cannot see which ones moved revenue, and AI revenue is exactly the thing GA4 hides. That is the gap Attrifast fills, and it is why I would rather a founder spend $29/mo measuring than $99/mo generating a file they could write in half an hour.

How llms.txt fits the broader AEO/GEO picture

llms.txt is one tile in a larger mosaic, and over-indexing on it is a symptom of treating GEO as a checklist of files rather than a content-and-measurement discipline. The bigger questions are: how do answer engines decide what to cite, where do they get their information, and how do you know any of it earned revenue.

On the first two, the guide to where Google AI gets its information walks the retrieval side for the highest-volume AI surface, and the short version is that it leans on Google's existing index and trust signals far more than on any new file you can add. That is consistent with Mueller saying Google ignores llms.txt: Google already has your sitemap, your schema, and its own crawl. A curated markdown file adds little to a system that already enumerates and trusts at scale.

On the strategic split, the AEO vs SEO in 2026 piece frames why answer-engine optimization is mostly classic SEO discipline plus structured extraction help, not a parallel universe of new files. llms.txt is best understood inside that frame: a small, optional extraction aid, not a new channel.

And on measurement, which is the through-line of everything I write: the reason I can be calm about llms.txt being unproven is that I do not need it to be proven to make a decision. I ship it because it is cheap, I instrument the revenue line, and I let the data decide. If a future model update makes llms.txt suddenly matter for consumer chat, my attribution will show the RPV move and I will lean in. If it never matters, I have lost 30 minutes. That posture, cheap bets plus honest measurement, beats both the hype and the cynicism. Track the AI surfaces that actually drive revenue (ChatGPT, AI Overviews) and let the file be the small thing it is.

Limitations

Things this article deliberately does not claim, and where you should not extrapolate.

  • I cannot prove llms.txt does nothing, either. Absence of public evidence for citation lift is not proof of no effect, especially for training-corpus representation, which is genuinely unmeasurable from outside the labs. My position is "unproven and probably small for marketing sites," not "definitively useless."
  • The engine-consumption table is a mid-2026 snapshot. AI vendors change behavior monthly and rarely announce it. Google could adopt llms.txt next quarter; OpenAI could document inference-time use. Re-verify per engine before relying on any row.
  • Adoption percentages are from my own sampling, roughly 300 public sites, skewed toward SaaS and developer tools. They are directional, not a census. Treat them as "roughly this order of magnitude."
  • The before/after test is not an RCT. It cannot isolate llms.txt with certainty because AI citation behavior drifts independently. It produces a weak causal signal at best. I label it that way on purpose.
  • The developer-tool case is the strong one and I am bullish on it. If you ship docs and an IDE agent fetches your llms-full.txt to answer a developer in-context, that is real, direct value. My skepticism is specifically about the marketing-site, consumer-chat citation claim, not about the docs use case.

FAQ

What is llms.txt and what does it actually do?

llms.txt is a plain-text, markdown-formatted file you place at your domain root (yoursite.com/llms.txt) that lists your most important pages with one-line descriptions, so a large language model can find a curated map of your site instead of crawling everything. It was proposed by Jeremy Howard of Answer.AI in September 2024 and the spec lives at llmstxt.org. Critically, it is not an enforced standard the way robots.txt is a crawl directive. No major AI engine has publicly confirmed it uses llms.txt as a ranking or retrieval signal at inference time as of mid-2026. It is a curation convention that some documentation platforms and crawlers read, not a guaranteed visibility lever.

Does llms.txt actually work, or improve AI citations?

Honest answer: there is little hard public evidence that llms.txt changes how often you get cited in ChatGPT, Perplexity, Claude, or Google AI Overviews as of mid-2026. Google's John Mueller has publicly said Google does not use llms.txt, and OpenAI and Anthropic have not documented inference-time consumption of it. The strongest real-world signal is that documentation platforms like Mintlify auto-generate it and some developer-tool crawlers fetch it. Most claims that llms.txt boosts citations are vendor marketing without a controlled before/after measurement attached. The file costs about 30 minutes to write and carries near-zero downside, so it is a reasonable speculative bet, but treat citation-lift promises with skepticism until you measure your own revenue delta.

Is llms.txt the same as robots.txt or sitemap.xml?

No. robots.txt tells crawlers what they may and may not fetch and is widely respected. sitemap.xml gives search engines a machine-readable list of every URL for indexing. llms.txt is different in intent: it is a curated, human-written, markdown summary of your most LLM-relevant pages plus context, designed to help a model build a useful mental map cheaply. robots.txt restricts, sitemap.xml enumerates everything, llms.txt curates and explains a subset. robots.txt and sitemap.xml are enforced and consumed by named crawlers; llms.txt consumption is informal and uneven.

Should I add llms.txt to my SaaS or ecommerce site in 2026?

For most SaaS and developer-tool sites, yes, because the cost is roughly 30 minutes and the downside is essentially zero. It is most useful when your most valuable pages are not your most-linked pages, which is common for docs, methodology pages, and pricing. For ecommerce with thousands of SKUs, a hand-curated llms.txt is less useful than a clean sitemap and good product schema; list your category and guide pages instead of every product. The one thing I would not do is pay a recurring SaaS fee to auto-generate a 30-line markdown file. Write it once, review it quarterly, and instrument the revenue line so you actually know if it did anything.

How do I measure whether llms.txt improved my revenue?

You cannot measure it in default GA4, because AI-engine clicks land in the Direct/(none) bucket when the Referer header is stripped. The measurable approach is a before/after revenue test. First, capture a clean baseline of AI-attributed sessions and revenue for 4-8 weeks, using first-party server-side attribution that fingerprints AI-engine referrers and joins sessions to Stripe webhooks. Then ship llms.txt and hold everything else constant. Finally, compare AI-attributed revenue per visitor over the next 8-12 weeks against that baseline. The confound is that AI citation behavior drifts on its own, so the test is directional, not a randomized trial. But measuring your own revenue line beats trusting a vendor's aggregate citation chart.

Which AI engines and tools actually read llms.txt today?

As of mid-2026 the picture is uneven and changes monthly. Documentation platforms (Mintlify, some Docusaurus and GitBook setups) auto-generate llms.txt and llms-full.txt, and several developer-tool and coding-assistant crawlers fetch them. Anthropic publishes an llms.txt for its own docs. Google has publicly stated it does not use llms.txt for Search or Gemini. OpenAI has not documented inference-time use. The safest read is that llms.txt is most reliably consumed inside developer-documentation ecosystems and IDE assistants, and least reliably consumed by the consumer chat surfaces most marketers care about. Verify per-engine rather than assuming universal support.

What is the difference between llms.txt and llms-full.txt?

llms.txt is a curated map: an H1, a summary blockquote, and sectioned lists of links with one-line descriptions, typically 1-5 KB. llms-full.txt is the same idea taken further: it inlines the entire markdown content of those pages into one large file so an agent can ingest your whole relevant corpus in a single fetch with no crawling, often 50 KB to over 1 MB. The full file is most valuable for developer tools and docs, where a coding agent ingesting your complete documentation in-context produces a direct, observable improvement in its answers about your product. For a general marketing site, llms-full.txt is usually overkill.
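
The curated-map structure is simple enough to generate at build time. Below is a minimal sketch, assuming your page list is available as a dict of section headings to (name, URL, description) tuples. The function name, the example company, and the example.com URLs are hypothetical.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt body: H1, summary blockquote, then H2 sections of link lists.

    `sections` maps a section heading to a list of (name, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, description in links:
            lines.append(f"- [{name}]({url}): {description}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"

# Hypothetical site and pages, for illustration only.
pages = {
    "Docs": [
        ("Quickstart", "https://example.com/docs/quickstart", "Install and send a first event"),
        ("API reference", "https://example.com/docs/api", "Endpoints, parameters, and errors"),
    ],
    "Guides": [
        ("Pricing", "https://example.com/pricing", "Plans and limits"),
    ],
}
print(build_llms_txt("Example Co", "Example Co makes a hypothetical analytics tool.", pages))
```

Writing the returned string to `/llms.txt` in your build output keeps the file in step with whatever page list you feed it. An llms-full.txt variant would inline each page's markdown under its heading instead of a one-line description.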

Does Google use llms.txt for AI Overviews or Gemini?

No. Per public statements from Google's John Mueller, Google does not use llms.txt. Google AI Overviews and Gemini draw on Google's existing index, sitemaps, schema, and trust signals rather than a separate llms.txt. This matters because AI Overviews are the highest-volume AI surface most businesses encounter, so the absence of Google consumption substantially weakens any general "llms.txt boosts AI visibility" claim aimed at marketers. If your AI-citation goals center on Google surfaces, your effort is better spent on schema, content quality, and the signals Google already consumes.

Will adding llms.txt hurt my SEO or get me penalized?

No. llms.txt is a separate file that does not interact with your normal SEO. It does not change your robots.txt rules, your sitemap, your canonical tags, or your rendered HTML. Search engines that do not consume it simply ignore it, the same as they ignore humans.txt. There is no documented penalty mechanism. The only real cost is the 30 minutes to author it and the discipline to keep it accurate; a stale llms.txt that points at dead or moved pages is mildly counterproductive for any agent that does read it, which is why a quarterly review is worth scheduling.
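
The quarterly review can be partly automated. Here is a minimal sketch, assuming your llms.txt uses standard markdown links and that a HEAD request is an acceptable liveness probe for your pages. The function names are my own, and the status lookup is injectable so the logic can be exercised without network access.

```python
import re
import urllib.error
import urllib.request

# Matches markdown links of the form [name](http(s)://url).
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

def extract_links(llms_txt):
    """Return (name, url) pairs for every markdown link in the file."""
    return LINK_RE.findall(llms_txt)

def http_status(url, timeout=10):
    """HEAD the URL and return its final status code, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status such as 404
    except OSError:
        return None  # DNS failure, refused connection, timeout, and similar

def find_stale_links(llms_txt, fetch_status=http_status):
    """List (name, url, status) for links that did not answer with a 2xx status."""
    stale = []
    for name, url in extract_links(llms_txt):
        status = fetch_status(url)
        if status is None or not 200 <= status < 300:
            stale.append((name, url, status))
    return stale
```

Calling `find_stale_links(open("llms.txt").read())` from a scheduled job would list the entries to fix. Note that `urlopen` follows redirects, so a moved page that redirects cleanly is not flagged here, and some servers reject HEAD requests, in which case a GET-based probe is the safer assumption.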

How do I know if anything is actually reading my llms.txt?

Check your server access logs, not GA4. GA4 is a client-side tag that bots do not execute, so it never sees a crawler fetch. In your raw logs, grep for requests to /llms.txt and /llms-full.txt, then look at the user-agents: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, and various IDE-assistant agents. Do not expect to see Google-Extended there: it is a robots.txt control token rather than a user-agent string that appears in request logs, and Google's fetches arrive under its regular crawler user-agents. A fetch tells you the file was retrieved; it does not tell you the content was used in an answer or that it produced a citation or a click. Consumption, citation, click, and revenue are four separate things measured four separate ways, so do not read a log hit as proof of revenue lift.
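
If grep output gets unwieldy, the same check can be sketched as a small script. This is a minimal sketch, assuming your server writes the common "combined" access-log format. The regex, the function name, and the "other" bucket are my own choices, and the token list covers only the crawler user-agents named in this answer.

```python
import re
from collections import Counter

# User-agent tokens to look for. Google-Extended is a robots.txt token rather
# than a request user-agent, so it is deliberately not matched here.
BOT_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]
TARGET_PATHS = {"/llms.txt", "/llms-full.txt"}

# Pulls request path, status, and user-agent out of a combined-format line.
LOG_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def count_llms_fetches(log_lines):
    """Count fetches of llms.txt files, keyed by (path, bot token or 'other')."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # not a combined-format request line
        path = match.group("path").split("?", 1)[0]
        if path not in TARGET_PATHS:
            continue
        user_agent = match.group("ua").lower()
        bot = next((t for t in BOT_TOKENS if t.lower() in user_agent), "other")
        counts[(path, bot)] += 1
    return counts
```

Feed it an open log file, for example `count_llms_fetches(open(path_to_access_log))`. As with grep, a count here is evidence of retrieval only, not of citation, click, or revenue.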

Should I pay for an llms.txt generator or monitoring tool?

For generation, almost never: the file is roughly 30 lines of markdown you can write in a text editor, or auto-generate for free with a static-site plugin or your docs host (Mintlify already does it). For monitoring whether the file exists or changed, also no, that is trivial to check yourself. Where paying makes sense is a different job entirely: citation-monitoring tools that track whether your brand appears in AI answers, and first-party revenue-attribution tools that measure whether AI engines actually drove paid conversions. Buy for the job to be done. Do not pay a recurring fee for a file you could have written once over a coffee.

Is llms.txt going to become a real standard like robots.txt?

Possibly, but it is not there yet. As of mid-2026 it is an informal convention hosted at llmstxt.org, not an IETF RFC, and the relevant standards work (the IETF effort around AI crawler preferences) is more focused on access and usage signals than on curated content maps. For llms.txt to become a robots.txt-class standard, the major AI engines would need to publicly commit to consuming it, which most have not done and Google has declined. It could happen if adoption and pressure grow. Until a major consumer-chat engine documents that it uses the file at inference time, treat llms.txt as a useful convention with an uncertain future, not a settled standard.

What is the single best use of my time if I am skeptical of llms.txt?

Instrument your AI-attributed revenue first, then ship llms.txt as a cheap bet and watch what the data does. The reason to invert the usual advice (which says "ship the file, then maybe measure") is that measurement is the only thing that converts the entire GEO debate from opinion into evidence on your own site. Once you can see AI-attributed revenue per visitor by engine, every tactic, llms.txt included, becomes a testable hypothesis instead of a vendor talking point. That is a far better use of an afternoon than reading another blog post (this one included) arguing about whether a 30-line markdown file is magic.

Related reading from the Attrifast research stack

For more on connected topics, see ChatGPT Query Fan-Out, Explained for Attribution Operators (2026), Is llms.txt Worth It? A 10-Site, 6-Week Controlled Experiment (2026 Data), AI Brand Sentiment in 2026, and What Does Google Know About Me? (2026 Inventory).

References

  1. llms.txt specification, llmstxt.org. https://llmstxt.org/
  2. Jeremy Howard, The /llms.txt file proposal, Answer.AI. https://www.answer.ai/posts/2024-09-03-llmstxt.html
  3. AnswerDotAI, llms-txt repository and ecosystem, GitHub. https://github.com/AnswerDotAI/llms-txt
  4. Anthropic, Claude documentation (publishes an llms.txt for docs). https://docs.anthropic.com/
  5. OpenAI, Overview of OpenAI's bots and how to control them. https://platform.openai.com/docs/bots
  6. OpenAI, Introducing ChatGPT search. https://openai.com/index/introducing-chatgpt-search/
  7. Vercel, Guidance and patterns for llms.txt. https://vercel.com/blog
  8. Mintlify, llms.txt and llms-full.txt for hosted documentation. https://mintlify.com/docs/settings/llms-txt
  9. AnswerDotAI, Directory and examples of sites using llms.txt, GitHub. https://github.com/AnswerDotAI/llms-txt
  10. Search Engine Land, Google does not use llms.txt (John Mueller). https://searchengineland.com/library/google/google-search
  11. John Mueller / Google Search Central, Search Central guidance and statements. https://developers.google.com/search/docs
  12. Sitemaps.org, Sitemaps XML protocol. https://www.sitemaps.org/protocol.html
  13. IETF, AI Preferences (aipref) Working Group. https://datatracker.ietf.org/wg/aipref/about/
  14. Google Developers, Google-Extended and Google crawlers overview. https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
  15. Anthropic, Does Anthropic crawl data from the web, and how can site owners block the crawler? https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
  16. Schema.org, Article specification. https://schema.org/Article
  17. Cloudflare, AI crawler and bot traffic insights, Cloudflare Radar. https://radar.cloudflare.com/ai-insights
  18. Profound, AI search and citation monitoring research. https://www.tryprofound.com/
  19. Backlinko, Generative engine optimization and AI search research. https://backlinko.com/
  20. MDN Web Docs, Referer header reference. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer

For the broader GEO strategy this tactic fits inside, the GEO tactics playbook for 2026 and how to get cited by AI engines are the companions. For the strategic SEO-vs-AEO framing, see AEO vs SEO in 2026. For the measurement layer that lets you actually test whether llms.txt or any GEO move drove revenue, Attrifast's revenue attribution joins AI-engine sessions to Stripe server-side, with the surface-specific guides for tracking ChatGPT traffic, tracking AI Overviews, logging AI crawlers, and understanding where Google AI gets its information.
