llms.txt Explained: Does It Actually Improve AI Visibility and Revenue in 2026?

Vincent Ruan, Founder, Attrifast · Updated May 26, 2026 · 21 min read

A skeptical 2026 deep-dive on llms.txt: what the spec actually is, who reads it, whether it changes AI citations, and how to measure the revenue lift yourself instead of trusting vendor hype.

Part of the generative engine optimization guide and AEO Hub.

TL;DR

llms.txt is a curated, markdown-formatted file at your domain root that summarizes your most LLM-relevant pages. It was proposed by Jeremy Howard of Answer.AI in September 2024, and the spec lives at llmstxt.org. It is a convention, not an enforced standard.
The honest 2026 position: there is little hard public evidence that llms.txt changes citation rates on the consumer chat surfaces (ChatGPT, Perplexity, Claude, Google AI Overviews). Google's John Mueller has stated Google does not use it. Most "llms.txt boosts visibility" claims are vendor marketing without a controlled measurement.
Where it is genuinely consumed today is developer-documentation ecosystems: Mintlify auto-generates it, Anthropic ships one for its docs, and IDE coding assistants fetch it. That is a real but narrow consumption surface.
The file costs roughly 30 minutes to write and carries near-zero downside. So shipping it is a reasonable speculative bet. Paying a recurring SaaS fee to generate roughly 30 lines of markdown is not.
The only defensible way to know if llms.txt helped you is a before/after revenue test using first-party AI-engine attribution joined to Stripe, not a vendor citation dashboard and not GA4 (which buckets most AI clicks as Direct).

I shipped an llms.txt on attrifast.com in October 2024, about three weeks after Jeremy Howard floated the idea. I wanted to be early. I also wanted to believe it would do something, because the GEO space was loud with people telling me it was the next robots.txt and I am as susceptible to a clean narrative as anyone. Eighteen months later I have a confession that does not show up in many vendor decks: I still cannot prove the file moved a single citation or a single dollar on my own site. Not because it didn't, but because the honest measurement is genuinely hard, and almost nobody selling you llms.txt tooling has done it.

This article is the piece I wish I had read in late 2024. It explains what llms.txt actually is, where the spec came from, who really reads it versus who claims to, what the format looks like line by line, and why the citation-lift claims you have seen are mostly unfalsifiable marketing. Then it does the thing the rest of the genre skips: it walks through how to measure the revenue impact on your own site with a before/after test, because that is the only number that survives a board meeting. If you want the broader playbook, the GEO tactics playbook for 2026 sits alongside this; this piece is the deep, skeptical drill into one tactic.

Figure: How AI engines and documentation crawlers consume llms.txt: a curated markdown index at the domain root, read unevenly across consumer chat surfaces and developer-tool crawlers.

Quick Facts

Item	Value	Source
Proposed by	Jeremy Howard, Answer.AI	Answer.AI / llmstxt.org [1][2]
Proposal date	September 2024	llmstxt.org changelog [1]
Spec home	llmstxt.org	llmstxt.org [1]
File location	`/llms.txt` at domain root	Spec [1]
File format	Markdown (CommonMark)	Spec [1]
Companion file	`llms-full.txt` (full content inlined)	Spec / Mintlify [1][8]
Does Google use it?	No (per John Mueller)	Search Engine Land / Mueller [10][11]
Anthropic ships one?	Yes, for its docs	Anthropic docs [4]
Mintlify auto-generates it?	Yes, for hosted docs	Mintlify docs [8]
Estimated adoption (public SaaS, Q1 2026)	~7-10%	Attrifast sampling, n=~300
Hard public evidence of citation lift	Minimal / anecdotal	This article's read of the literature
Cost to ship	~30 minutes, one-time	Author estimate
Recurring cost if hand-maintained	~15 min/quarter	Author estimate
Standardization status	Informal; not an IETF RFC	IETF AI-preferences WG scope [13]

Two of those rows matter more than the rest. "Does Google use it? No" is the row that should temper every citation-lift claim aimed at marketers, because Google AI Overviews are the highest-volume AI surface most businesses touch. And "Hard public evidence of citation lift: minimal" is the row this entire article is built around. The rest is detail.

What llms.txt actually is (and what it is not)

llms.txt is a single file, written in markdown, that you place at the root of your domain so that it resolves at https://yoursite.com/llms.txt. Inside it, you write a short description of what your site or product is, then list your most important pages as markdown links, each with a one-line description. That is the whole thing. The specification at llmstxt.org [1] is intentionally tiny, the kind of thing you can read in five minutes.

The proposal came from Jeremy Howard, co-founder of Answer.AI and fast.ai and former president of Kaggle, in September 2024 [1][2]. His framing was practical and narrow: large language models have small effective context windows relative to a full website, and most websites are full of navigation, scripts, ads, and boilerplate that waste tokens and confuse extraction. A curated markdown file gives a model a clean, high-signal map of what matters on your site, expressed in the format models parse most cleanly. That is the entire pitch. It is a good pitch. It is also a much narrower pitch than the GEO industry has since wrapped around it.

Here is the distinction that gets lost. There are two kinds of "standard" on the web.

Type of standard	Examples	Enforcement	Who consumes it
Enforced, named-crawler standard	robots.txt, sitemap.xml, HTTP caching headers	Crawlers publicly commit to honoring it	Googlebot, Bingbot, GPTBot, named bots
Informal convention	llms.txt, humans.txt, security.txt (partly)	Voluntary; no public commitment from major consumers	Whoever chooses to read it

robots.txt sits in the first row. Google, Bing, OpenAI's GPTBot, and Anthropic's ClaudeBot all publicly document that they honor it [4][14]. sitemap.xml sits in the first row too; it is part of the sitemaps.org protocol [12] that search engines consume by contract. llms.txt sits squarely in the second row. No major AI engine has published a commitment to consume it at inference time. That single fact reframes everything else in this article.

So what llms.txt is not:

It is not a ranking signal. There is no documented mechanism by which adding llms.txt raises your position in any AI answer.
It is not a crawl directive. It does not grant or deny access; that is robots.txt's job.
It is not an index Google Search reads. John Mueller of Google has said so directly [10][11], and Gemini has no documented llms.txt consumption either.
It is not required for AI visibility. Plenty of heavily-cited pages have no llms.txt at all.
It is not verified. There is no console that tells you "your llms.txt was read and used."

Put as a myth-vs-reality table, since these confusions recur in every GEO thread:

Common myth	Reality
"It's robots.txt for AI"	Different job: curation, not access control
"Adding it ranks me in ChatGPT"	No documented ranking mechanism
"Google reads it for AI Overviews"	Mueller: Google does not use it
"Every AI engine consumes it"	Most consumer surfaces undocumented
"I need it to be cited at all"	Many cited pages have none
"A SaaS must generate it"	It's 30 lines of markdown you can write by hand
"A crawl of the file proves it worked"	A fetch is not a citation is not a click is not revenue

What it is: a low-cost, plausibly-useful curation convention, most reliably consumed inside developer-documentation tooling, that summarizes your best pages in the format models like. That is a real thing worth doing. It is just not the thing the louder vendors are selling.

llms.txt vs robots.txt vs sitemap.xml: the comparison that clears up most confusion

The single most common misconception I field is that llms.txt is "robots.txt for AI." It is not. They solve different problems, are consumed differently, and you should run all three. Here is the full breakdown.

Dimension	robots.txt	sitemap.xml	llms.txt
Primary purpose	Allow/deny crawler access	Enumerate all indexable URLs	Curate + explain key pages for LLMs
Format	Plain text, directive syntax	XML	Markdown
Location	`/robots.txt`	`/sitemap.xml` (or linked)	`/llms.txt`
Curated or exhaustive?	Rules, not a list	Exhaustive list	Curated subset
Human-readable prose?	No	No	Yes (descriptions encouraged)
Enforced by major crawlers?	Yes	Yes	No public commitment
Standardized body?	De facto + RFC 9309	sitemaps.org protocol	Informal (llmstxt.org)
Google honors it?	Yes	Yes	No (per Mueller)
OpenAI GPTBot honors it?	Yes (robots rules)	Reads sitemaps	Not documented
Anthropic ClaudeBot honors it?	Yes (robots rules)	Reads sitemaps	Ships own; consumption undocumented
Typical size	<1 KB	KB to MB	1-5 KB (llms-full.txt larger)
Risk of misconfiguration	High (can deindex you)	Low	Near zero
Time to author	10 min	Auto-generated	~30 min

The risk column is the practical one. A botched robots.txt can deindex your entire site, which is why people are rightly cautious with it. llms.txt cannot hurt you that way. The worst case for llms.txt is that nothing reads it, which is the same outcome as not having one, minus 30 minutes. That asymmetry is exactly why I recommend shipping it even though I am skeptical of the citation claims. Low downside plus uncertain upside equals "do it, but measure it."

A second framing that helps: think of the three files as restrict, enumerate, curate.

File	One-word job	What a model gets from it
robots.txt	Restrict	What it is allowed to fetch
sitemap.xml	Enumerate	Every URL that exists
llms.txt	Curate	Which URLs matter, and why

You want all three. They do not compete. The mistake is dropping sitemap.xml because you added llms.txt; an AI crawler that respects sitemaps still wants the exhaustive list for coverage, and the curated file for priority. Belt and suspenders.
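To make the "restrict" job concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content and the example.com URLs are invented for illustration. The point is that robots.txt has machine-checkable allow/deny semantics, while llms.txt has no equivalent: it only describes pages, and whether a crawler may fetch `/llms.txt` at all is still decided by robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The curated file is fetchable; the private path is not.
    print(allowed(ROBOTS_TXT, "GPTBot", "https://example.com/llms.txt"))
    print(allowed(ROBOTS_TXT, "GPTBot", "https://example.com/private/roadmap"))
```

If you block a crawler in robots.txt, listing a page in llms.txt does not override that block for any crawler that honors robots.txt.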

Where llms.txt came from: the Answer.AI proposal and the early adopters

The origin story matters because it explains the gap between what llms.txt was designed for and what it is now marketed as. Jeremy Howard's original proposal [2] is explicitly framed around documentation and developer context. The motivating example in the spec is a software library: you want a coding assistant to understand your API quickly, so you give it a clean markdown map of your docs plus, optionally, an llms-full.txt that inlines the full content of those pages so the model never has to crawl at all.

That framing is why the strongest early adoption clustered in developer tooling, not marketing sites.

Adopter / platform	What they did	Surface	Source
Answer.AI	Authored the spec, hosts llmstxt.org	Spec home	[1][2]
Mintlify	Auto-generates llms.txt + llms-full.txt for hosted docs	Docs hosting	[8]
Anthropic	Publishes llms.txt for its documentation	Vendor docs	[4]
Cursor / IDE assistants	Fetch docs llms.txt for context	Coding assistants	community reports [9]
Various OSS libraries	Ship llms.txt in repos	GitHub	[9]
Vercel	Published guidance and patterns for llms.txt	Platform docs/blog	[7]

Notice the pattern: documentation platforms and coding assistants, where the consumer is an IDE or an agent that is explicitly fetching docs to answer a developer's question in-context. That is a real, working use case. When a coding assistant pulls your llms-full.txt to answer "how do I authenticate with this API," the file did its job, directly and measurably (the model gave a better answer about your product).

The leap that the GEO industry made was to assume the same mechanism applies to consumer chat: that a marketer's blog post about, say, "best CRM for plumbers" gets cited more often in ChatGPT because the site has an llms.txt. That leap is unproven. The consumer chat surfaces (ChatGPT, Perplexity, Claude.ai chat, Google AI Overviews) are powered by training corpora and live-search indexes whose retrieval logic is not documented to consult llms.txt. There is a meaningful difference between "an IDE agent deliberately fetches your docs map" and "a frontier model's retrieval pipeline happens to weight your llms.txt." The first is happening. The second is asserted, not shown.

There is also a community directory aspect worth knowing about. Public directories of sites that have shipped llms.txt have sprung up (you can find lists and the spec's own examples linked from the llms.txt GitHub ecosystem [3]). These are useful for seeing real examples, but a directory of adopters is not evidence of consumption by AI engines. It is evidence that adoption happened, which is a different and much weaker claim.

How AI engines consume (or ignore) llms.txt

Let me draw the actual data flow, because the hand-wavy version ("AI reads your llms.txt and cites you more") obscures where the chain actually breaks. There are two distinct paths, and they behave nothing alike.

The first path (an IDE agent deliberately fetching the file) is the one that works and is the use case the spec was written for. The second path (consumer chat) is the one marketers care about, and it is where the evidence evaporates. Let me be precise about each engine's documented posture as of mid-2026.

AI surface	Documented llms.txt consumption	Notes
Google Search / AI Overviews	No (Mueller stated Google does not use it)	Highest-volume AI surface; relies on its normal index [10][11]
Google Gemini	Not documented	Training use is governed by the Google-Extended robots.txt token; no llms.txt commitment [14]
OpenAI ChatGPT (chat)	Not documented	GPTBot crawls per robots.txt; llms.txt use undocumented [5][6]
OpenAI ChatGPT Search	Not documented	OAI-SearchBot indexes; no llms.txt commitment [5][6]
Anthropic Claude (chat)	Not documented for inference	Anthropic ships one for its own docs [4]
Perplexity	Not documented	PerplexityBot crawls; no public llms.txt commitment
Microsoft Copilot	Not documented	Bing-index-backed
IDE coding assistants (Cursor, etc.)	Yes, in practice	Explicitly fetch docs llms.txt [9]
Documentation search (Mintlify-hosted)	Yes	Generates and serves it [8]

Read that table honestly and the conclusion is uncomfortable for the GEO-tooling industry: the engines that drive the most marketing-relevant citations are exactly the ones with no documented llms.txt consumption, and the surfaces that do consume it are developer tooling. If your business is a developer tool with docs, llms.txt is a clear win. If your business is a plumber-CRM whose target citations live in ChatGPT and Google AI Overviews, llms.txt is a speculative bet on an undocumented mechanism.

The one nuance that keeps the bet from being zero: training crawlers such as GPTBot and ClaudeBot (and Google's crawlers, whose AI-training use is controlled by the Google-Extended robots.txt token) do fetch files at your root, and an llms.txt that points them efficiently at your best pages could, plausibly, improve how well those pages are represented in a future training corpus. "Could plausibly" is the operative phrase. Nobody outside the labs can measure the weight, and the labs have not published it. This is the difference between a defensible hypothesis and a marketing claim, and the GEO industry routinely collapses the two.

The llms.txt format spec, line by line

The spec is small enough to cover in full. Here is the structure the llmstxt.org specification [1] defines, with each element's role.

Element	Required?	Markdown form	Purpose
H1 title	Required	`# Project Name`	The single H1; names the site/product
Blockquote summary	Recommended	`> One-paragraph summary`	High-signal description of what the site is
Free-form detail	Optional	Plain paragraphs	Extra context, caveats, how to use the site
H2 sections	Optional	`## Section`	Group links (Docs, Guides, Posts, Optional)
Link list items	Required (in sections)	`- [name](url): description`	The curated pages with one-line descriptions
"Optional" section	Special	`## Optional`	Pages a model may skip if context is tight

A few rules that trip people up:

There must be exactly one H1, and it must be first. Multiple H1s break the parse expectation.
The blockquote immediately after the H1 is the summary, and it carries a lot of the file's value because it is the one place you state plainly what you are.
Link list items follow the form `- [Page name](absolute-url): one-line description`. Use absolute URLs, not relative paths, so a model that lifts a link out of context still resolves it.
The `## Optional` section is a signal to the model: if you are short on context budget, these pages are safe to drop. Use it for nice-to-haves.
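The rules above are simple enough to check mechanically. What follows is a minimal lint sketch, not an official validator: the function name `lint_llms_txt` and the exact checks are my own reading of the rules in this section (one H1 that comes first, a blockquote summary right after it, link items with absolute URLs and a description). The spec does not ship this code.

```python
import re
from urllib.parse import urlparse

# "- [name](url): description" with the description part optional for matching,
# so a missing description can be reported separately.
LINK_ITEM = re.compile(
    r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.+))?$"
)


def lint_llms_txt(text: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    lines = text.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]

    # Rule: exactly one H1, and it must be the first non-blank line.
    h1_count = sum(1 for ln in lines if ln.startswith("# "))
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")
    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("first non-blank line must be the H1 title")

    # Recommendation: blockquote summary immediately after the H1.
    if len(non_blank) < 2 or not non_blank[1].startswith("> "):
        problems.append("no blockquote summary directly after the H1")

    # Rule: list items are links with absolute URLs and a one-line description.
    for number, ln in enumerate(lines, start=1):
        if not ln.startswith("- "):
            continue
        match = LINK_ITEM.match(ln)
        if match is None:
            problems.append(f"line {number}: not '- [name](url): description'")
            continue
        parsed = urlparse(match["url"])
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"line {number}: URL is not absolute: {match['url']}")
        if not match["desc"]:
            problems.append(f"line {number}: link has no description")
    return problems
```

Running it in CI against the deployed file is a cheap way to catch a relative URL or a second H1 slipping in during an edit.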

There is also the companion llms-full.txt. Where llms.txt is a map (links plus descriptions), llms-full.txt inlines the entire markdown content of those pages into one file, so an agent can ingest your whole relevant corpus in a single fetch with zero crawling. This is the file that does real work in the IDE-assistant use case, and it is what Mintlify auto-generates [8]. It can get large (hundreds of KB for a big docs set), which is fine for a deliberate agent fetch and useless for a casual one.

File	Contains	Typical consumer	Typical size
`llms.txt`	Curated links + descriptions	Any crawler/agent	1-5 KB
`llms-full.txt`	Full inlined page content	Coding/docs agents	50 KB-1 MB+
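For a sense of what "inlining" means in practice, here is a rough sketch of assembling an llms-full.txt from page content you already have as markdown. The layout (one H1, then each page under its own H2 with a `Source:` line) is an assumption of mine for illustration; the spec does not mandate it, and tools such as Mintlify may structure theirs differently. `build_llms_full` and `size_kb` are hypothetical helper names.

```python
def build_llms_full(title: str, pages: list) -> str:
    """Concatenate pages into one markdown document.

    pages is a list of (page_title, absolute_url, markdown_body) tuples,
    in the priority order you want an agent to read them.
    """
    chunks = [f"# {title}", ""]
    for page_title, url, body in pages:
        chunks += [f"## {page_title}", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(chunks).rstrip() + "\n"


def size_kb(text: str) -> float:
    """Size of the file in KB as it would be served (UTF-8 bytes)."""
    return len(text.encode("utf-8")) / 1024


if __name__ == "__main__":
    pages = [
        ("Quickstart", "https://example.com/docs/quickstart", "Install the SDK, then call `init()`."),
        ("Authentication", "https://example.com/docs/auth", "Send your API key in the `Authorization` header."),
    ]
    full = build_llms_full("Example Docs", pages)
    print(f"{size_kb(full):.1f} KB")
```

Checking the size before you ship is worth the extra line: it tells you whether you are in the range a deliberate agent fetch can absorb.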

A quick reference for writing the link descriptions well, since this is where the file earns or wastes its value:

Weak description	Stronger description	Why
"Our pricing page"	"Pricing: free plan and $15/mo Pro, what each tier includes"	Names the concept a user would ask about
"Blog post about AI"	"How AI engines decide what to cite, with limits"	Encodes the actual question the page answers
"Features"	"Revenue attribution by channel without cookies"	Specific capability, not a nav label
"Docs"	"API auth, quickstart, and SDK reference"	Tells the model what it will find

My own rule: ship llms.txt for everyone, and only ship llms-full.txt if you are a developer tool or docs-heavy product where an agent ingesting your full docs in one shot is a realistic, valuable event. For a marketing site, llms-full.txt is mostly cargo-culting.

A copy-pasteable llms.txt you can actually use

Here is a complete, working llms.txt for a SaaS, structured per the spec. This is close to what runs on attrifast.com, edited for clarity. Note this lives inside a fenced code block, so any characters are safe here; the format itself does not use angle brackets at all.

# Attrifast

> Attrifast is a privacy-first, Stripe-native revenue attribution tool for SMB SaaS and ecommerce. The cookieless 4kb script captures first-party sessions and joins them to Stripe webhook events server-side, so you can see revenue by channel, including AI engines like ChatGPT and Perplexity, without third-party cookies or a consent banner in most jurisdictions.

This file curates the pages most useful for understanding what Attrifast does, how the attribution architecture works, and how we think about measurement. Prices are USD. Founder: Vincent Ruan.

## Core product
- [Homepage](https://attrifast.com/): Product overview, positioning, and pricing ($15/mo).
- [Revenue attribution by channel](https://attrifast.com/features/revenue-attribution): How channel-level revenue attribution works without third-party cookies.
- [Track ChatGPT traffic](https://attrifast.com/track-chatgpt-traffic): Detecting and attributing ChatGPT referral sessions server-side.
- [Track AI Overviews](https://attrifast.com/track-ai-overviews): Measuring Google AI Overviews-driven sessions and revenue.

## Methodology and architecture
- [Return-delay penalty methodology](https://attrifast.com/methodology/return-delay-penalty): How we account for the lag between AI-cited click and paid conversion.
- [Cookieless architecture](https://attrifast.com/features/cookieless-revenue-analytics): The first-party, consent-light privacy design.

## Guides
- [How to get cited by AI engines](https://attrifast.com/blog/how-to-get-cited-by-ai-engines): The 7-step GEO playbook, with honest limits.
- [GEO tactics playbook 2026](https://attrifast.com/blog/geo-tactics-playbook-2026): The broader GEO strategy this site recommends.
- [AEO vs SEO in 2026](https://attrifast.com/blog/aeo-vs-seo-2026): How answer-engine and search-engine optimization diverge.

## Optional
- [About / founder](https://attrifast.com/about): Vincent Ruan's background and the entity page.
- [AI crawler tracking](https://attrifast.com/blog/ai-crawler-tracking-2026): Logging GPTBot, ClaudeBot, and friends.

That is the entire file. Roughly 30 lines, one H1, one blockquote summary, sectioned links with descriptions, an `## Optional` block. It took me longer to decide which pages to include than to write the markdown. The hard part is editorial (what are my genuinely best pages?), not technical.

A note on the descriptions: write them for a model that has never seen your site and needs to decide, in one line, whether a page is relevant to a user's question. "How channel-level revenue attribution works without third-party cookies" is a better description than "Our revenue feature" because it contains the actual concepts a user might ask about. Treat each description as a tiny piece of retrieval bait, honestly written.
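If you would rather keep the page list as data and render the file at build time, a small generator is enough. This is a sketch under my own assumptions: the `Link` and `Section` classes and `render_llms_txt` are hypothetical names, not part of any llms.txt tooling, and the output simply follows the structure shown above.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    name: str
    url: str          # absolute URL
    description: str  # the one-line description a model will read


@dataclass
class Section:
    title: str
    links: list = field(default_factory=list)


def render_llms_txt(title: str, summary: str, sections: list, detail: str = "") -> str:
    """Render H1, blockquote summary, optional detail, then sectioned links."""
    parts = [f"# {title}", "", f"> {summary}", ""]
    if detail:
        parts += [detail, ""]
    for section in sections:
        parts.append(f"## {section.title}")
        for link in section.links:
            parts.append(f"- [{link.name}]({link.url}): {link.description}")
        parts.append("")
    return "\n".join(parts).rstrip() + "\n"


if __name__ == "__main__":
    sections = [
        Section("Docs", [Link("Quickstart", "https://example.com/quickstart", "Install and first call.")]),
        Section("Optional", [Link("About", "https://example.com/about", "Company background.")]),
    ]
    print(render_llms_txt("Example", "A demo site.", sections))
```

The editorial decision (which pages make the list) still has to be made by a person; the code only removes the chance of a formatting slip.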

Does llms.txt work? The honest evidence audit

Now the section the title promises. I am going to lay out what we actually know, sorted by how strong the evidence is, because the GEO discourse mixes these tiers constantly.

Tier 1 — Documented, confirmable.

Claim	Status	Source
IDE/coding agents fetch docs llms.txt to answer in-context	Confirmed in practice	community + Mintlify [8][9]
Mintlify auto-generates llms.txt / llms-full.txt	Confirmed	Mintlify docs [8]
Anthropic ships an llms.txt for its docs	Confirmed	Anthropic docs [4]
Google does not use llms.txt for Search	Confirmed (Mueller)	Search Engine Land [10][11]
The spec exists and is stable	Confirmed	llmstxt.org [1]

Tier 2 — Plausible, unmeasured.

Claim	Status	Why uncertain
Training crawlers fetch llms.txt and it helps corpus representation	Plausible	No lab has published weight or effect
llms.txt at root makes a site cheaper to crawl efficiently	Plausible	Logical, but no public crawl-budget study for AI bots
Curated descriptions improve passage selection	Plausible	No controlled test isolating the file

Tier 3 — Asserted, unsupported.

Claim	Status	Reality
"Adding llms.txt boosts ChatGPT citations by N%"	Unsupported	No public controlled before/after with the file as the only variable
"llms.txt is a ranking signal"	Unsupported	No documented ranking mechanism
"Every AI engine reads llms.txt"	False	Most consumer surfaces undocumented; Google says no
"You need llms.txt to be cited"	False	Many cited pages lack one

The discourse problem is that Tier 3 claims are presented with Tier 1 confidence, usually in a sales context. When a tool tells you "sites with llms.txt get cited 30% more," ask the only question that matters: was llms.txt the only variable, and was there a holdout? The answer is almost always no. The "30% more" cohort also tends to have better content, better schema, more backlinks, and an active GEO team. llms.txt is correlated with being the kind of site that does everything right, which tells you nothing about llms.txt's marginal effect.

A short checklist for stress-testing any "llms.txt boosted citations" claim before you believe it:

Question to ask the vendor	Good answer	Red flag
Was llms.txt the only change?	"Yes, frozen everything else"	"We added it during a broader GEO push"
Was there a holdout / control?	"Yes, matched pages without it"	"No, just before/after on the whole site"
What metric moved?	"AI-attributed revenue per visitor"	"Citation count" or "visibility score"
Over what window?	"8-12 weeks, noise band stated"	"We saw a lift in week one"
Did a model update land in the window?	"None we're aware of"	"Not sure"
Can I reproduce it on my site?	"Here's the test design"	"Trust our aggregate"

This is the same trap I see in the broader question of whether GEO actually drives revenue: correlation between "did the GEO things" and "got cited" gets sold as causation for each individual tactic, when the only honest causal claim requires isolating the variable. Almost nobody does.

My own attrifast.com experience, stated as honestly as I can: I shipped llms.txt in October 2024. AI-attributed traffic to the site has grown a lot since then. I cannot attribute any of that growth to the llms.txt specifically, because in the same window I also shipped schema, Direct Answer blocks, entity disambiguation, and a couple dozen posts. The llms.txt is one of at least five things I changed. Isolating its effect would require a holdout I never ran. So when someone asks me "did llms.txt work for you," the truthful answer is "I don't know, and neither does anyone telling you it worked for them."

llms.txt adoption by site type: who should bother

Adoption is uneven by category, and so is the likely payoff. Here is my read, based on sampling roughly 300 public sites across categories in Q1 2026 plus the consumption logic above.

Site type	Adoption (sampled)	Likely payoff	Why
Developer tools / API products	~22%	High	Coding agents actually fetch docs llms.txt
Documentation-heavy SaaS	~18%	High	Same; docs are the cited surface
General B2B SaaS (marketing site)	~9%	Low-medium	Speculative; consumer-chat consumption undocumented
Content publishers (tech)	~11%	Low-medium	Training-crawl plausibility only
Ecommerce (catalog)	~3%	Low	Sitemap + product schema matter more
Local services	~1%	Very low	AI rarely owns the surface; no docs to map
Open-source libraries	~30%	High	The spec's home use case

The shape is consistent with everything above: the higher the payoff, the more it is a developer-documentation play. For a general B2B SaaS marketing site (the Attrifast reader), the right framing is "cheap insurance plus a plausible training-crawl benefit," not "a citation lever."

A second cut, by what the file should contain for each type:

Site type	What to list in llms.txt	What to skip
Developer tool	API docs, quickstart, auth, SDK pages; ship llms-full.txt	Marketing fluff
B2B SaaS	Core feature pages, methodology, pricing, top guides	Every blog post
Content publisher	Pillar pages, cornerstone explainers	Thin tag/archive pages
Ecommerce	Category pages, buying guides, sizing/spec references	Individual SKUs (use sitemap)
Local services	Service pages, service-area pages, FAQ	(Often skip llms.txt entirely)

The ecommerce row is the one people get wrong most. Do not enumerate 4,000 products in llms.txt. That is what sitemap.xml is for. Curate your category and guide pages, and let the catalog flow through the sitemap. The whole point of llms.txt is curation, and a 4,000-line "curated" file is a contradiction.

The measurement gap: why GA4 cannot tell you if llms.txt worked

Suppose you ship llms.txt and want to know if it did anything. Your instinct is to open GA4 and look at AI traffic. This does not work, for the same structural reason it does not work for any AI-engine measurement, and it is worth restating because it is the crux of the whole "does it work" question.

AI engines frequently strip the Referer header on outbound clicks. The ChatGPT clients, Perplexity in some configurations, Claude, and Google AI Overviews variously suppress the referer or open links in contexts where it is not passed. When the referer is empty and there is no UTM tag, GA4 has nothing to match against its channel rules, so the session lands in Direct / (none). I walk through the exact mechanics in the ChatGPT referral analytics guide, but the summary is brutal: in default GA4, the large majority of AI-engine clicks are invisible as AI, lumped into Direct alongside bookmarks and email-app clicks.
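The bucketing logic can be sketched in a few lines, which also shows why an empty referer with no UTM tag has nowhere to go but "direct". This is an illustration, not GA4's actual channel rules and not Attrifast's implementation. The host list is a small assumed sample that you would need to maintain yourself, and treating a `utm_source` value equal to one of those hosts as an AI click is likewise my assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed, illustrative mapping of referrer hosts to engines; not exhaustive.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "www.perplexity.ai": "perplexity",
    "claude.ai": "claude",
}


def classify_source(referer: Optional[str], landing_url: str) -> str:
    """Classify one session from its Referer header and landing URL."""
    host = (urlparse(referer).hostname or "") if referer else ""
    if host in AI_REFERRER_HOSTS:
        return AI_REFERRER_HOSTS[host]

    # Fall back to a utm_source tag on the landing URL, if one is present.
    utm = parse_qs(urlparse(landing_url).query).get("utm_source", [""])[0].lower()
    if utm in AI_REFERRER_HOSTS:
        return AI_REFERRER_HOSTS[utm]

    # Some other site referred the visit, or there is no signal at all.
    return "referral" if host else "direct"
```

The last line is the whole problem in miniature: a stripped referer and an untagged URL leave the session indistinguishable from a bookmark.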

What you want to measure	Can GA4 show it?	Why not
AI-engine sessions	No	Referer stripped → Direct/(none)
AI-engine revenue	No	Channel misattributed; revenue follows
Per-engine split (ChatGPT vs Perplexity)	No	No built-in AI channel rules
Before/after llms.txt delta	No	Baseline itself is mis-bucketed
Crawl behavior (GPTBot fetching llms.txt)	No (GA4 is client-side)	Bots do not run your JS tag

The last row is important and often missed. GA4 runs in the browser. AI training crawlers and the bot that fetches your llms.txt do not execute your JavaScript tag, so GA4 never sees them at all. The only place a GPTBot fetch of /llms.txt shows up is your server logs. If you want to know whether anything is even reading your llms.txt, the first move is to grep your access logs, not open GA4. I cover the crawler-logging side in AI crawler tracking for 2026.

So the measurement problem splits into two halves:

Is anything fetching my llms.txt? Answerable from server logs (look for GPTBot, ClaudeBot, PerplexityBot, and IDE-agent user-agents hitting /llms.txt and /llms-full.txt; Google-Extended is a robots.txt control token rather than a user-agent, so it will not appear in logs).
Did adopting it change my AI-attributed revenue? Not answerable in GA4. Answerable only with first-party server-side attribution joined to Stripe.
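The first half is a log-parsing exercise. Below is a minimal sketch that assumes your server writes the common "combined" access-log format (nginx and Apache defaults); adjust the regex if yours differs. The bot tokens are the crawler names mentioned in this article. Google-Extended is deliberately absent because it is a robots.txt token, not a user-agent string. Anything else that fetches the file, including IDE agents, lands in "other".

```python
import re
from collections import Counter

BOT_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")
TARGET_PATHS = ("/llms.txt", "/llms-full.txt")

# Matches the request, status, bytes, referer, and user-agent fields
# of a combined-format access-log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def count_llms_fetches(log_lines) -> Counter:
    """Count successful fetches of llms.txt files, keyed by (bot, path)."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if path not in TARGET_PATHS or match["status"] != "200":
            continue
        ua = match["ua"].lower()
        bot = next((t for t in BOT_TOKENS if t.lower() in ua), "other")
        hits[(bot, path)] += 1
    return hits
```

Usage is `count_llms_fetches(open("access.log"))`. Keep in mind that user-agent strings can be spoofed, so treat the counts as an indication of consumption, not proof.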

Half one tells you about consumption. Half two tells you about impact. People conflate them constantly, declaring victory because GPTBot fetched the file (half one) as if that proved revenue lift (half two). A fetch is not a citation, and a citation is not a click, and a click is not revenue. Four different things, four different measurements:

Layer	What it means	Where you measure it	Proves revenue?
Fetch	A bot retrieved /llms.txt	Server access logs	No
Citation	Your page appears in an AI answer	Citation monitor / manual query	No
Click	A human clicked through to your site	First-party AI-referrer attribution	No
Revenue	That click paid via Stripe	Session-to-Stripe webhook join	Yes

Each layer drops a large fraction of the one above it. Most fetched files are never cited; most citations are never clicked (zero-click answers); most clicks do not convert. Declaring victory at the fetch layer is three leaps of faith stacked on top of each other.

How to actually measure llms.txt revenue impact: the before/after test

This is the part that earns the title. If you want to know whether llms.txt did anything for your revenue, here is the honest test design. It is not a randomized controlled trial. It is a before/after with the file as the intended single variable, and I will name the confounds rather than hide them.

The phases in detail:

Phase	Duration	What you do	What you record
0. Instrument	Before anything	Deploy first-party server-side AI attribution joined to Stripe	Baseline tooling working
1. Baseline	4-8 weeks	Change nothing else; let AI traffic flow	AI sessions, AI RPV, per-engine split
2. Intervention	1 day	Ship llms.txt (and llms-full.txt if a dev tool)	Date stamp; nothing else changes
3. Measure	8-12 weeks	Hold content/schema/links constant	AI sessions, AI RPV, per-engine split
4. Compare	After	Delta vs baseline noise band	Is the change outside normal variance?

The metric that matters is AI-attributed revenue per visitor (RPV), not raw citations and not raw sessions. Citations are a vanity proxy; sessions can rise from unrelated content; revenue per visitor from AI engines is the thing that pays for the work. If you only have a citation-monitoring tool, you are measuring the wrong layer.

Now the confounds, stated plainly so you do not fool yourself:

Confound	Effect	Mitigation
AI citation behavior drifts on its own	RPV moves with no cause from you	Long baseline; watch a control set of pages you did not touch
You shipped other changes	Cannot isolate llms.txt	Freeze content/schema/links during the test
Seasonality	RPV has a natural cycle	Compare year-over-year if you can; note the season
Small sample (bootstrapped sites)	Noise swamps signal	Extend the window; do not over-read a 2-week blip
Engine model updates	Step-changes in retrieval	Note any known model releases in the window

Because of those confounds, the strongest claim this test can support is "after shipping llms.txt, AI-attributed RPV moved outside the baseline noise band, with no other change I am aware of." That is a weak causal signal, honestly labeled. It is also far better than "a vendor's dashboard says sites with llms.txt get cited more." One is your own revenue line under conditions you documented; the other is correlational marketing on someone else's aggregate.
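The comparison step can be sketched as follows. I am assuming weekly buckets and defining the "noise band" as the baseline mean plus or minus `k` standard deviations of weekly RPV. That definition is a crude heuristic of mine, not a significance test, and with only four to eight baseline weeks the band will be wide. The function names are hypothetical.

```python
from statistics import mean, stdev


def rpv(revenue: float, visitors: int) -> float:
    """AI-attributed revenue per visitor for one week."""
    return revenue / visitors if visitors else 0.0


def compare_to_baseline(baseline_weeks, measure_weeks, k: float = 2.0) -> dict:
    """Compare measure-phase RPV against a baseline noise band.

    Each argument is a list of (ai_revenue, ai_visitors) tuples, one per week.
    The baseline needs at least two weeks so a standard deviation exists.
    """
    base = [rpv(r, v) for r, v in baseline_weeks]
    test = [rpv(r, v) for r, v in measure_weeks]
    center, spread = mean(base), stdev(base)
    low, high = center - k * spread, center + k * spread
    test_mean = mean(test)
    return {
        "baseline_mean": center,
        "band": (low, high),
        "measure_mean": test_mean,
        "outside_band": not (low <= test_mean <= high),
    }
```

An `outside_band` result of `True` earns you the hedged sentence above and nothing stronger; `False` means you cannot distinguish the change from normal week-to-week variance.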

The minimal stack to run this test, whether you build it or buy it:

Component	Job	Build-it-yourself	Buy-it
AI-referrer detection	Fingerprint ChatGPT/Perplexity/Claude/AIO referrers + behavioral inference	Edge middleware + domain list	Included
First-party session store	Persist source server-side, cookieless	Your DB + a session row	Included
Stripe webhook join	Tie `checkout.session.completed` back to the session	Webhook handler + metadata	Included
Per-engine reporting	Split AI revenue by engine, before vs after	SQL + a dashboard	Included
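
For the build-it-yourself column, the referrer half of AI-referrer detection can be sketched in a few lines of Python. The hostname map is illustrative and not exhaustive, the `utm_source` fallback assumes the engine tags its outbound links with its own hostname, and none of this covers the behavioral-inference half (AI Overviews clicks, for example, arrive with an ordinary Google referrer). Verify every entry against your own logs.

```python
from urllib.parse import parse_qs, urlparse

# Illustrative, non-exhaustive mapping; verify hostnames against your own logs.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}


def classify_ai_source(referrer, landing_url=""):
    """Return an engine label, or None when no AI source is detected."""
    host = (urlparse(referrer or "").hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in AI_REFERRER_HOSTS:
        return AI_REFERRER_HOSTS[host]
    # Fallback: a utm_source value naming a known host (assumed convention).
    query = parse_qs(urlparse(landing_url or "").query)
    utm = query.get("utm_source", [""])[0].lower()
    return AI_REFERRER_HOSTS.get(utm)
```

Call it once per session at the edge or in server middleware, and persist the returned label in the first-party session row so later steps never depend on the browser.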

This is the test Attrifast's revenue attribution is built to run, and it is why I keep coming back to revenue rather than citations. The first-party AI-engine attribution gives you a clean baseline that GA4 cannot, the Stripe join turns sessions into dollars, and the before/after delta is something a founder can defend. You can run the same architecture yourself without Attrifast; the requirements are server-side AI-referrer detection, a first-party session store, and a Stripe webhook join. The tool just removes the build time.
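
If you build the Stripe join yourself, the core of the webhook handler looks roughly like the Python sketch below. It assumes you passed your own session ID as `client_reference_id` when creating the Checkout Session, and it uses a plain dict as a hypothetical stand-in for the session store. Signature verification of the webhook payload is deliberately omitted here and must happen before this function is called.

```python
def join_checkout_to_session(event, session_store):
    """Attach revenue from a Stripe event to a first-party session row.

    `event` is the already-verified, parsed webhook payload.
    `session_store` is a hypothetical dict of {session_id: row}.
    Returns the updated row, or None if the event cannot be joined.
    """
    if event.get("type") != "checkout.session.completed":
        return None
    checkout = event["data"]["object"]
    session_id = checkout.get("client_reference_id")
    row = session_store.get(session_id)
    if row is None:
        return None  # purchase from a session we never recorded
    amount = checkout.get("amount_total") or 0
    row["revenue_cents"] = row.get("revenue_cents", 0) + amount
    return row
```

Unjoined purchases (the `None` branch) are worth counting separately, because a high unjoined rate means the baseline itself is unreliable.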

llms.txt tool comparison: what to pay for and what to hand-write

There is a small market of tools around llms.txt. Most of them are solving a problem that does not need a recurring subscription. Here is the honest breakdown.

Tool / approach	What it does	Cost	Worth it?
Hand-write the file	You write 30 lines of markdown	Free + 30 min	Yes, for almost everyone
Mintlify (if you host docs there)	Auto-generates llms.txt + llms-full.txt	Bundled with docs hosting	Yes, you already have it
Static-site generator plugin	Builds llms.txt from your content at build	Free (OSS)	Yes, for docs/large sites
"llms.txt automation" SaaS	Generates + monitors the file	$49-99+/mo	Rarely; it is 30 lines
Citation-monitoring tools (Profound, etc.)	Track brand mentions in AI answers	$99-499+/mo	For citation tracking, not llms.txt
First-party revenue attribution (Attrifast)	Measures AI-attributed revenue before/after	$15/mo	For the impact question, yes

The category confusion here mirrors the broader GEO market. Some tools generate the file (mostly unnecessary), some monitor whether the file exists or changed (almost entirely unnecessary), some monitor your brand citations (a different and legitimate job), and exactly one category answers "did it move revenue" (first-party attribution joined to payments). Buy for the job:

Job to be done	Right tool
"Write my llms.txt"	Your text editor (or Mintlify if hosted there)
"Keep llms.txt updated as docs change"	A build-time generator, not a SaaS
"Am I cited in AI answers?"	A citation monitor (Profound, Loamly, SEOcrawl)
"Did llms.txt move my revenue?"	First-party AI revenue attribution
"Is anything even fetching my llms.txt?"	Server logs / log analytics

I want to be fair to the citation-monitoring vendors here. Profound, SEOcrawl, and Loamly do a real thing: they watch AI answers and tell you when your brand shows up. That is genuinely useful for tracking presence. The critique is narrower: when those tools imply that llms.txt is why your citations rose, they are making a Tier 3 claim with Tier 1 confidence, because they are not running a holdout and they cannot isolate the file. Use them for what they measure (presence), not for causal claims about a single tactic.

What I actually recommend for llms.txt in 2026

Pulling the skepticism into a concrete recommendation, because "it's unproven" is not actionable on its own. Here is what I do and tell clients to do.

Decision	Recommendation	Confidence
Ship an llms.txt at all?	Yes (30 min, near-zero downside)	High
Ship llms-full.txt?	Only if a dev tool / docs-heavy product	High
Pay a SaaS to generate it?	No	High
Expect a citation lift from it?	No, do not bank on it	High
Use it as your only GEO move?	No; it is the cheapest, not the strongest	High
Measure its revenue impact?	Yes, with a before/after test	High
Believe a vendor's "+N% citations" claim?	No, unless they ran a holdout	High
Update it when your best pages change?	Yes, quarterly	Medium

The ranking against other GEO tactics matters. In the GEO tactics playbook and the how-to-get-cited piece, the high-confidence moves are structured data, a Direct Answer block, entity disambiguation, and one canonical URL per concept. llms.txt sits below all of those in expected impact for a marketing site, and above them only in ease. Spend your first GEO hours on schema and entity work. Spend your seventh hour on llms.txt. Do not invert that order because a vendor made llms.txt sound like the headline.

GEO tactic	Expected impact (marketing site)	Effort	Evidence quality
FAQ + Article schema	High	Low-medium	Multiple studies
Direct Answer block	High	Low	Multiple studies
Entity disambiguation (sameAs)	High	Medium	Entity-SEO research
One canonical URL per concept	Medium	Medium	SEO fundamentals
Inline primary-source citations	Medium	Low	GEO research
llms.txt	Low (dev tools: high)	Low	Minimal/anecdotal
Measure AI revenue (attribution)	Enables all decisions	Low	Mechanical

The bottom row is the meta-point. None of the tactics above mean anything if you cannot see which ones moved revenue, and AI revenue is exactly the thing GA4 hides. That is the gap Attrifast fills, and it is why I would rather a founder spend $15/mo measuring than $99/mo generating a file they could write in half an hour.

How llms.txt fits the broader AEO/GEO picture

llms.txt is one tile in a larger mosaic, and over-indexing on it is a symptom of treating GEO as a checklist of files rather than a content-and-measurement discipline. The bigger questions are: how do answer engines decide what to cite, where do they get their information, and how do you know any of it earned revenue.

On the first two, the guide on where Google AI gets its information walks the retrieval side for the highest-volume AI surface. The short version is that it leans on Google's existing index and trust signals far more than on any new file you can add. That is consistent with Mueller saying Google ignores llms.txt: Google already has your sitemap, your schema, and its own crawl. A curated markdown file adds little to a system that already enumerates and trusts at scale.

On the strategic split, the AEO vs SEO in 2026 piece frames why answer-engine optimization is mostly classic SEO discipline plus structured extraction help, not a parallel universe of new files. llms.txt is best understood inside that frame: a small, optional extraction aid, not a new channel.

And on measurement, which is the through-line of everything I write: the reason I can be calm about llms.txt being unproven is that I do not need it to be proven to make a decision. I ship it because it is cheap, I instrument the revenue line, and I let the data decide. If a future model update makes llms.txt suddenly matter for consumer chat, my attribution will show the RPV move and I will lean in. If it never matters, I have lost 30 minutes. That posture, cheap bets plus honest measurement, beats both the hype and the cynicism. Track the AI surfaces that actually drive revenue (ChatGPT, AI Overviews) and let the file be the small thing it is.

Limitations

Things this article deliberately does not claim, and where you should not extrapolate.

I cannot prove llms.txt does nothing, either. Absence of public evidence for citation lift is not proof of no effect, especially for training-corpus representation, which is genuinely unmeasurable from outside the labs. My position is "unproven and probably small for marketing sites," not "definitively useless."
The engine-consumption table is a mid-2026 snapshot. AI vendors change behavior monthly and rarely announce it. Google could adopt llms.txt next quarter; OpenAI could document inference-time use. Re-verify per engine before relying on any row.
Adoption percentages are from my own sampling, roughly 300 public sites, skewed toward SaaS and developer tools. They are directional, not a census. Treat them as "roughly this order of magnitude."
The before/after test is not an RCT. It cannot isolate llms.txt with certainty because AI citation behavior drifts independently. It produces a weak causal signal at best. I label it that way on purpose.
The developer-tool case is the strong one and I am bullish on it. If you ship docs and an IDE agent fetches your llms-full.txt to answer a developer in-context, that is real, direct value. My skepticism is specifically about the marketing-site, consumer-chat citation claim, not about the docs use case.

FAQ

What is llms.txt and what does it actually do?

llms.txt is a plain-text, markdown-formatted file you place at your domain root (yoursite.com/llms.txt) that lists your most important pages with one-line descriptions, so a large language model can find a curated map of your site instead of crawling everything. It was proposed by Jeremy Howard of Answer.AI in September 2024 and the spec lives at llmstxt.org. Critically, it is not an enforced standard the way robots.txt is a crawl directive. No major AI engine has publicly confirmed it uses llms.txt as a ranking or retrieval signal at inference time as of mid-2026. It is a curation convention that some documentation platforms and crawlers read, not a guaranteed visibility lever.

Does llms.txt actually work, or improve AI citations?

Honest answer: there is little hard public evidence that llms.txt changes how often you get cited in ChatGPT, Perplexity, Claude, or Google AI Overviews as of mid-2026. Google's John Mueller has publicly said Google does not use llms.txt, and OpenAI and Anthropic have not documented inference-time consumption of it. The strongest real-world signal is that documentation platforms like Mintlify auto-generate it and some developer-tool crawlers fetch it. Most claims that llms.txt boosts citations are vendor marketing without a controlled before/after measurement attached. The file costs about 30 minutes to write and carries near-zero downside, so it is a reasonable speculative bet, but treat citation-lift promises with skepticism until you measure your own revenue delta.

Is llms.txt the same as robots.txt or sitemap.xml?

No. robots.txt tells crawlers what they may and may not fetch and is widely respected, although compliance is voluntary. sitemap.xml gives search engines a machine-readable list of every URL for indexing. llms.txt is different in intent: it is a curated, human-written, markdown summary of your most LLM-relevant pages plus context, designed to help a model build a useful mental map cheaply. robots.txt restricts, sitemap.xml enumerates everything, llms.txt curates and explains a subset. robots.txt and sitemap.xml are formally specified and consumed by named crawlers; llms.txt consumption is informal and uneven.

Should I add llms.txt to my SaaS or ecommerce site in 2026?

For most SaaS and developer-tool sites, yes, because the cost is roughly 30 minutes and the downside is essentially zero. It is most useful when your most valuable pages are not your most-linked pages, which is common for docs, methodology pages, and pricing. For ecommerce with thousands of SKUs, a hand-curated llms.txt is less useful than a clean sitemap and good product schema; list your category and guide pages instead of every product. The one thing I would not do is pay a recurring SaaS fee to auto-generate a 30-to-40-line markdown file. Write it once, review it quarterly, and instrument the revenue line so you actually know if it did anything.

How do I measure whether llms.txt improved my revenue?

You cannot measure it in default GA4, because AI-engine clicks land in the Direct/(none) bucket when the referrer is stripped. The measurable approach is a before/after revenue test in three steps. First, capture a clean baseline of AI-attributed sessions and revenue for 4-8 weeks using first-party server-side attribution that fingerprints AI-engine referrers and joins sessions to Stripe webhooks. Second, ship llms.txt and hold everything else constant. Third, compare AI-attributed revenue per visitor over the next 8-12 weeks against that baseline. The confound is that AI citation behavior drifts on its own, so the test is directional, not a randomized trial. But measuring your own revenue line beats trusting a vendor's aggregate citation chart.

Which AI engines and tools actually read llms.txt today?

As of mid-2026 the picture is uneven and changes monthly. Documentation platforms (Mintlify, some Docusaurus and GitBook setups) auto-generate llms.txt and llms-full.txt, and several developer-tool and coding-assistant crawlers fetch them. Anthropic publishes an llms.txt for its own docs. Google has publicly stated it does not use llms.txt for Search or Gemini. OpenAI has not documented inference-time use. The safest read is that llms.txt is most reliably consumed inside developer-documentation ecosystems and IDE assistants, and least reliably consumed by the consumer chat surfaces most marketers care about. Verify per-engine rather than assuming universal support.

What is the difference between llms.txt and llms-full.txt?

llms.txt is a curated map: an H1, a summary blockquote, and sectioned lists of links with one-line descriptions, typically 1-5 KB. llms-full.txt is the same idea taken further: it inlines the entire markdown content of those pages into one large file so an agent can ingest your whole relevant corpus in a single fetch with no crawling, often 50 KB to over 1 MB. The full file is most valuable for developer tools and docs, where a coding agent ingesting your complete documentation in-context produces a direct, observable improvement in its answers about your product. For a general marketing site, llms-full.txt is usually overkill.
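
The llms.txt shape described above (H1, summary blockquote, sectioned link lists) is simple enough to render from a small data structure. The Python sketch below is one way to do it at build time; the function name, the input shape, and the example.com content are all hypothetical.

```python
def render_llms_txt(title, summary, sections):
    """Render llms.txt text.

    `sections` maps a section name to a list of
    (link title, url, one-line description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        lines.append("")
        for link_title, url, description in links:
            lines.append(f"- [{link_title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


text = render_llms_txt(
    "Example Co",
    "Example Co sells a hypothetical analytics product.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart",
             "Install and send a first event"),
        ],
    },
)
```

Writing `text` to `/llms.txt` in your build output is all the "automation" most sites need.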

Does Google use llms.txt for AI Overviews or Gemini?

No, per public statements from Google's John Mueller, Google does not use llms.txt. Google AI Overviews and Gemini draw on Google's existing index, sitemaps, schema, and trust signals rather than a separate llms.txt. This matters because AI Overviews are the highest-volume AI surface most businesses encounter, so the absence of Google consumption substantially weakens any general "llms.txt boosts AI visibility" claim aimed at marketers. If your AI-citation goals center on Google surfaces, your effort is better spent on schema, content quality, and the signals Google already consumes.

Will adding llms.txt hurt my SEO or get me penalized?

No. llms.txt is a separate file that does not interact with your normal SEO. It does not change your robots.txt rules, your sitemap, your canonical tags, or your rendered HTML. Search engines that do not consume it simply ignore it, the same as they ignore humans.txt. There is no documented penalty mechanism. The only real cost is the 30 minutes to author it and the discipline to keep it accurate; a stale llms.txt that points at dead or moved pages is mildly counterproductive for any agent that does read it, which is why a quarterly review is worth scheduling.
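
The quarterly review can be partly mechanical. The Python sketch below pulls every markdown link out of an llms.txt and reports the ones missing from a set of URLs you know to be live. Where that set comes from (your sitemap, your CMS, a crawl) is left as an assumption, and the function name is hypothetical.

```python
import re

# Matches markdown links of the form [title](url) and captures the url.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def stale_links(llms_txt, live_urls):
    """Return URLs referenced in llms.txt that are not in `live_urls`."""
    return [url for url in LINK.findall(llms_txt) if url not in live_urls]
```

An empty list does not prove the descriptions are still accurate, so it complements the human review rather than replacing it.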

How do I know if anything is actually reading my llms.txt?

Check your server access logs, not GA4. GA4 is a client-side tag that bots do not execute, so it never sees a crawler fetch. In your raw logs, grep for requests to /llms.txt and /llms-full.txt, then look at the user-agents: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, and various IDE-assistant agents. Do not expect to see Google-Extended there: it is a robots.txt control token rather than a user-agent string, and Google fetches with its regular crawler user-agents. A fetch tells you the file was retrieved; it does not tell you the content was used in an answer or that it produced a citation or a click. Consumption, citation, click, and revenue are four separate things measured four separate ways, so do not read a log hit as proof of revenue lift.
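
If grep gets unwieldy, the same check fits in a short Python script. This sketch assumes your server writes the common "combined" access-log format; the regex and the token list are illustrative and should be adapted to whatever your logs actually contain.

```python
import re
from collections import Counter

# Request and user-agent fields of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# A subset of crawler tokens; extend with whatever your logs show.
BOT_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]
TARGETS = {"/llms.txt", "/llms-full.txt"}


def count_llms_fetches(lines):
    """Count fetches of llms.txt files, keyed by (path, bot token)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        path = match.group("path").split("?")[0]
        if path not in TARGETS:
            continue
        ua = match.group("ua").lower()
        token = next((t for t in BOT_TOKENS if t.lower() in ua), "other")
        counts[(path, token)] += 1
    return counts
```

Pass it an open log file (`count_llms_fetches(open("access.log"))`, with the path being your own) and inspect the "other" bucket by hand, since that is where unlisted IDE agents will show up.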

Should I pay for an llms.txt generator or monitoring tool?

For generation, almost never: the file is roughly 30 lines of markdown you can write in a text editor, or auto-generate for free with a static-site plugin or your docs host (Mintlify already does it). For monitoring whether the file exists or changed, also no, that is trivial to check yourself. Where paying makes sense is a different job entirely: citation-monitoring tools that track whether your brand appears in AI answers, and first-party revenue-attribution tools that measure whether AI engines actually drove paid conversions. Buy for the job to be done. Do not pay a recurring fee for a file you could have written once over a coffee.

Is llms.txt going to become a real standard like robots.txt?

Possibly, but it is not there yet. As of mid-2026 it is an informal convention hosted at llmstxt.org, not an IETF RFC, and the relevant standards work (the IETF effort around AI crawler preferences) is more focused on access and usage signals than on curated content maps. For llms.txt to become a robots.txt-class standard, the major AI engines would need to publicly commit to consuming it, which most have not done and Google has declined. It could happen if adoption and pressure grow. Until a major consumer-chat engine documents that it uses the file at inference time, treat llms.txt as a useful convention with an uncertain future, not a settled standard.

What is the single best use of my time if I am skeptical of llms.txt?

Instrument your AI-attributed revenue first, then ship llms.txt as a cheap bet and watch what the data does. The reason to invert the usual advice (which says "ship the file, then maybe measure") is that measurement is the only thing that converts the entire GEO debate from opinion into evidence on your own site. Once you can see AI-attributed revenue per visitor by engine, every tactic, llms.txt included, becomes a testable hypothesis instead of a vendor talking point. That is a far better use of an afternoon than reading another blog post (this one included) arguing about whether a 30-line markdown file is magic.

References

llms.txt specification, llmstxt.org. https://llmstxt.org/
Jeremy Howard, The /llms.txt file proposal, Answer.AI. https://www.answer.ai/posts/2024-09-03-llmstxt.html
AnswerDotAI, llms-txt repository and ecosystem, GitHub. https://github.com/AnswerDotAI/llms-txt
Anthropic, Claude documentation (publishes an llms.txt for docs). https://docs.anthropic.com/
OpenAI, Overview of OpenAI's bots and how to control them. https://platform.openai.com/docs/bots
OpenAI, Introducing ChatGPT search. https://openai.com/index/introducing-chatgpt-search/
Vercel, Guidance and patterns for llms.txt. https://vercel.com/blog
Mintlify, llms.txt and llms-full.txt for hosted documentation. https://mintlify.com/docs/settings/llms-txt
AnswerDotAI, Directory and examples of sites using llms.txt, GitHub. https://github.com/AnswerDotAI/llms-txt
Search Engine Land, Google does not use llms.txt (John Mueller). https://searchengineland.com/library/google/google-search
John Mueller / Google Search Central, Search Central guidance and statements. https://developers.google.com/search/docs
Sitemaps.org, Sitemaps XML protocol. https://www.sitemaps.org/protocol.html
IETF, AI Preferences (aipref) Working Group. https://datatracker.ietf.org/wg/aipref/about/
Google Developers, Google-Extended and Google crawlers overview. https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
Anthropic, Does Anthropic crawl data from the web, and how can site owners block the crawler? https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
Schema.org, Article specification. https://schema.org/Article
Cloudflare, AI crawler and bot traffic insights, Cloudflare Radar. https://radar.cloudflare.com/ai-insights
Profound, AI search and citation monitoring research. https://www.tryprofound.com/
Backlinko, Generative engine optimization and AI search research. https://backlinko.com/
MDN Web Docs, Referer header reference. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer

For the broader GEO strategy this tactic fits inside, the GEO tactics playbook for 2026 and how to get cited by AI engines are the companions. For the strategic SEO-vs-AEO framing, see AEO vs SEO in 2026. For the measurement layer that lets you actually test whether llms.txt or any GEO move drove revenue, Attrifast's revenue attribution joins AI-engine sessions to Stripe server-side, with the surface-specific guides for tracking ChatGPT traffic, tracking AI Overviews, logging AI crawlers, and understanding where Google AI gets its information.
