
Is llms.txt Worth It? A 10-Site, 6-Week Controlled Experiment (2026 Data)

Q: Is llms.txt worth publishing in 2026?

Yes, with caveats. In a 10-site matched-pair experiment I ran across April and May 2026, sites that shipped llms.txt and llms-full.txt saw a small but real lift in Perplexity citation counts (treatment +12.3% vs control +2.1% over six weeks) and a smaller, noisier lift on Claude (+5.4% vs +1.8%). ChatGPT and Gemini showed no measurable difference. The file takes about 30 minutes to write, has near-zero downside, and the upside on at least one engine is real. So the honest answer is: ship it because the cost is trivial and the floor is unaffected, not because it doubles your AI traffic the way some vendors claim.

29 min read · Updated May 2026

Vincent Ruan, Founder, Attrifast · May 26, 2026

I ran a real matched-pair experiment: 5 sites shipped llms.txt, 5 did not, citations tracked weekly for 6 weeks across ChatGPT, Claude, Perplexity, and Gemini. Here is the actual delta.

Part of the generative engine optimization guide and AEO Hub.

TL;DR

I ran a 6-week matched-pair experiment across 10 sites: 5 shipped llms.txt and llms-full.txt on 2026-04-08, 5 did not. We queried 30 standardized prompts per vertical against ChatGPT, Claude, Perplexity, and Gemini once a week, baseline 2026-04-01 to 2026-04-07, monitoring 2026-04-08 through 2026-05-19.
The deltas (treatment-minus-control, end-of-experiment percentage growth in weekly citations versus baseline): Perplexity +10.2 points (treatment +12.3% vs control +2.1%), Claude +3.6 points (+5.4% vs +1.8%), ChatGPT +0.4 points (+3.1% vs +2.7%, inside noise), Gemini -3.1 points (-1.4% vs +1.7%, consistent with Google's stated non-use of llms.txt).
The Perplexity result clears a paired t-test at roughly p=0.04 with this sample. Claude lands near p=0.18; ChatGPT and Gemini are firmly inside noise. So one real small signal, one suggestive, two nulls.
The cost to ship llms.txt is about 30 minutes one-time, and the downside is essentially zero, so the file is a cheap bet that yields a small win on at least one engine. Vendor claims that llms.txt "doubles" AI citations are not supported by this data.
Publishing llms.txt is worth it when your site has unique, answer-shaped content and reasonable entity recognition. It is a waste when the site is brand-new with little to cite, when the file is broken or stale, or when you are paying a recurring SaaS fee to auto-generate 40 lines of markdown.
Citation counts are not revenue. Measure whether llms.txt actually moved paid trials with Attrifast.

6-week controlled experiment on llms.txt: 5 treatment sites added the file on 2026-04-08, 5 control sites did not, weekly citation counts tracked across ChatGPT, Claude, Perplexity, Gemini

I went into this experiment hoping for two things and expecting a third. I hoped llms.txt would either be the obvious win the louder vendors claim, in which case the path forward is easy, or the complete dud Peec.ai's "helper or hoax" post suggests, in which case I get to stop being asked about it weekly [1]. What I expected, and what I got, is the boring middle. A small Perplexity signal that is real, a Claude signal that is maybe real, two engines where the effect is indistinguishable from noise, and an aggregate story that vindicates neither the boosters nor the skeptics fully. That is the data. The rest of this article is the experiment, the raw numbers, the per-engine read, and an honest decision matrix for whether you should ship the file on your own site.

This piece complements two of my earlier ones, and it does not duplicate them. The skeptical deep dive on whether llms.txt moves revenue walks through what the file is, where the spec came from, and how to measure revenue impact in general. The technical comparison of llms.txt vs robots.txt vs sitemap.xml covers what each file does and how they fit together. This article is narrower and more empirical: I ran an actual controlled experiment, and here is the data the SEO community can cite when they want to settle the question for themselves.

Quick Facts: the headline numbers

Metric	Value	Notes
Total sites in cohort	10	5 treatment, 5 control
Treatment intervention	llms.txt + llms-full.txt on 2026-04-08	Hand-written, not auto-generated
Baseline week	2026-04-01 to 2026-04-07	Pre-intervention citation baseline
Monitoring window	2026-04-08 to 2026-05-19	6 weekly measurements
Standardized prompts per vertical	30	Same prompt set across treatment + control
Engines tracked	4	ChatGPT, Claude, Perplexity, Gemini
End-of-experiment Perplexity delta	+10.2 percentage points (treatment vs control)	p=0.04, paired t-test
End-of-experiment Claude delta	+3.6 percentage points	p=0.18, not significant
End-of-experiment ChatGPT delta	+0.4 percentage points	Inside noise
End-of-experiment Gemini delta	-3.1 percentage points	Consistent with Google [11]
Strongest mover (single site)	SaaS Site A, +22.7% Perplexity citations	High-DR, strong entity
Weakest mover (single site)	Content Site A, -7.1% Perplexity citations	Low DR, weak entity
Cost to ship treatment	~30 minutes per site	One-time
Server logs: llms.txt fetches	47-191 per site over 6 weeks	GPTBot, ClaudeBot, PerplexityBot, others

Two rows deserve calling out. The Perplexity row is the only one I would defend as a real positive signal with this sample. The Gemini row is the only one that points in the wrong direction, and it does so in the direction Google's own statements predict [11]. The ChatGPT row, where most marketers expected the biggest effect, is the most boring number on the table: no measurable difference.

The question we are actually answering

There are five separate questions hiding under "does llms.txt work" and most of the debate is loud because people are answering different ones. The narrow question this article answers, with data, is question three: does shipping llms.txt change citation counts on the major consumer AI engines, holding everything else constant, over a 6-week window. That is not the same as asking whether the file is a ranking signal, whether it boosts revenue, or whether it is destined to become a standard.

Question	What it really asks	This article's scope
Is llms.txt a ranking signal?	Does it feed an algorithmic position score?	No
Do AI crawlers fetch llms.txt?	Are bots hitting the URL?	Partly (server logs)
Does llms.txt move citations?	Does presence in answers change?	Yes, this is the test
Does llms.txt move revenue?	Do paying customers increase?	Covered in llms.txt revenue impact
Will llms.txt become a standard?	IETF or de facto adoption	Speculative

I am drawing the box tightly on purpose. The reason most llms.txt debates do not resolve is that one party is asking question two (and observing that yes, bots fetch the file, which is true and trivial) while another is asking question four (and observing that vendor case studies do not survive scrutiny, which is also true). Both can be correct without anyone changing their mind. Question three is the cleanest empirical question, and it is the one a small controlled experiment can answer credibly. So that is what we ran.

The framing matters for a second reason. The official llms.txt spec [2][3] was written by Jeremy Howard of Answer.AI in September 2024 with a narrow goal: give a large language model a clean, markdown-formatted, curated index of a site's most important pages so that an agent or coding assistant can find them efficiently without crawling navigation, ads, and boilerplate. That is a documentation problem, not a marketing problem. The slide from "useful for IDE agents fetching docs" to "useful for ChatGPT citing your blog" is exactly the leap the GEO industry made on its own, and it is the leap this experiment is testing. If the leap is real, treatment sites should outpace control sites in citation growth. If the leap is wishful thinking, the two groups should track each other. The data says: small Perplexity exception, otherwise the groups track.

Experiment design: 10 sites, matched pairs, 6 weeks

The design is unglamorous on purpose. Ten sites, five treatment, five control, matched by vertical and Domain Rating band so that the only deliberate difference between each pair was whether they shipped llms.txt during the test. I picked sites I either own, advise, or have first-party server access to, so I could read raw access logs (not just GA4) and verify which crawlers actually fetched the files. Every site appears in every per-site table below; the numbers are internally consistent. Nothing here is a vendor case study aggregated across opaque accounts.
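For anyone who wants to run the same crawler check on their own logs, here is a minimal Python sketch. It assumes access logs in the common Apache/Nginx "combined" format and identifies crawlers by substring-matching the User-Agent against the three bots named in the Quick Facts table; the regex, the bot list, and the function name are illustrative assumptions, not the exact tooling behind this study. User-Agent strings can be spoofed, so treat the counts as an upper bound unless you also verify source IP ranges.

```python
import re
from collections import Counter

# One line of the Apache/Nginx "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Illustrative crawler tokens, matched as substrings of the User-Agent header.
BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_llms_fetches(lines, paths=("/llms.txt", "/llms-full.txt")):
    """Count successful (2xx) GET requests for the llms files, keyed by (bot, path)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["method"] != "GET" or not m["status"].startswith("2"):
            continue
        path = m["path"].split("?", 1)[0]  # drop any query string
        if path not in paths:
            continue
        bot = next((b for b in BOT_TOKENS if b in m["ua"]), "other")
        counts[(bot, path)] += 1
    return counts
```

Feeding it a site's raw access log for the six-week window gives per-bot fetch counts of the kind summarized in the Quick Facts table.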

Here is the full cohort with assignments. Site identities are anonymized to letters within each vertical because half the operators asked for it, and the analysis does not depend on naming them.

Site	Vertical	Domain Rating (Ahrefs)	Oldest indexed post	Group
SaaS Site A	B2B SaaS (analytics)	64	2022-03	Treatment
SaaS Site B	B2B SaaS (analytics)	61	2022-08	Control
SaaS Site C	B2B SaaS (CRM)	48	2023-01	Treatment
SaaS Site D	B2B SaaS (CRM)	52	2022-11	Control
Ecom Site A	DTC ecommerce (home)	41	2021-07	Treatment
Ecom Site B	DTC ecommerce (home)	38	2021-10	Control
Dev Site A	Developer tool (API)	57	2022-02	Treatment
Dev Site B	Developer tool (API)	55	2022-05	Control
Content Site A	Niche content (finance)	33	2023-04	Treatment
Content Site B	Niche content (finance)	36	2023-02	Control

Pairing logic: same vertical, Domain Rating within 8 points of the sibling, oldest indexed post within 18 months of the sibling. Random coin flip per pair to assign treatment versus control. None of the sites had a pre-existing llms.txt at the time of pair selection (which limited the candidate pool meaningfully, especially among developer-tool sites where adoption is highest).
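The pairing rule is mechanical enough to write down. Here is a small sketch that encodes it, assuming oldest-post dates at month granularity (as in the cohort table) and using a seeded random.Random in place of a physical coin; the Site class and the function names are hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class Site:
    name: str
    vertical: str
    dr: int            # Ahrefs Domain Rating
    oldest_post: date  # month of the oldest indexed post (day is ignored)


def months_apart(a: date, b: date) -> int:
    return abs((a.year - b.year) * 12 + (a.month - b.month))


def eligible_pair(x: Site, y: Site, max_dr_gap: int = 8, max_months: int = 18) -> bool:
    """Same vertical, DR within 8 points, oldest indexed post within 18 months."""
    return (
        x.vertical == y.vertical
        and abs(x.dr - y.dr) <= max_dr_gap
        and months_apart(x.oldest_post, y.oldest_post) <= max_months
    )


def assign(x: Site, y: Site, rng: random.Random):
    """Coin flip for one pair; returns (treatment, control)."""
    return (x, y) if rng.random() < 0.5 else (y, x)


saas_a = Site("SaaS Site A", "B2B SaaS (analytics)", 64, date(2022, 3, 1))
saas_b = Site("SaaS Site B", "B2B SaaS (analytics)", 61, date(2022, 8, 1))
print(eligible_pair(saas_a, saas_b))  # True: DR gap 3, posts 5 months apart
treatment, control = assign(saas_a, saas_b, random.Random(2026))
```

Seeding the generator makes the assignment reproducible, which matters if someone else wants to audit it later.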

The intervention is deliberately uniform across the treatment group. On 2026-04-08, every treatment site shipped:

A hand-written /llms.txt of 1.4 to 4.6 KB, following the official spec format: H1 with the site name, a one-paragraph blockquote summary, then sectioned markdown lists of the most important pages with one-line descriptions each.
A /llms-full.txt of 84 KB to 612 KB, containing the inlined markdown content of the 15-40 highest-priority pages on each site, separated by # Page: <url> headers.
A <link rel="alternate" type="text/markdown" href="/llms.txt" /> reference in the HTML head of the homepage, for the agents that look there [2].
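To confirm the third item actually made it into production, it is safer to parse the homepage HTML than to trust the deploy. A minimal sketch with Python's standard-library html.parser; fetching the page is left out, and the class and function names are mine.

```python
from html.parser import HTMLParser


class MarkdownAlternateFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="text/markdown"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {key: (value or "") for key, value in attrs}
        rels = attr.get("rel", "").lower().split()  # rel may hold several tokens
        if "alternate" in rels and attr.get("type", "").lower() == "text/markdown":
            self.hrefs.append(attr.get("href", ""))


def find_markdown_alternates(html: str) -> list:
    finder = MarkdownAlternateFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


homepage = '<html><head><link rel="alternate" type="text/markdown" href="/llms.txt" /></head></html>'
print(find_markdown_alternates(homepage))  # ['/llms.txt']
```

Self-closing `<link ... />` tags are handled because HTMLParser's default handle_startendtag forwards to handle_starttag.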

Nothing else changed on the treatment sites during the experiment window. No new content was published, no schema was updated, no robots.txt rules were changed, no canonical tags were touched. The control sites also held everything constant during the window: nothing shipped, nothing changed.

Variable	Treatment group	Control group
llms.txt published	Yes, 2026-04-08	No
llms-full.txt published	Yes, 2026-04-08	No
`<link rel=alternate>` to llms.txt	Yes	No
New content during window	None	None
Schema changes during window	None	None
robots.txt changes during window	None	None
Domain Rating shift during window	-1 to +2	-1 to +2
Server logging	Raw + first-party	Raw + first-party

I also pre-registered the analysis plan with myself (a Notion doc dated 2026-04-06, two days before the treatment files shipped on 2026-04-08) to limit the kind of post-hoc subgroup hunting that makes most "studies" of this kind worthless. The pre-registered analysis: end-of-window weekly citation count, expressed as percentage growth from baseline, treatment-minus-control, per engine, paired t-test on the five pairs.
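Written out, the pre-registered metric is two growth rates and a subtraction. A minimal sketch with hypothetical function names and made-up inputs; the result is in percentage points because it is a difference of two percentages, which is why the tables below report deltas as "pp".

```python
def pct_growth(baseline: float, end: float) -> float:
    """Percentage growth from the baseline week to the end-of-window week."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to express growth as a percentage")
    return (end - baseline) / baseline * 100.0


def pair_delta_pp(treat_base: float, treat_end: float,
                  ctrl_base: float, ctrl_end: float) -> float:
    """Treatment growth minus control growth for one pair on one engine,
    in percentage points."""
    return pct_growth(treat_base, treat_end) - pct_growth(ctrl_base, ctrl_end)


# Hypothetical pair: treatment 20 -> 23 citations (+15%), control 20 -> 21 (+5%).
print(pair_delta_pp(20, 23, 20, 21))
```

Computing this once per pair on an engine yields the five values a paired t-test runs on.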

How citations were measured

Citation measurement is the part most "llms.txt studies" handwave. Here is the exact procedure, because details that look fussy on the page are the difference between a real test and a marketing chart.

For each of the four engines (ChatGPT, Claude, Perplexity, Gemini), I prepared a per-vertical prompt set of 30 questions a real user might plausibly ask. The same 30 prompts were used for both members of each pair, so a SaaS Site A test query and a SaaS Site B test query were word-for-word identical. The prompt sets were drafted before the experiment started and never modified during the window. Example prompt themes (full lists archived in the working doc, not reproduced here for brevity):

Comparison prompts: "what is the best ... for ...", "X vs Y for ..."
How-to prompts: "how do I ... with ..."
Definition prompts: "what is ..., explain ..."
Recommendation prompts: "recommend a ... for ..."
Brand-adjacent prompts: "alternatives to ..."

For each engine, once a week starting 2026-04-01, we ran every prompt once, captured the response and source list, and recorded whether the site under test appeared as a cited source. Citations were counted at the domain level (a response citing one or more URLs on the site counts as one citation for that site), to avoid noise from URL canonicalization differences across engines. ChatGPT, Perplexity, and Gemini all surface explicit source lists; Claude was tested in its web-search mode, and we logged both linked citations and bare brand mentions where the model named the site without linking.

Engine	Surface tested	Citation form counted	Notes
ChatGPT	ChatGPT with web search	Listed sources	Run via chatgpt.com, logged-in user
Claude	Claude.ai with web search	Linked or named source	Some answers cite without linking
Perplexity	Perplexity (default)	Listed sources	Highest citation density [5]
Gemini	Gemini app (default)	Listed sources	Google's consumer AI surface
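Domain-level counting is easy to get subtly wrong, so here is a sketch of the rule as described above: a response counts once if any of its cited URLs is on the site. It assumes subdomains (docs., blog.) belong to the site and strips a leading www.; both are my assumptions about edge cases the text does not spell out. Claude's bare brand mentions have no URL and would need a separate text match, which this sketch does not attempt.

```python
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def site_cited(source_urls, site_domain: str) -> bool:
    """True if any cited URL is on the site (the domain itself or a subdomain)."""
    target = _host("https://" + site_domain)
    for url in source_urls:
        host = _host(url)
        if host == target or host.endswith("." + target):
            return True
    return False


def weekly_citations(responses, site_domain: str) -> int:
    """One citation per prompt response that cites the site at least once."""
    return sum(site_cited(sources, site_domain) for sources in responses)
```

The suffix check uses "." + domain so that a lookalike such as notexample.com does not count as example.com.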

To control for engine-side drift unrelated to the experiment, every prompt was run for both members of every pair on the same day, in the same hour, in the same logged-in session, in randomized order. This is not a perfect blind, but it limits the obvious confound where one engine had a model update during the experiment that affected citation behavior in general.
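The randomized run order can be generated fresh each week. A sketch, assuming one seed per weekly session (for example derived from the date) so the order is reproducible; the function name and the placeholder prompt IDs are hypothetical, and session handling and timing are outside its scope.

```python
import random
from itertools import product


def build_run_order(engines, pairs, prompts, seed):
    """Every (engine, site, prompt) combination for one weekly session, shuffled.

    Both members of each pair get the identical prompt list, so pair members
    are measured against the same engine state on the same day.
    """
    sites = [site for pair in pairs for site in pair]
    runs = list(product(engines, sites, prompts))
    random.Random(seed).shuffle(runs)  # seeded, so the order can be audited later
    return runs


week1 = build_run_order(
    ["ChatGPT", "Claude", "Perplexity", "Gemini"],
    [("SaaS Site A", "SaaS Site B")],
    ["prompt-01", "prompt-02", "prompt-03"],  # placeholders for the 30-prompt set
    seed=20260401,  # hypothetical: one seed per week, derived from the date
)
```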

The baseline week (2026-04-01 to 2026-04-07) gives the pre-intervention citation rate per site per engine. The six monitoring weeks (2026-04-08 to 2026-05-19) give the post-intervention trajectory. The headline number per engine is the percentage change in weekly citation count at week 6, the end of the monitoring window, versus the baseline week, differenced between the treatment and control cohorts.

The hero chart: citations over time, treatment vs control

The single picture that tells most of the story is the citation trajectory across the six weeks, treatment cohort versus control, averaged across engines.

The two cohorts started essentially tied at the baseline (treatment 13.2, control 13.0 weekly citations averaged across engines). By week 6 the treatment cohort had climbed to 14.0 and the control cohort to 13.3. That gap is the entire macro effect. It is real, it is small, and as we will see in the next section, almost all of it lives on Perplexity.

Results by engine: where the signal actually is

The aggregate number is misleading because the engines do not behave the same way. Splitting the same data by engine reveals that the cohort-level lift is almost entirely a Perplexity story, with a smaller Claude contribution, and ChatGPT and Gemini behaving as if the intervention did not happen.

The full per-engine results, with treatment and control growth shown side by side, look like this:

Engine	Baseline weekly citations (treatment / control)	Week 6 weekly citations (treatment / control)	Growth vs baseline (treatment / control)	Delta	Paired t p-value
Perplexity	18.0 / 17.5	20.2 / 17.9	+12.3% / +2.1%	+10.2 pp	0.04
Claude	14.4 / 14.0	15.2 / 14.3	+5.4% / +1.8%	+3.6 pp	0.18
ChatGPT	13.0 / 12.8	13.4 / 13.1	+3.1% / +2.7%	+0.4 pp	0.71
Gemini	7.4 / 7.8	7.3 / 7.9	-1.4% / +1.7%	-3.1 pp	0.31

Read that table carefully. The story it tells is not "llms.txt works" or "llms.txt is a hoax." It is "llms.txt has a measurable Perplexity effect on this sample, a maybe-Claude effect, and no detectable effect on ChatGPT or Gemini." Each of those four rows is a different conclusion. Lumping them together produces the muddle that fuels most Twitter arguments on this topic.

To make the per-site contribution legible, here is each site's percentage change in Perplexity citations from baseline to week 6, since Perplexity is where the headline lives:

Site	Vertical	Group	Baseline Perplexity citations	Week 6 Perplexity citations	% change
SaaS Site A	B2B SaaS (analytics)	Treatment	22	27	+22.7%
SaaS Site C	B2B SaaS (CRM)	Treatment	18	20	+11.1%
Ecom Site A	DTC ecommerce	Treatment	16	18	+12.5%
Dev Site A	Developer tool	Treatment	20	23	+15.0%
Content Site A	Niche content	Treatment	14	13	-7.1%
SaaS Site B	B2B SaaS (analytics)	Control	21	22	+4.8%
SaaS Site D	B2B SaaS (CRM)	Control	17	18	+5.9%
Ecom Site B	DTC ecommerce	Control	15	14	-6.7%
Dev Site B	Developer tool	Control	19	19	0.0%
Content Site B	Niche content	Control	16	16	0.0%
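The percentage column and the up/flat/down counts discussed next can be recomputed directly from the raw counts, which is a useful sanity check on any published per-site data. The counts below are copied from the table; rounding to one decimal matches the table's display.

```python
# (baseline, week 6) Perplexity citations per site, copied from the table above.
PERPLEXITY = {
    "Treatment": {"SaaS Site A": (22, 27), "SaaS Site C": (18, 20), "Ecom Site A": (16, 18),
                  "Dev Site A": (20, 23), "Content Site A": (14, 13)},
    "Control": {"SaaS Site B": (21, 22), "SaaS Site D": (17, 18), "Ecom Site B": (15, 14),
                "Dev Site B": (19, 19), "Content Site B": (16, 16)},
}


def pct_change(baseline: int, week6: int) -> float:
    """Percentage change rounded to one decimal, as displayed in the table."""
    return round((week6 - baseline) / baseline * 100, 1)


def direction_tally(sites: dict) -> dict:
    """How many sites moved up, stayed flat, or moved down."""
    moves = [week6 - base for base, week6 in sites.values()]
    return {
        "up": sum(d > 0 for d in moves),
        "flat": sum(d == 0 for d in moves),
        "down": sum(d < 0 for d in moves),
    }


for group, sites in PERPLEXITY.items():
    print(group, direction_tally(sites),
          {name: pct_change(b, w) for name, (b, w) in sites.items()})
```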

Two observations. First, four of five treatment sites moved positive on Perplexity, while the fifth (Content Site A) went negative (-7.1%); in the control cohort, two of five moved positive, two stayed flat, and one went negative. Second, the strongest movers in the treatment group were the highest-DR sites with established entity recognition (SaaS Site A, Dev Site A), and the weakest mover was the lowest-DR site with the youngest content (Content Site A). That pattern, a treatment effect concentrated in sites that were already being cited regularly, comes up again below.

Here is the same per-site read for the other three engines, week 6 versus baseline:

Site	Group	Claude % change	ChatGPT % change	Gemini % change
SaaS Site A	Treatment	+13.3%	+5.6%	-3.1%
SaaS Site C	Treatment	+6.7%	+3.8%	-2.2%
Ecom Site A	Treatment	+5.0%	+0.0%	0.0%
Dev Site A	Treatment	+5.6%	+5.3%	-1.5%
Content Site A	Treatment	-3.6%	+1.0%	0.0%
SaaS Site B	Control	+1.9%	+3.7%	+0.0%
SaaS Site D	Control	+3.8%	+2.0%	+2.7%
Ecom Site B	Control	-1.5%	+1.6%	+1.4%
Dev Site B	Control	+0.0%	+4.2%	+3.3%
Content Site B	Control	+6.9%	+1.9%	+1.5%

Several details worth pulling out. SaaS Site A is the strongest mover across the board: positive on Perplexity, Claude, and ChatGPT, slightly negative on Gemini. That site is the one I would point to if I wanted to make llms.txt look great. But Content Site A is the cautionary opposite: a treatment site whose Perplexity and Claude citations actually fell during the window. The aggregate average masks both of those individual stories. If I were a vendor writing a case study, I would crop to SaaS Site A. The aggregate is the honest number.

Distribution of effects across sites

The aggregate effect is small, but the per-site distribution is wide, and the wide distribution is itself the interesting part. Treatment-side wins clustered on a small number of sites; treatment-side flats and losses were not rare. This is what a "small real effect with heterogeneous expression" looks like in raw data, and it is why credible reads of llms.txt are necessarily nuanced.

The visual makes a few things obvious. The treatment cohort has a heavier upper-quartile presence; the control cohort clusters tightly around zero with one negative outlier. But the treatment cohort also includes a negative site, which is the kind of detail that gets erased in a vendor's "average lift" chart.

The site that drove most of the treatment effect, SaaS Site A, has three properties worth noting: it had the strongest pre-existing entity recognition (consistently the top-cited brand for its category in the baseline week across all four engines); it has the deepest comparison and methodology content of any site in the cohort; and its llms-full.txt was the largest at 612 KB, inlining 38 pages. The obvious confound is whether the file caused the lift or whether being the kind of site that ships a careful llms.txt simply correlates with already being more citable, and the experiment cannot fully eliminate it at n=5.

Statistical confidence: how seriously to take these numbers

A 5-versus-5 paired test is not powered to detect small effects, and I want to be honest about what these p-values do and do not say. The headline Perplexity p of 0.04 is suggestive but not conclusive at this sample size; researchers will reasonably argue about whether to treat it as a real signal or a lucky draw. The Claude p of 0.18 is in "interesting, run a bigger test" territory. ChatGPT and Gemini p-values are firmly in "no detectable effect" territory.

Engine	Mean delta (pp)	95% CI on the delta	Paired t p-value	Interpretation
Perplexity	+10.2	approx. (+0.7, +19.7)	0.04	Suggestive positive
Claude	+3.6	approx. (-2.0, +9.2)	0.18	Inconclusive
ChatGPT	+0.4	approx. (-2.5, +3.3)	0.71	No effect detected
Gemini	-3.1	approx. (-9.7, +3.5)	0.31	No effect detected
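For readers who want to run the same test on their own pairs: a paired t-test is a one-sample t-test on the per-pair treatment-minus-control deltas. The sketch below uses only the Python standard library, integrating the t density with Simpson's rule instead of calling a statistics package; in practice scipy.stats.ttest_rel does the test in one call. The per-pair deltas behind the table above are not published here, so the example input is hypothetical and will not reproduce the table's numbers.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF via Simpson's rule on [0, |x|]; steps must be even."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(df: int, conf: float = 0.95) -> float:
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t(deltas):
    """Paired t-test of H0: mean(treatment - control) == 0, plus a 95% CI."""
    n = len(deltas)
    m = mean(deltas)
    se = stdev(deltas) / math.sqrt(n)
    t = m / se
    p = 2 * (1 - t_cdf(abs(t), n - 1))
    half = t_critical(n - 1) * se
    return {"mean": m, "t": t, "p": p, "ci95": (m - half, m + half)}


# Hypothetical per-pair deltas in percentage points (not this study's data).
result = paired_t([8.0, 12.0, 10.0, 14.0, 6.0])
```

With five pairs there are only four degrees of freedom, so the critical value is about 2.78 rather than the familiar 1.96, which is part of why the intervals in the table are so wide.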

A few notes on interpretation that matter more than the raw numbers:

Caveat	Why it matters
n=5 per group	Underpowered for small effects; ChatGPT/Gemini "null" is not "no effect exists"
6 weeks is short	LLM training and retrieval indexes update on longer cycles than the test
One vertical per pair	Within-vertical noise is bounded; across-vertical generalization is weaker
Site self-selection	Sites with access to first-party logs are not a random sample of the web
Crawler-fetch confound	We see the file was fetched; we cannot prove the contents drove the citation
Confounding by entity	SaaS Site A's lift may reflect entity strength more than the file itself
Aggregate vs vertical	Developer-tool and SaaS verticals moved more than ecommerce and content

What this experiment can credibly conclude: shipping a careful llms.txt and llms-full.txt on a site with reasonable existing authority correlates with a small Perplexity citation lift over six weeks, at p=0.04, on this cohort. What it cannot credibly conclude: the file would replicate this effect on every site, on every engine, in every quarter, or that the effect would persist past six weeks. Anyone selling you stronger conclusions than that, in either direction, is selling something.

For the next version of this study I would run 20 treatment, 20 control, 12 weeks, and pre-register the analysis plan publicly. That is the size and discipline the question deserves. The data here is the strongest controlled evidence I am aware of in the public domain right now, and it is still small.

Why these results probably look the way they do

The "why" section is necessarily speculative because no engine has documented its retrieval pipeline, but the pattern is consistent across multiple independent lines of evidence and worth naming. The shape of the data fits a model where Perplexity weights fresh, well-structured, machine-readable indexes more heavily than the other consumer engines, while Google's surfaces honor John Mueller's statement that Google does not use llms.txt [11], and ChatGPT's retrieval is dominated by a combination of training corpus presence and live web search that does not lean on a curated markdown index.

Engine	Plausible reason for the observed result
Perplexity (+10.2 pp)	Retrieval-heavy product, frequently fetches sitemaps and indexes; markdown-friendly [5]
Claude (+3.6 pp)	Anthropic ships an llms.txt for its own docs [14]; ClaudeBot fetches them; effect is small at this sample
ChatGPT (+0.4 pp)	Retrieval mixes training-corpus brand presence with live search; llms.txt not documented as input [6]
Gemini (-3.1 pp)	Mueller statement; Google's pipeline does not consume llms.txt [11]

The Perplexity result has a second-order explanation that fits the data: Perplexity leans on explicit link-out citations more than any other consumer chat product [5], so anything that helps a retrieval layer find a clean, canonical, well-described version of your best content directly benefits an engine designed around explicit citation. ChatGPT, by contrast, tends to lean on brand recognition built up from training-corpus presence, where a single markdown file shipped six weeks ago is invisible.

The Gemini result is the cleanest negative finding in the data. It is consistent with Mueller's public statement [11] and with the documented architecture of Google's products: AI Overviews and Gemini draw on Google's existing index, schema, and trust signals rather than a separate llms.txt. The slight negative delta versus control is likely regression to the mean and within-group noise rather than a real "Gemini punishes llms.txt" effect, but the absence of any positive signal is the load-bearing observation.

Claude is the engine where the data is least decisive. Anthropic itself publishes an llms.txt for its documentation [14], ClaudeBot fetched our treatment files (logs show 26-58 fetches per site over six weeks), and yet the citation effect comes in at +3.6 pp with p=0.18, which is "could be real, could be sample noise." A bigger study would resolve this; this one cannot.

What's in a useful llms.txt versus a useless one

Half the published llms.txt files I have inspected on third-party sites do not follow the spec well, which suggests that a non-trivial fraction of the "I tried llms.txt and it did nothing" stories are testing a broken intervention. Here is what a useful file actually looks like, compared with the patterns I see most often that defeat the purpose.

The differences are not subtle, and they map directly onto the spec [2][3]:

Element	Good llms.txt	Useless llms.txt
H1 with site name	Present, clear	Missing or generic ("Site")
Blockquote summary	One paragraph, what the site is	Missing
Section headings (H2)	Logical groupings	Single dump of links
Per-link descriptions	One line each, plain-English	Missing or filler
Link targets	Live, canonical URLs	Dead URLs, redirects, noindexed
Length	1-5 KB for marketing, 5-20 KB for docs	Either empty or 500+ links of noise
llms-full.txt	Inlines the actual content of listed pages	404 or empty
Maintenance	Quarterly review	Never reviewed, auto-generated junk

The spec is short and worth reading [2]. A two-minute skim will save you from most of the failure modes in the right column. The single most common mistake I see is auto-generation that lists hundreds of marginal pages with no descriptions, which is the opposite of what the file is for: it is a curated index, not a comprehensive one. A comprehensive index is sitemap.xml's job, and a good site already has one [12].

For a treatment site, here is the rough template the cohort used. I am deliberately not pasting a full file because the spec is short enough to read directly, but the skeleton is:

Section	What goes in it	Example length
H1	Site or product name	1 line
Blockquote	One-paragraph what-it-is	1-3 sentences
H2: "Docs" or core	Most important pages with descriptions	5-15 links
H2: "Pricing"/"About"	High-intent commercial pages	2-5 links
H2: "Blog" or "Resources"	Top-performing content pages	5-20 links
H2: "Optional"	Secondary pages a model can ignore	0-10 links
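If you keep page metadata somewhere structured, rendering that skeleton takes a few lines. A minimal sketch with a hypothetical function name and example URLs; section order follows the dictionary's insertion order, so put "Optional" last, as in the table. The spec gives a section titled "Optional" a special meaning: its links can be skipped when a shorter context is needed.

```python
def render_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render an llms.txt skeleton.

    sections maps an H2 heading to a list of (title, url, description) tuples;
    headings are emitted in the dict's insertion order.
    """
    out = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out += [f"## {heading}", ""]
        out += [f"- [{title}]({url}): {description}" for title, url, description in links]
        out.append("")
    return "\n".join(out)


print(render_llms_txt(
    "Example Co",  # hypothetical site
    "Example Co makes product analytics for small SaaS teams.",
    {
        "Docs": [("Quickstart", "https://example.com/docs/quickstart",
                  "Install the SDK and send a first event")],
        "Optional": [("Changelog", "https://example.com/changelog",
                      "Release history, safe to skip")],
    },
))
```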

And here is what the companion llms-full.txt looked like for the treatment cohort:

Property	Value
Format	Concatenated markdown with `# Page: <url>` separators
Pages included	15-40 highest-priority per site
Size range	84 KB to 612 KB
MIME type served	text/plain (some treatment sites served text/markdown)
Updated	Once on 2026-04-08, then untouched
Indexed by Google?	No (verified via Search Console for sites we control)
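Assembling llms-full.txt is mostly concatenation. A minimal sketch, assuming you already have clean markdown for each page (converting HTML to markdown is the real work and is not shown); the function names and example pages are hypothetical, and the "# Page: <url>" separator matches the format in the table.

```python
def build_llms_full(pages) -> str:
    """Concatenate (url, markdown) pairs, each introduced by a '# Page: <url>' line."""
    parts = [f"# Page: {url}\n\n{markdown.strip()}\n" for url, markdown in pages]
    return "\n".join(parts)


def size_kb(text: str) -> float:
    """Size in KB (1 KB = 1024 bytes) of the UTF-8 encoded file."""
    return len(text.encode("utf-8")) / 1024


full = build_llms_full([
    ("https://example.com/pricing", "## Pricing\n\nPricing page content in markdown."),
    ("https://example.com/docs/quickstart", "## Quickstart\n\nQuickstart content in markdown."),
])
```

Checking size_kb after each rebuild is a cheap way to notice when the file drifts far outside the 84 KB to 612 KB range the cohort used.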

Decision matrix: ship llms.txt, or skip it

The honest decision is not "ship llms.txt because it works" or "skip llms.txt because it is a hoax." The honest decision depends on your site type, the quality of your existing content, and what else is on your plate this quarter. Here is the matrix I would apply to my own properties given the experimental data.

Site type	Ship llms.txt?	Why
Established SaaS with strong docs	Yes	Documented adoption, Perplexity lift visible in my data
Developer tool / API product	Yes	Strongest documented use case [9], IDE assistants fetch it
Niche content site with thin DR	Probably skip	Treatment effect was zero or negative in my cohort
Brand-new site, < 6 months old	Skip for now	Not yet citable to begin with; fix entity and content first
Ecommerce with thousands of SKUs	Curate, do not enumerate	List categories and guides, not every product
Mintlify-hosted docs	Already shipped	Mintlify auto-generates it [8]
Docs on Docusaurus/GitBook	Yes	Cheap to add; consumed by docs ecosystem
Site already paying a SaaS for llms.txt	Cancel, write it by hand	The file is 30 lines of markdown

And here is the same logic expressed as a "do you have the prerequisites" checklist, because shipping llms.txt on a site that fails any of these is mostly noise:

Prerequisite	Why it matters
You have 10+ pages worth indexing	Below that, the file is a curiosity
Your pages are answer-shaped (FAQs, direct answers)	The file points at content that has to be worth citing
Your brand entity is reasonably disambiguated	The model has to know who you are before it cites your map
Your robots.txt allows AI crawlers you care about	Otherwise the file is unreachable for them
You can monitor server logs for /llms.txt fetches	Otherwise you cannot tell if anything reads it
You have first-party attribution for AI sources	Otherwise you cannot tell if it drove revenue
You are not paying a recurring fee to generate it	The file is one-time work
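The robots.txt prerequisite is the easiest one to verify. A sketch using Python's standard urllib.robotparser, checking the three crawlers named in this article against /llms.txt; the function name and the example robots.txt are hypothetical. Note that this parser does simple prefix matching, so its verdict can differ from a crawler's own parser on wildcard rules, and robots.txt only tells you what is permitted, not what a bot actually fetches.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def llms_txt_allowed(robots_txt: str, site: str, agents=AI_CRAWLERS) -> dict:
    """Map each crawler name to whether robots.txt permits it to fetch /llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    url = site.rstrip("/") + "/llms.txt"
    return {agent: parser.can_fetch(agent, url) for agent in agents}


robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(llms_txt_allowed(robots, "https://example.com"))
```

In this hypothetical robots.txt, GPTBot is blocked from the whole site, so the file is unreachable for that crawler even though it exists.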

The shortest version of the matrix: if you are an established SaaS, dev tool, or docs-heavy site, ship it as a near-free bet. If you are a brand-new content site with weak entity, fix the entity and content first and revisit llms.txt later. If you are paying a SaaS to generate it, cancel.

For the deeper "is this worth measuring at all" question, the companion piece on whether llms.txt moves revenue walks through the measurement architecture. The point of this experiment is to show that even the citation question has a small, real, engine-specific answer; the revenue question is downstream of that.

Cost, downside, and the "low cost, low risk" argument

Even with a small effect size, the case for shipping llms.txt rests on the asymmetric cost-benefit, not the magnitude of the upside. A 30-minute one-time investment with a near-zero downside and a small positive expected value is exactly the kind of bet that compounds when you make many of them across a site. The vendor mistake is not "recommending llms.txt." It is "promising effects the data does not support" and "monetizing a 30-line file as a recurring SaaS."

Item	Estimate	Note
Time to write llms.txt by hand	20-40 minutes	Faster if you already have a clean site map
Time to assemble llms-full.txt	30-90 minutes	Mostly concatenating existing markdown
Recurring maintenance	15 minutes per quarter	Re-check links, refresh descriptions
Hosting cost	Effectively zero	Static file at /llms.txt
SEO risk	None observed in 6-week test	No duplicate-content penalty detected
Risk of misconfiguration	Near zero	Worst case: nothing reads it
Risk of getting penalized	None documented	Not an indexed page, not in HTML
Opportunity cost vs other GEO work	Low	A morning of effort, then move on
Average measured upside	+10.2 pp Perplexity, +3.6 pp Claude	At this sample size
Worst-case upside	Zero	What ChatGPT and Gemini delivered

That table is the entire argument for shipping. Not "you will double your citations." Not "you will be left behind if you don't." Just: cheap, small upside, no downside, get on with your life. The companion claim, that the file is not the lead GEO investment you should make, is just as important. The lead is measurement plus on-page structure; llms.txt is the cheap follow-up.

Reconciling with the Peec.ai "helper or hoax" position

Peec.ai's "llms.txt: helper or hoax" post argued, fairly, that no major consumer AI engine has publicly committed to using llms.txt at inference time, that Google has explicitly said it does not, and that publishing markdown copies of every blog post risks duplicate content with no documented benefit [1]. The first two of those points are exactly what my Gemini and ChatGPT results show. Where I would amend the framing is at the conclusion: "hoax" implies "nothing real here, do not do it," and the Perplexity data in this experiment is inconsistent with that strong reading.

Peec.ai claim	My data says	Compatibility
No major engine has publicly committed to using llms.txt	Correct as of mid-2026	Agree
Google does not use llms.txt	Correct (Mueller), and my Gemini results match	Agree
Markdown copies of blog posts risk duplicate content	No duplicate-content penalty detected in 6 weeks	Partly agree; risk depends on implementation
GEO consultants oversell llms.txt without evidence	Largely correct	Agree
The protocol is a hoax	Perplexity +10.2 pp is inconsistent with "nothing real here"	Disagree
Recommending llms.txt to everyone is wrong	True for content sites and brand-new sites	Mostly agree
Ship it only when there is a real engineering reason	Reasonable conservative position	Defensible

The most honest synthesis: Peec.ai is right that the consumer-chat-engine adoption claim is overstated, and they are right that nobody has shown a robust universal lift. I am pushing back on the rhetorical weight of "hoax" because the Perplexity signal in my data is small, real, and not predicted by the strong skeptical position. The right read is "low-cost convention with a small per-engine effect where it works at all," which is neither the booster's chart nor the skeptic's dismissal. Operators looking for a more skeptical baseline can also read my earlier post on the revenue-impact question, which lands closer to the cautious end of the spectrum than this one does on the citation-presence question.

The broader GEO context this experiment fits inside

llms.txt is one cheap lever in a much larger GEO toolkit, and treating it as the headline misallocates effort. The Princeton GEO research, several years of correlational work, and operator measurement all converge on the same broader picture: AI citations are won by answer-shaped passages, FAQ schema, primary-source citations on the page, entity disambiguation, and presence in the training corpus, far more than by any single curated file [17][18]. The companions to this article walk through that fuller picture: how AI engines choose sources covers the retrieval pipeline at a higher level, AI citations vs backlinks covers what AI engines value versus classic search, and AI search citations by vertical covers how the picture differs across industries.

Within that broader picture, llms.txt is a small, structural, low-risk lever. The bigger levers, in roughly descending order of impact in my data, are:

GEO lever	Impact in my measurement	Effort
Answer-shaped passages (top of page direct answers)	Large	Medium
FAQ schema with 4+ items per page	Medium-large	Low
Primary-source citations on the page	Medium	Medium
Entity disambiguation (sameAs profiles)	Medium	Low one-time
Fresh content cadence	Medium	High
llms.txt + llms-full.txt	Small, engine-specific	Low one-time
Auto-generated llms.txt without curation	Near zero	Low
Paying a SaaS to generate llms.txt	Near zero	Recurring cost

llms.txt sits in the lower half of that stack: not the lead, not the noise, and worth doing once you have done the things above it. That is the place to file it in your head.

How to ship llms.txt without overthinking it

If you are convinced enough to ship the file, the path is mechanical. None of these steps should take longer than a typical morning.

Step	What to do	Time
1. List your top 20 pages	By revenue, traffic, or strategic importance	15 min
2. Write one-line descriptions	Plain English, what is on the page	20 min
3. Format as markdown per spec [2]	H1, blockquote, H2 sections, link bullets	10 min
4. Save to /llms.txt at domain root	Serve as text/plain or text/markdown	5 min
5. Assemble llms-full.txt	Concatenate markdown of those pages	30-60 min
6. Add `<link rel="alternate">` in `<head>`	Helps agents that look there	5 min
7. Watch server logs for /llms.txt fetches	GPTBot, ClaudeBot, PerplexityBot, etc.	Ongoing
8. Re-review quarterly	Update for moved or new pages	15 min/quarter
9. Measure AI-attributed revenue server-side	First-party attribution joined to Stripe	One-time integration
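Steps 1 and 2 are manual curation, but if you already keep that page list somewhere structured, the layout in step 3 is simple enough to render in a few lines. Below is a minimal Python sketch, assuming the layout described in the spec [2] (H1 name, blockquote summary, H2 sections of `- [title](url): description` bullets). The `Page` class, `build_llms_txt` function, site name, sections, and URLs are all hypothetical placeholders, not part of any tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Page:
    """One curated entry: a page chosen in step 1, described in step 2."""
    title: str
    url: str
    description: str


def build_llms_txt(site_name: str, summary: str, sections: Dict[str, List[Page]]) -> str:
    """Render llms.txt text: an H1 with the site name, a blockquote summary,
    then one H2 per section with a markdown link bullet for each page."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines += [f"## {heading}", ""]
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical site, sections, and URLs for illustration only.
    print(build_llms_txt(
        "Example Co",
        "Example Co is a hypothetical invoicing tool for small teams.",
        {
            "Docs": [Page("Quickstart", "https://example.com/docs/quickstart",
                          "Account setup and sending a first invoice")],
            "Pricing": [Page("Plans", "https://example.com/pricing",
                             "Plan tiers and what each includes")],
        },
    ))
```

The script only formats. Choosing the pages and writing each description is still the manual part, and it is the part that separates a curated file from the near-zero "auto-generated without curation" row earlier. Save the output as a static file so it is served at /llms.txt (step 4).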

The single thing I would not do is bolt this onto a recurring SaaS workflow. The file is roughly 30 lines of markdown. The right tool for this job is a text editor.

If you are running on Mintlify, the file is auto-generated and you can move on [8]. If you are running on Docusaurus or GitBook, plugins exist; if none fits your setup, hand-writing the file is faster than evaluating the alternatives. If you are running on a custom stack, write it by hand and serve it as a static file.

For the deeper measurement step, this is where Attrifast fits. Once you have shipped llms.txt and want to know whether it moved real money on your site, you need first-party AI-engine attribution joined to Stripe webhooks so a click from an AI surface becomes a recognizable paying customer. That join is what Attrifast's revenue attribution provides; the related surface-specific pages on tracking ChatGPT traffic and the AI traffic analytics write-up cover the mechanics. None of that is required to ship llms.txt; it is required to know whether shipping llms.txt did anything for your revenue.
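The first half of that join, recognizing an AI-sourced visit at the server, can be sketched in a few lines. This is not Attrifast's implementation. It is a hypothetical Python helper (`classify_ai_visit`, `AI_HOSTS`) that assumes the listed hostnames appear in the Referer header, or in a `utm_source` parameter on the landing URL, for visits from each engine. Referers can be omitted or trimmed by referrer policies and in-app browsers, so treat the mapping as a starting point to check against your own logs.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed hostname -> engine mapping; verify against your own referrer logs.
AI_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
}


def _normalize(host: str) -> str:
    """Lowercase a hostname and drop a leading 'www.'."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def classify_ai_visit(referrer: Optional[str], landing_url: str = "") -> Optional[str]:
    """Return the AI engine a visit appears to come from, or None.

    Checks the Referer hostname first, then falls back to a utm_source
    query parameter on the landing URL, since referrers are often missing.
    """
    if referrer:
        engine = AI_HOSTS.get(_normalize(urlparse(referrer).hostname or ""))
        if engine:
            return engine
    params = parse_qs(urlparse(landing_url).query)
    source = params.get("utm_source", [""])[0]
    return AI_HOSTS.get(_normalize(source))
```

Persisting the returned engine on a first-party session and joining that session to Stripe webhook events is the second half of the loop and is not shown here.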

What I would do differently next time

A few things, all of which point at a larger and more rigorous follow-up study, which I would happily collaborate on if there is appetite in the community. The single experiment in this article is the strongest controlled evidence I am aware of in the public domain at this scale, and it is still small.

Improvement	Why
Increase to 20 treatment + 20 control sites	Powers detection of effects in the +3 pp range
Extend window to 12 weeks	Catches retrieval-index updates and longer drift
Publicly pre-register the analysis	Limits accusations of post-hoc subgroup hunting
Add a third arm: llms.txt only (no llms-full.txt)	Separates the curation file from the full inlined content
Match on AI-citation baseline as well as DR	Reduces variance from heterogeneous prior citation rates
Include more verticals (healthcare, legal, education)	Tests generalizability beyond SaaS/dev/ecom/content
Track per-engine click-through, not just citations	Distinguishes presence from traffic
Join everything to Stripe revenue	Closes the loop from citation to dollars
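The first row's power claim follows the standard normal-approximation sample-size formula for a paired design. This is a generic textbook relation, not the power calculation behind this study. With n matched pairs, σ_d the standard deviation of the per-pair treatment-minus-control deltas, and δ the true effect to detect at two-sided level α with power 1 − β:

```latex
n \approx \left( \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)\sigma_d}{\delta} \right)^{2}
```

Because δ enters squared, quadrupling the pairs from 5 to 20 roughly halves the smallest effect you can reliably detect, holding σ_d fixed. At α = 0.05 and 80% power the constant in the numerator is about 1.96 + 0.84 = 2.80. With only a handful of pairs, the t-distribution's heavier tails make the real requirement somewhat larger than this approximation suggests.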

If you are an operator with 20+ sites instrumented for AI-engine attribution and you want to co-run that study, the front door is the founder email on attrifast.com and an honest co-author byline. The data is more valuable than the article that comes out of it.

FAQ

Is llms.txt worth publishing in 2026?

Yes, with caveats. In the 10-site matched-pair experiment I ran across April and May 2026, sites that shipped llms.txt and llms-full.txt saw a small but real lift in Perplexity citation counts (treatment +12.3% vs control +2.1% over six weeks) and a smaller, noisier lift on Claude (+5.4% vs +1.8%). ChatGPT and Gemini showed no measurable difference. The file takes about 30 minutes to write, has near-zero downside, and the upside on at least one engine is real. So the honest answer is: ship it because the cost is trivial and the floor is unaffected, not because it doubles your AI traffic the way some vendors claim.

Does llms.txt help AI search overall?

Modestly, and unevenly across engines, based on the 6-week controlled test I ran. The treatment cohort gained citations at a meaningfully faster rate than the control cohort on Perplexity (+10.2 percentage points of relative growth) and slightly on Claude (+3.6 points). On ChatGPT the gap was inside the noise band (+0.4 points, not statistically distinguishable from zero in a 5-site test). On Gemini the treatment cohort actually trailed slightly (-3.1 points), which is consistent with Google's John Mueller publicly stating Google does not use llms.txt. So "does llms.txt help AI search" has to be answered per engine: Perplexity yes a little, Claude maybe a little, ChatGPT not measurably, Gemini no.

Does ChatGPT read llms.txt?

ChatGPT-User and GPTBot do fetch llms.txt when their crawlers encounter it, based on what I see in server logs across the treatment cohort. What I cannot show is that the file contents influence inference-time citation choice. OpenAI has not documented inference-time consumption of llms.txt. In the 6-week test, ChatGPT citation rates moved within noise for treatment versus control, so empirically there is no measurable citation lift on that engine. The honest summary: the file is fetched, but any citation effect on ChatGPT is too small to detect in a 5-site treatment group across six weeks.

How big is the llms.txt citation lift in real numbers?

Across 30 standardized prompts per site per engine, weekly, for six weeks, the treatment cohort (5 sites with llms.txt and llms-full.txt) ended at roughly 12.3% more weekly Perplexity citations than baseline, while the control cohort (5 matched sites, no llms.txt) ended at +2.1%. On Claude the figures were +5.4% vs +1.8%. On ChatGPT they were +3.1% vs +2.7% (inside noise). On Gemini they were -1.4% vs +1.7%. The absolute numbers are small per site per week. Sites with strong existing entity recognition saw the largest relative gain; small, low-authority sites saw essentially none.
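The headline "pp" figures are a difference of two growth rates, not a ratio, which is easy to misread. A small Python sketch of the arithmetic, using the cohort figures quoted above; the function names are illustrative.

```python
def pct_growth(baseline: float, final: float) -> float:
    """Percentage growth in weekly citations versus the baseline week.
    e.g. a hypothetical site going from 40 to 50 weekly citations -> 25.0"""
    return (final - baseline) / baseline * 100.0


def delta_pp(treatment_growth: float, control_growth: float) -> float:
    """Treatment-minus-control gap in percentage points, rounded to 0.1."""
    return round(treatment_growth - control_growth, 1)


# End-of-experiment growth vs baseline, (treatment %, control %), as reported above.
REPORTED = {
    "Perplexity": (12.3, 2.1),
    "Claude": (5.4, 1.8),
    "ChatGPT": (3.1, 2.7),
    "Gemini": (-1.4, 1.7),
}

for engine, (treat, control) in REPORTED.items():
    print(f"{engine}: {delta_pp(treat, control):+.1f} pp")
# prints: Perplexity: +10.2 pp, Claude: +3.6 pp, ChatGPT: +0.4 pp, Gemini: -3.1 pp
```

A +10.2 pp delta therefore means the treatment cohort's citation growth outpaced the control cohort's by about ten percentage points over the window, not that treatment sites earned 10.2% more citations than control sites in absolute terms.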

Why did Peec.ai call llms.txt a hoax?

Peec.ai argued in their hoax-or-helper post that no major consumer AI engine has publicly committed to using llms.txt, that Google has explicitly said it does not, and that publishing markdown copies of every blog post risks duplicate content with no documented upside. Those points are essentially correct, and my experimental data agrees that the upside is small to zero on Gemini and ChatGPT. Where I would push back is that "hoax" overstates it: there is a small, measurable signal on Perplexity in my data; the file is plumbing, not a vendor product; and the cost is roughly 30 minutes of one-time work. So I would land on "small signal, often oversold, not a hoax" rather than either extreme.

What is the difference between llms.txt and llms-full.txt?

llms.txt is a short, markdown-formatted index, typically 1-5 KB, with an H1, a summary blockquote, and bulleted links to your key pages with one-line descriptions. llms-full.txt is the same structure but with the full markdown content of those pages inlined, often 50 KB to over 1 MB, so an agent can ingest your full corpus in a single fetch without crawling. The treatment cohort in my test published both. The full file is the one that matters most for documentation sites and coding assistants; for a typical marketing site, llms.txt alone is probably enough.
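Assembling llms-full.txt is mostly concatenation, as the steps table above notes. The spec [2] defines llms.txt itself; the full-file layout below (the same H1 and blockquote header, then each page's markdown preceded by a source line and followed by a `---` separator) is one reasonable convention I am assuming, not a standard. The function names, directory, base URL, and path-to-URL rule are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def build_llms_full(site_name: str, summary: str, pages: List[Tuple[str, str]]) -> str:
    """Same H1 + blockquote header as llms.txt, then each page's full markdown.

    `pages` is a list of (source_url, markdown_body). The "Source:" line and
    the "---" separator are an assumed convention, not part of a spec.
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for url, markdown in pages:
        parts += [f"Source: {url}", "", markdown.strip(), "", "---", ""]
    return "\n".join(parts)


def load_markdown_pages(doc_dir: Path, base_url: str) -> List[Tuple[str, str]]:
    """Read every .md file under doc_dir and map its relative path to a URL.

    The path-to-URL rule here is hypothetical; adjust it to your routing.
    """
    pages = []
    for path in sorted(doc_dir.rglob("*.md")):
        slug = path.relative_to(doc_dir).with_suffix("").as_posix()
        pages.append((f"{base_url.rstrip('/')}/{slug}", path.read_text(encoding="utf-8")))
    return pages


# Example (hypothetical paths):
# body = build_llms_full("Example Co", "One-line summary.",
#                        load_markdown_pages(Path("content"), "https://example.com"))
# print(f"{len(body.encode('utf-8')) / 1024:.1f} KB")
```

Printing the encoded size before publishing shows where your file lands in the 50 KB to 1 MB range mentioned above.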

Does Google AI Overviews or Gemini use llms.txt?

No. Google's John Mueller has publicly said Google does not use llms.txt for Search or Gemini, and my 6-week experiment is consistent with that: Gemini citation counts in the treatment cohort actually trailed the control cohort by a small amount, which is what you would expect if Google's retrieval pipeline ignores the file entirely and the small gap is noise plus regression to the mean. If your AI-citation strategy depends on Google surfaces, llms.txt is not the lever to pull. Schema, content quality, and classic SEO signals still drive Google's AI products.

When is publishing llms.txt a waste of time?

Three cases. First, brand-new sites with little unique content or weak entity recognition; in my data the smallest sites saw essentially no citation movement, treatment or control, because they were not being cited much to begin with. Second, sites that publish an incomplete or broken llms.txt that points at dead URLs, redirects, or pages already noindexed; a stale file is mildly counterproductive for agents that do read it. Third, any team that is paying a recurring SaaS fee to auto-generate a 30-line markdown file. The work itself is roughly 30 minutes one-time; the recurring cost should be zero.
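The second case, a stale file, is cheap to guard against with a link check during the quarterly review. A minimal Python sketch using only the standard library: it pulls every markdown link out of llms.txt and flags anything that does not answer 200 to a HEAD request, with redirects deliberately not followed so moved pages surface as 301 or 302. The names (`stale_links`, `head_status`) are hypothetical; some servers reject HEAD, which would need a GET fallback, and noindexed pages are not detected.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

# Matches the URL inside a markdown link: [text](https://...)
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so 301/302 surface as HTTPError with their code."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_OPENER = urllib.request.build_opener(_NoRedirect)


def head_status(url: str) -> int:
    """HTTP status of a HEAD request without following redirects; 0 if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with _OPENER.open(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # URLError, timeouts, DNS failures
        return 0


def stale_links(llms_txt: str, status_of: Callable[[str], int] = head_status) -> List[Tuple[str, int]]:
    """List (url, status) for every markdown link that does not return 200."""
    flagged = []
    for url in LINK_RE.findall(llms_txt):
        status = status_of(url)
        if status != 200:
            flagged.append((url, status))
    return flagged
```

The `status_of` parameter lets you swap in a different checker, or a stub for testing, without touching the network.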

What is the statistical confidence on these results?

Modest. With 5 treatment and 5 control sites, 30 prompts per vertical per engine, and 6 weekly measurements, the effective sample size for any single engine is small. The Perplexity delta (+10.2 percentage points treatment-minus-control) survives a paired t-test at roughly p=0.04, which is suggestive but not conclusive. Claude's smaller +3.6-point delta lands near p=0.18, not significant. ChatGPT and Gemini are firmly inside noise. So the Perplexity result is the only one I would call "a signal"; the rest are best described as "no effect detected at this sample size," which is not the same thing as "no effect exists" but is the honest read of the data on the table.
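For readers who want to run this kind of test on their own cohorts, here is a standard-library Python sketch of a paired t-test, assuming one treatment-minus-control delta per matched pair (so degrees of freedom = pairs − 1). The p-value uses the closed-form Student's t series that holds for even degrees of freedom, which covers a 5-pair design (df = 4); use a statistics library for other cases. The deltas in the usage line are made up, not data from this experiment.

```python
import math
import statistics
from typing import List, Tuple


def paired_t(deltas: List[float]) -> Tuple[float, int]:
    """t statistic and degrees of freedom for a paired t-test, given one
    treatment-minus-control delta per matched pair."""
    n = len(deltas)
    std_err = statistics.stdev(deltas) / math.sqrt(n)
    return statistics.mean(deltas) / std_err, n - 1


def two_sided_p_even_df(t: float, df: int) -> float:
    """Two-sided p-value for Student's t when df is even, using the closed form
    P(|T| < t) = x * sum_{k < df/2} C(2k, k) / 4**k * (1 - x**2)**k,
    with x = |t| / sqrt(df + t**2)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only covers positive even df")
    x = abs(t) / math.sqrt(df + t * t)
    c2 = 1.0 - x * x
    inside = x * sum(math.comb(2 * k, k) / 4**k * c2**k for k in range(df // 2))
    return 1.0 - inside


# Made-up per-pair deltas in pp, NOT data from this experiment.
t_stat, dof = paired_t([4, 10, -2, 18, 20])
print(f"t={t_stat:.2f}, df={dof}, p={two_sided_p_even_df(t_stat, dof):.3f}")
# prints: t=2.41, df=4, p=0.073
```

The made-up example shows why spread across pairs matters as much as the mean: a +10 pp average with widely scattered pairs lands around p≈0.07, above the conventional 0.05 line.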

Should I worry about duplicate content from llms-full.txt?

In my logs and Search Console data across the treatment cohort, I saw no measurable duplicate-content penalty or canonical confusion attributable to llms-full.txt during the 6-week window. The file lives at a single URL, returns text/plain (or text/markdown), and was not indexed in Google's web search index in any of the cohort's accounts during the test. That matches what you would expect: it is not HTML, it is not linked from your main site navigation, and Google's index is generally good at not double-counting plain-text mirrors of HTML pages. The risk window is real if you also publish standalone .md copies of every blog post and let Google index those; that is a different and worse pattern.

How do I measure llms.txt impact on my own site?

Two layers. Layer one is citation presence: a GEO visibility tool that queries the major engines for your target prompts weekly and tracks whether you appear; pair that with server access logs filtered to /llms.txt and /llms-full.txt with bot user-agents so you know who fetched it. Layer two is revenue: capture AI-engine referrers server-side, persist a first-party session, and join that session to Stripe webhooks so a citation becomes a click becomes a paying customer. Without layer two you are measuring presence, not money, which is the entire reason most llms.txt arguments stay unresolved.
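The log half of layer one is a short script. A Python sketch, assuming your server writes the common Combined Log Format and that the crawler names mentioned in this article appear as substrings of each bot's user-agent. Check the vendors' crawler documentation [6][15] for current tokens, and keep in mind that user-agents can be spoofed. `count_llms_fetches` and the log path in the usage comment are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Crawler names mentioned in this article; confirm current tokens in vendor docs.
AI_BOTS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")
LLMS_PATHS = {"/llms.txt", "/llms-full.txt"}


def count_llms_fetches(lines: Iterable[str]) -> Counter:
    """Count successful (200/304) fetches of the llms files, keyed by (bot, path)."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if not match or match["status"] not in ("200", "304"):
            continue
        path = match["path"].split("?", 1)[0]  # ignore query strings
        if path not in LLMS_PATHS:
            continue
        bot = next((name for name in AI_BOTS if name in match["ua"]), None)
        if bot is not None:
            counts[(bot, path)] += 1
    return counts


# Usage (hypothetical log path):
# with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
#     for (bot, path), n in count_llms_fetches(log).most_common():
#         print(bot, path, n)
```

This answers "who fetched the file," not "did it change citations"; the weekly prompt tracking is still needed for that.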

Will Attrifast tell me if llms.txt drove revenue on my site?

Attrifast measures AI-engine attributed sessions and revenue server-side, joined to Stripe by webhook, so once you ship llms.txt you can watch AI-attributed revenue per visitor change over time against a baseline. That answers the only question that pays the bills: did the file move dollars on my site, by engine. What Attrifast does not do is monitor citation presence in AI answers; for that you pair it with a GEO visibility tool. The combination is what turns the llms.txt debate from opinion into an in-house measurement you control.

What changed between this experiment and the older llms.txt advice?

The biggest change is that we now have a small, controlled, matched-pair data point on the consumer chat surfaces, rather than only anecdotes from individual operators and vendor case studies. The picture that emerges is more nuanced than either side of the debate had it: there is a real but small Perplexity lift, a maybe-real Claude lift, and no measurable lift on ChatGPT or Gemini. Earlier advice told you llms.txt is either essential GEO plumbing or a complete hoax. Neither is true. It is a low-cost convention with a small per-engine signal where it works at all, and the right way to think about it is as cheap plumbing, not a magic visibility lever.

If I only do one thing about AI visibility this quarter, should it be llms.txt?

No. The two things with the largest revenue-per-hour return I have seen this year are (a) installing first-party AI-engine attribution so you know which sources drive paid conversions, and (b) auditing your top 20 revenue pages for answer-shaped passages, FAQ schema, and entity clarity. llms.txt is a cheap follow-up to those, not the lead. The reason: the upside on llms.txt is small and engine-specific, while the upside on measurement plus on-page structure is large and applies to every engine. Spend the afternoon on measurement and structure first, then ship llms.txt before bed.

Find revenue hiding in your traffic

Discover which marketing channels bring customers so you can grow your business, fast.

Start free trial →

7-day free trial · $15/mo · cancel anytime

References

[1] Peec.ai, llms.txt and .md files: important AI-visibility helper or hoax? https://www.peec.ai/blog/llms-txt-md-files-important-ai-visibility-helper-or-hoax
[2] llms.txt specification, llmstxt.org. https://llmstxt.org/
[3] Jeremy Howard, The /llms.txt file proposal, Answer.AI. https://www.answer.ai/posts/2024-09-03-llmstxt.html
[4] AnswerDotAI, llms-txt repository and ecosystem, GitHub. https://github.com/AnswerDotAI/llms-txt
[5] Perplexity, How does Perplexity work? (citations and sources). https://www.perplexity.ai/hub/faq
[6] OpenAI, Overview of OpenAI's bots and how to control them. https://platform.openai.com/docs/bots
[7] OpenAI, Introducing ChatGPT search. https://openai.com/index/introducing-chatgpt-search/
[8] Mintlify, llms.txt and llms-full.txt for hosted documentation. https://mintlify.com/docs/settings/llms-txt
[9] Vercel, Guidance and patterns for llms.txt. https://vercel.com/blog
[10] Cloudflare, AI crawler and bot traffic insights, Cloudflare Radar. https://radar.cloudflare.com/ai-insights
[11] Search Engine Land, Google does not use llms.txt (John Mueller statement). https://searchengineland.com/library/google/google-search
[12] Sitemaps.org, Sitemaps XML protocol. https://www.sitemaps.org/protocol.html
[13] Google Developers, Google-Extended and Google crawlers overview. https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
[14] Anthropic, Claude documentation (publishes an llms.txt for docs). https://docs.anthropic.com/
[15] Anthropic, Does Anthropic crawl data from the web, and how can site owners block the crawler? https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
[16] IETF, AI Preferences (aipref) Working Group. https://datatracker.ietf.org/wg/aipref/about/
[17] Aggarwal et al., GEO: Generative Engine Optimization, Princeton / KDD. https://arxiv.org/abs/2311.09735
[18] Ahrefs, Generative Engine Optimization: what makes content cited by AI. https://ahrefs.com/blog/generative-engine-optimization/
[19] Semrush, AI Overviews and AI search research. https://www.semrush.com/blog/ai-overviews/
[20] Backlinko, Google AI Overviews study (citation patterns). https://backlinko.com/google-ai-overviews-study
[21] Yoast, llms.txt support and SEO guidance. https://yoast.com/
[22] Rank Math, llms.txt plugin and SEO module announcements. https://rankmath.com/
[23] Reddit r/SEO, Community threads on llms.txt experiments and adoption. https://www.reddit.com/r/SEO/
[24] Hacker News, llms.txt discussion threads (Show HN and follow-ups). https://news.ycombinator.com/
[25] Schema.org, Article specification. https://schema.org/Article
[26] MDN Web Docs, Referer header reference. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer
[27] Docusaurus community, llms.txt for documentation sites. https://docusaurus.io/

For the broader skeptical context, the llms.txt revenue impact deep-dive and the llms.txt vs robots.txt vs sitemap.xml comparison sit alongside this experiment. For the strategic GEO frame, how AI engines choose sources, AI citations vs backlinks, and AI search citations by vertical cover the bigger picture. The measurement layer that lets you test any GEO move against revenue is Attrifast's revenue attribution and the surface-specific ChatGPT traffic tracking.
