What is llms.txt? How to Add It to Your Website (2026 Guide)
Author
Muhammad Awais
Published
June 10, 2026

The robots.txt of AI Search - And Why Your Site Needs One Right Now
A few months ago I was checking my site's traffic in Search Console and noticed something strange: impressions from Perplexity and ChatGPT referrals were climbing, but my brand wasn't showing up accurately in AI-generated answers. The assistants kept summarising my tools incorrectly and missing entire sections. That's when I went down the rabbit hole of llms.txt.
If you've never heard of it, here's the one-sentence version: llms.txt is a plain Markdown file you place at your website root that tells AI models which pages matter and what your site is about, much as robots.txt tells crawlers which URLs they may fetch. Think of it as a cheat-sheet for ChatGPT, Perplexity, Claude, and Google's AI Overviews.
In this guide I'll explain what it is, why 2026 is the year it actually matters, and how to generate and deploy one in under five minutes, whether you're on Next.js, WordPress, or plain HTML hosting.
What llms.txt is and how it differs from robots.txt and sitemap.xml
Why AI search crawlers miss your best content without it
How to generate a spec-compliant file using a free browser tool
Step-by-step deployment for Next.js, WordPress, and static hosts
How to also control AI training bots vs. answer bots separately
Common mistakes that make your llms.txt useless
What Exactly Is llms.txt?
The llms.txt standard was proposed in September 2024 by Jeremy Howard, co-founder of Answer.AI. The core idea is simple but powerful: modern AI assistants don't crawl your entire website in real time. They have limited context windows, struggle with JavaScript-heavy HTML, and can't easily separate your navigation, ads, and footers from your actual content.
So Howard proposed a standard file, /llms.txt, that any website can host at its root. It's a lightweight Markdown document containing your site's name, a one-sentence description, and a curated list of your most important pages with brief descriptions. That's it. No complex configuration. No server changes. Just a text file.
Early adopters already include Anthropic (Claude's own website), Stripe, Cloudflare, Cursor, and Vercel. If the tools developers use every day have implemented this, it's a strong signal of where things are heading.
llms.txt vs robots.txt vs sitemap.xml - What's the Difference?
This is the question I get asked most often, so let me lay it out clearly. All three files live at your site root, but they serve completely different audiences and purposes:
robots.txt: Controls crawler access. It tells bots (search engines AND AI crawlers) which URLs they are or aren't allowed to fetch. It's about permission.
sitemap.xml: Lists every URL on your site for search engine indexing. It's about discovery. It tells Google "these pages exist," but nothing about what they contain.
llms.txt: Curated context for AI models. It doesn't list every page; it highlights the 5–15 most important ones with descriptions. It's about comprehension. You're telling an AI model "here's what my site is actually about, and here's where the good stuff is."
You need all three. They don't replace each other; they work as a stack. robots.txt handles access control, sitemap.xml handles search indexing, and llms.txt handles AI understanding. Missing any one of them leaves a gap.
One important clarification: llms.txt is an inclusion file, not a restriction file. If you want to block specific AI crawlers from accessing your content entirely, that's still done in robots.txt, and we'll cover exactly how to set that up for 2026's crop of AI bots in a later section.
Why 2026 Is the Year llms.txt Actually Matters
When the standard launched in late 2024, most SEOs dismissed it as speculative. "Google hasn't confirmed they use it." "No AI provider has verified they read it." Both of those things were true, and they're still true today. So why do I think you should implement it anyway?
Because the underlying problem it solves has gotten dramatically worse. Google's AI Overviews now reach 2.5 billion monthly users. AI Mode crossed 1 billion. Perplexity has over 100 million monthly active users. ChatGPT handles more than 2 billion queries per day. People are finding information through AI summaries before they ever click a link, and if your site isn't easily parseable by those AI systems, you're invisible to a growing share of how people use the internet now.
There's also a "B2A" angle that most people miss. B2A stands for Business-to-Agent: the emerging pattern where autonomous AI agents browse the web, research products, and take actions on behalf of users. These agents need structured, machine-readable surfaces to work efficiently. llms.txt is the first standardised way to give them exactly that. Even if Google never officially endorses the file, having it means you're already readable by the next generation of AI tooling.
The cost of implementing it? About 10 minutes. The cost of not having it when it becomes table stakes? Your competitors get cited in AI answers and you don't.
What a Proper llms.txt File Looks Like
The spec is strict about format, and that strictness is actually helpful: it forces you to be intentional about what you include. Here's a real example following the official specification:
# WebToolsHub
> Free developer tools and technical guides for modern web development.
Built with Next.js 15, runs entirely in your browser — no data leaves your device.
## Core Tools
- [Robots.txt & LLMs.txt Generator](https://www.webtoolshub.online/tools/robots-txt-llms-txt-generator): Generate robots.txt and llms.txt files visually with AI bot presets for 2026.
- [JSON to TypeScript Converter](https://www.webtoolshub.online/tools/json-to-ts): Paste JSON and instantly get accurate TypeScript interfaces.
- [Regex Tester & Debugger](https://www.webtoolshub.online/tools/regex-tester-debugger): Write, test, and debug regular expressions live in your browser.
## Key Guides
- [What is MCP?](https://www.webtoolshub.online/blog/what-is-mcp-model-context-protocol-guide-2026): Model Context Protocol explained for developers.
- [AEO vs SEO](https://www.webtoolshub.online/blog/aeo-vs-seo-answer-engine-optimization-guide-2026): How to optimise for AI answer engines in 2026.
## Data Usage Policy
Training AI models on this content requires written permission.
Real-time AI search retrieval and citation is permitted.
A few things to notice: the H1 heading is your site name, the blockquote is a single-sentence elevator pitch (the spec makes only the H1 mandatory, but treat the blockquote as required in practice), sections use H2 headings, and each link is followed by a colon and a one-line description. Keep the whole file under 5KB; AI models prioritise density, not volume.
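To make that structure concrete, here is a minimal sketch of how a consumer might read the file. It is not a reference parser: the names `parseLlmsTxt`, `LlmsDoc`, `LlmsSection`, and `LlmsLink` are hypothetical, and it assumes each link and its description sit on a single line.

```typescript
// Hypothetical types for illustration; the spec does not define them.
interface LlmsLink { title: string; url: string; description: string }
interface LlmsSection { heading: string; links: LlmsLink[] }
interface LlmsDoc { title: string | null; summary: string | null; sections: LlmsSection[] }

function parseLlmsTxt(text: string): LlmsDoc {
  const doc: LlmsDoc = { title: null, summary: null, sections: [] };
  let current: LlmsSection | null = null;
  // Matches "- [Title](https://url): optional description"
  const linkPattern = /^-\s*\[([^\]]+)\]\(([^)\s]+)\)\s*:?\s*(.*)$/;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line.startsWith("## ")) {
      // Every H2 opens a new section of links.
      current = { heading: line.slice(3).trim(), links: [] };
      doc.sections.push(current);
    } else if (line.startsWith("# ") && doc.title === null) {
      doc.title = line.slice(2).trim();
    } else if (line.startsWith("> ") && doc.summary === null && current === null) {
      // Only a blockquote before the first H2 counts as the summary.
      doc.summary = line.slice(2).trim();
    } else if (current !== null) {
      const match = linkPattern.exec(line);
      if (match) {
        current.links.push({ title: match[1], url: match[2], description: match[3].trim() });
      }
    }
  }
  return doc;
}
```

Lines that fit none of these shapes (such as the free-text policy lines in the example above) are simply skipped, which is one reason to keep the important information in the summary and the link descriptions.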
How to Generate Your llms.txt File (Free, No Signup)
Writing the file manually is fine for small sites, but if you have many tools or blog posts, doing it by hand gets tedious fast. I built a free browser-based generator specifically for this: the Robots.txt & LLMs.txt Generator on WebToolsHub generates both files simultaneously, with zero server calls. Your data never leaves your browser.
Here's exactly how it works:
Enter your site name and description. This populates the H1 heading and the summary blockquote in your llms.txt.
Add your important pages. You can add up to 20 links with custom display names and one-line descriptions. The tool formats them in spec-compliant Markdown automatically.
Write your Usage Policy. This optional section tells AI models whether they can use your content for training, citation, or both. The generator has preset options: "allow all," "no training," or custom.
Configure your robots.txt in the same step. This is where the tool really saves time: you handle both files at once, including the AI bot presets I'll describe below.
Download both files with one click. You get robots.txt and llms.txt ready to upload to your server root, or you can copy them directly to your clipboard.
The whole process takes about 5 minutes for a typical developer site. For a content-heavy blog it might take 10 as you think through which pages to include.
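If you'd rather script it than use a form, the same steps can be sketched in a few lines. This is an illustration, not the generator's actual code: `renderLlmsTxt`, `SiteInfo`, and `PageLink` are hypothetical names, and the "Data Usage Policy" section simply follows the example file shown earlier.

```typescript
// Hypothetical input shape mirroring the generator's form fields.
interface PageLink { title: string; url: string; description: string }
interface SiteInfo {
  name: string;
  summary: string;
  sections: { heading: string; links: PageLink[] }[];
  usagePolicy?: string[]; // optional free-text policy lines
}

function renderLlmsTxt(site: SiteInfo): string {
  // H1 site name first, then the one-sentence blockquote summary.
  const lines: string[] = [`# ${site.name}`, "", `> ${site.summary}`, ""];
  for (const section of site.sections) {
    lines.push(`## ${section.heading}`);
    for (const link of section.links) {
      // One link per line: "- [Title](url): description"
      lines.push(`- [${link.title}](${link.url}): ${link.description}`);
    }
    lines.push("");
  }
  if (site.usagePolicy && site.usagePolicy.length > 0) {
    lines.push("## Data Usage Policy", ...site.usagePolicy, "");
  }
  return lines.join("\n");
}
```

Keeping the page list in a typed object like this also makes it easy to regenerate the file whenever you add or retire a page.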
AI Bot Control - The 2026 Landscape
This is the part that trips most developers up. There are now two distinct categories of AI crawlers, and they need different treatment:
Training bots: These crawl your site to collect content for model training datasets. Examples: GPTBot (OpenAI training), Google-Extended (Gemini training), Meta-ExternalAgent (LLaMA training), Applebot-Extended (Apple Intelligence). You may want to block these if you don't want your content used for model training without compensation.
Answer bots (real-time retrieval): These fetch your pages to answer a user's live query. Examples: PerplexityBot, OAI-SearchBot (ChatGPT Search), ClaudeBot (Anthropic). Blocking these hurts you, because it prevents your site from being cited in AI-generated answers. Vendors revise these roles, so check each one's current crawler documentation before classifying a bot; Anthropic's documentation, for instance, describes ClaudeBot as a training crawler and lists Claude-SearchBot and Claude-User for search and user-initiated fetches.
Most websites should allow answer bots and make a deliberate choice about training bots. The WebToolsHub generator has three one-click presets that handle this correctly: Allow All (maximum AI visibility), Block Training (blocks training scrapers, allows answer bots), and Block All AI (full protection, useful for paywalled content). You can also go fully custom and set each bot individually.
Here's what the "Block Training, Allow Answer Bots" configuration looks like in robots.txt:
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: DuckAssistBot
Allow: /
User-agent: *
Allow: /
Sitemap: https://www.yoursite.com/sitemap.xml
Sitemap: https://www.yoursite.com/llms.txt
Notice the last line. Listing llms.txt in a Sitemap line is an informal convention, not part of the robots.txt or llms.txt specifications: the Sitemap field is defined for sitemap files, so crawlers may simply ignore it. It costs nothing, though, and may help AI crawlers that already read robots.txt discover the file.
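The three presets boil down to choosing which bots get a Disallow group. Here is a rough sketch of that logic, assuming the bot lists from this article; the preset identifiers and the `buildRobotsTxt` function are hypothetical, not the generator's real internals.

```typescript
// Hypothetical preset names for illustration.
type Preset = "allow-all" | "block-training" | "block-all-ai";

// Bot names are taken from the lists in this article; vendors may rename or add crawlers.
const TRAINING_BOTS: string[] = ["GPTBot", "Google-Extended", "Meta-ExternalAgent", "Applebot-Extended"];
const ANSWER_BOTS: string[] = ["PerplexityBot", "ClaudeBot", "OAI-SearchBot", "DuckAssistBot"];

function buildRobotsTxt(preset: Preset, sitemapUrl: string): string {
  const blocked: string[] =
    preset === "allow-all" ? [] :
    preset === "block-training" ? TRAINING_BOTS :
    [...TRAINING_BOTS, ...ANSWER_BOTS];

  const lines: string[] = [];
  for (const bot of blocked) {
    // One group per blocked bot, denying the whole site.
    lines.push(`User-agent: ${bot}`, "Disallow: /", "");
  }
  // Every bot not named above falls through to the wildcard group.
  lines.push("User-agent: *", "Allow: /", "", `Sitemap: ${sitemapUrl}`);
  return lines.join("\n") + "\n";
}
```

Unlike the hand-written file above, this sketch does not emit explicit Allow groups for answer bots; they are covered by the wildcard group, which has the same effect for crawlers that follow the standard matching rules.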
How to Deploy llms.txt - Next.js, WordPress, and Static Sites
Once you've generated the file, deployment depends on your stack. Here are the three most common scenarios:
Next.js App Router (app/ directory)
In the App Router, a Route Handler can serve the file. Create app/llms.txt/route.ts and return the content dynamically:
// app/llms.txt/route.ts
import { NextResponse } from "next/server";

// Serves the llms.txt body as plain text at /llms.txt.
export async function GET() {
  const content = `# Your Site Name
> Your one-sentence site description.
## Key Pages
- [Page Name](https://yourdomain.com/page): Short description.
## Data Usage Policy
Real-time AI search and citation is permitted.
AI model training requires written permission.
`;
  return new NextResponse(content, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      // Let browsers and CDNs cache the file for one day.
      "Cache-Control": "public, max-age=86400",
    },
  });
}
This makes your file available at https://yourdomain.com/llms.txt. Whether the handler's output is generated at build time or on each request depends on your Next.js version and caching configuration. For a static version, just drop the generated llms.txt file into your /public folder; Next.js serves everything in /public at the root path automatically.
The same applies to robots.txt: if you're not already using app/robots.ts for dynamic generation, dropping robots.txt into /public is the quickest path.
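For reference, a dynamic app/robots.ts could look roughly like the sketch below, using the "Block Training" bot list from earlier. Next.js exports a `MetadataRoute.Robots` type from "next" for this file; the local `RobotsConfig` type here is a stand-in declared only so the example is self-contained, and the domain is a placeholder.

```typescript
// app/robots.ts (sketch)
// Local stand-in for the subset of MetadataRoute.Robots used here.
type RobotsRule = { userAgent: string | string[]; allow?: string; disallow?: string };
type RobotsConfig = { rules: RobotsRule[]; sitemap?: string };

export default function robots(): RobotsConfig {
  return {
    rules: [
      // Training crawlers named in this article: blocked site-wide.
      {
        userAgent: ["GPTBot", "Google-Extended", "Meta-ExternalAgent", "Applebot-Extended"],
        disallow: "/",
      },
      // Everyone else, including answer bots: allowed.
      { userAgent: "*", allow: "/" },
    ],
    sitemap: "https://yourdomain.com/sitemap.xml", // placeholder domain
  };
}
```

In a real project you would likely import the type from "next" instead of redeclaring it, so the compiler catches any field the framework does not support.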
WordPress
Upload llms.txt to your WordPress root directory (same level as wp-config.php) using your hosting file manager or FTP. That's it. WordPress doesn't interfere with static files at the root. Verify it's live by visiting https://yoursite.com/llms.txt in your browser.
Alternatively, if you're using Yoast SEO v23.5 or newer, it has a built-in llms.txt generator that auto-populates from your existing SEO metadata and regenerates weekly. Worth enabling if you're already on Yoast.
Static Hosting (Netlify, Vercel, GitHub Pages)
Drop the generated file into your project root (or /public for Vite/React apps, /static for Hugo; for Eleventy, any folder you have configured for passthrough copy). All major static hosts serve root-level text files at the bare path with no configuration needed. For Netlify specifically, verify there's no redirect rule accidentally catching the /llms.txt path in your _redirects file.
Common Mistakes That Make Your llms.txt Useless
I've reviewed a lot of llms.txt implementations over the past few months and the same mistakes show up repeatedly. Avoid these:
Listing every single page. The whole point of llms.txt is curation. An AI model that reads 500 URLs with no descriptions learns almost nothing useful about your site. Pick the 5–15 pages that best define what you do. A page being in your sitemap is not a reason to add it to llms.txt; the sitemap already handles discovery.
Skipping descriptions. A bare list of links is marginally better than nothing. The real value comes from the one-line descriptions after the colon. These are what an AI model actually reads to understand your content. "Free tool for developers" is not a useful description. "Generate RFC-compliant cron expressions and see human-readable previews instantly" is.
Using the wrong Markdown format. The spec expects a specific order: an H1 site name first, then a blockquote summary (optional in the spec, but strongly recommended), then H2 section headings containing links with descriptions. Using incorrect heading levels or reordering these parts can cause parsing failures in strict implementations.
Accidentally blocking answer bots in robots.txt. This is the most damaging mistake. If a blanket User-agent: * with Disallow: / from a staging environment got pushed to production, or if your CDN is serving a cached robots.txt from an old deployment, you're blocking everything, including the bots that would cite your content.
Setting an aggressive Crawl-delay. Some bots ignore crawl delay entirely, but for those that respect it, setting a delay over 10 seconds can effectively prevent real-time answer bots from fetching your pages before a query times out. Keep it at 1–2 seconds maximum, or remove it entirely for answer bots.
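The last two mistakes are easy to check mechanically. Below is a deliberately simplified sketch: it only looks at rules for the root path, picks the first matching group instead of merging groups, and ignores wildcards inside paths. The names `parseRobotsGroups` and `auditBot` are hypothetical.

```typescript
interface RobotsGroup {
  agents: string[];
  rules: { type: "allow" | "disallow"; path: string }[];
  crawlDelay: number | null;
}

function parseRobotsGroups(robotsTxt: string): RobotsGroup[] {
  const groups: RobotsGroup[] = [];
  let current: RobotsGroup | null = null;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (current === null || !lastWasAgent) {
        current = { agents: [], rules: [], crawlDelay: null };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (current === null) continue; // e.g. a Sitemap line before any group
    if (field === "allow" || field === "disallow") {
      current.rules.push({ type: field, path: value });
    } else if (field === "crawl-delay") {
      const n = Number(value);
      current.crawlDelay = Number.isFinite(n) ? n : null;
    }
  }
  return groups;
}

function auditBot(robotsTxt: string, bot: string): { blockedAtRoot: boolean; crawlDelay: number | null } {
  const groups = parseRobotsGroups(robotsTxt);
  const name = bot.toLowerCase();
  // Prefer the bot's own group; otherwise fall back to the wildcard group.
  const group = groups.find(g => g.agents.includes(name)) ?? groups.find(g => g.agents.includes("*"));
  if (!group) return { blockedAtRoot: false, crawlDelay: null };
  const blocksRoot = group.rules.some(r => r.type === "disallow" && r.path === "/");
  const allowsRoot = group.rules.some(r => r.type === "allow" && r.path === "/");
  // When Allow: / and Disallow: / both appear, the allow rule wins the tie.
  return { blockedAtRoot: blocksRoot && !allowsRoot, crawlDelay: group.crawlDelay };
}
```

Running `auditBot` against your live robots.txt for each answer bot you care about is a quick way to catch a leaked staging rule or a forgotten Crawl-delay before it costs you citations.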
How to Verify Your llms.txt Is Working
Once deployed, verification is straightforward. First, check the file is accessible at the root path: open a private browser window and visit https://yourdomain.com/llms.txt. You should see raw Markdown text, not an HTML page.
Second, check your robots.txt at https://yourdomain.com/robots.txt and confirm the AI bot rules are correct. The robots.txt report in Google Search Console is useful for spotting fetch errors and syntax issues.
Third, use the Sitemap Validator to confirm your XML sitemap references are correct, because a broken sitemap can indirectly hurt AI crawlability even if your llms.txt is perfect.
For deeper GEO (Generative Engine Optimization) analysis, tools like LLMrefs can show you how frequently your domain appears in AI-generated answers across ChatGPT, Perplexity, and Claude, which is useful for measuring whether your AI SEO efforts are actually moving the needle.
Beyond technical verification, the real test is qualitative: ask ChatGPT or Perplexity about your site's primary topic and see whether your brand or a specific page gets cited. If it does, and the description matches your actual content, that's a good sign your implementation is working. If the AI still gets your site wrong or ignores it, revisit your blockquote description and section descriptions. That one-sentence summary is the most important line in the entire file.
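The first check can also be automated. Here is a small sketch that inspects a fetched response for the most common deployment problems; `checkLlmsTxtResponse` and `FetchedFile` are hypothetical names, and the checks are heuristics rather than spec validation.

```typescript
interface FetchedFile { status: number; contentType: string | null; body: string }

// Returns a list of human-readable problems; an empty list means the basics look fine.
function checkLlmsTxtResponse(file: FetchedFile): string[] {
  const problems: string[] = [];
  if (file.status !== 200) problems.push(`Expected HTTP 200, got ${file.status}`);

  const type = (file.contentType ?? "").toLowerCase();
  if (type.includes("text/html")) problems.push("Served as HTML; expected plain text or Markdown");

  const firstLine = file.body.split(/\r?\n/).find(l => l.trim() !== "") ?? "";
  if (!firstLine.startsWith("# ")) problems.push("First non-empty line is not an H1 heading");

  // A framework 404 page or SPA fallback often comes back with a 200 status.
  if (/<html[\s>]/i.test(file.body)) problems.push("Body looks like an HTML page");
  return problems;
}

// Usage sketch, assuming a runtime with a global fetch (for example Node 18+):
//   const res = await fetch("https://yourdomain.com/llms.txt");
//   const problems = checkLlmsTxtResponse({
//     status: res.status,
//     contentType: res.headers.get("content-type"),
//     body: await res.text(),
//   });
```

Keeping the checks in a pure function makes them easy to run in CI against a preview deployment before the file goes live.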
llms.txt and AEO - The Bigger Picture
llms.txt is one piece of a broader shift called AEO, or Answer Engine Optimization. The idea is that ranking in Google's traditional blue-link results is no longer enough. You need to be cited in AI-generated answers, not just ranked in the index. Our guide on AEO vs SEO for 2026 covers the full picture, but llms.txt is essentially the technical foundation of any AEO strategy. It's the file that tells AI systems "I'm here, I'm trustworthy, here's what I know."
It also complements the broader trend of GEO (Generative Engine Optimization), which our post on whether SEO is dead in 2026 digs into in detail. If you want your developer blog or tools site to stay relevant as AI search becomes the default, this is the foundational technical work to do now, before it becomes mainstream.
Quick-Start Checklist
Before we get to the FAQs, here's a condensed action list you can work through in one session:
✅ Generate your llms.txt and robots.txt using the free Robots.txt & LLMs.txt Generator
✅ Include 5–15 curated pages with one-line descriptions, not a full sitemap dump
✅ Write a sharp one-sentence blockquote description; this is what AI models read first
✅ Add a Data Usage Policy section (allow answer bots, decide on training bots)
✅ Configure robots.txt with explicit User-Agent rules for each major AI crawler
✅ Reference your llms.txt as a supplemental sitemap entry in robots.txt
✅ Deploy both files to your site root and verify they're accessible
✅ Test with ChatGPT or Perplexity: ask about your site's main topic and see if you appear
Frequently Asked Questions
What is llms.txt and what is it used for?
llms.txt is a plain Markdown file hosted at your website's root (e.g., https://yoursite.com/llms.txt) that tells large language models and AI assistants which pages on your site are most important and what your site is about. Proposed by Jeremy Howard of Answer.AI in September 2024, it works alongside robots.txt and sitemap.xml: robots.txt handles crawler permissions, sitemap.xml handles search engine page discovery, and llms.txt handles AI comprehension. It's designed to help AI tools like ChatGPT, Perplexity, and Claude accurately represent your content in generated answers.
Does Google use llms.txt for AI Overviews or ranking?
Google has not officially confirmed that it reads or uses llms.txt files for AI Overviews or traditional search ranking as of June 2026. Google's John Mueller has noted that major crawlers currently prioritise standard HTML over these files. However, the file takes about 10 minutes to implement, costs nothing, and is already adopted by Anthropic, Vercel, Stripe, and Cloudflare. Even if Google's crawler doesn't act on it today, other AI search engines and autonomous agents may, and the standard is evolving quickly. The downside of not having it is higher than the effort required to add it.
What is the difference between llms.txt and llms-full.txt?
Two files are in common use. llms.txt is a lightweight curated index: a list of your key pages with brief descriptions, usually under 5KB. llms-full.txt, a companion convention, is the full Markdown content of every page listed in your index, concatenated into a single file. The full version is useful when you want an AI model to ingest your entire documentation or content base in a single fetch, which is common for developer tool documentation sites. For most blogs and marketing sites, llms.txt alone is sufficient. Start with that, then add the full version later if needed.
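As a rough illustration of the "concatenated into a single file" idea, here is one way it could be assembled. The layout (an H2 per page followed by a "Source:" line) is an assumption made for this sketch, not a format defined by any specification, and `buildLlmsFull` and `FullPage` are hypothetical names.

```typescript
// Each page's content is assumed to already be available as Markdown.
interface FullPage { title: string; url: string; markdown: string }

function buildLlmsFull(siteName: string, pages: FullPage[]): string {
  const parts: string[] = [`# ${siteName}`];
  for (const page of pages) {
    // One block per page: heading, source URL, then the page body.
    parts.push(`## ${page.title}\n\nSource: ${page.url}\n\n${page.markdown.trim()}`);
  }
  return parts.join("\n\n") + "\n";
}
```

Feeding it the same page list you use for llms.txt keeps the two files in sync.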
How do I block AI training bots but still appear in AI search results?
Training bots (GPTBot, Google-Extended, Meta-ExternalAgent, Applebot-Extended) and answer bots (PerplexityBot, ClaudeBot, OAI-SearchBot) are different crawlers with different User-Agent strings. You can block training bots with a Disallow: / rule under their specific User-Agent in robots.txt, while leaving answer bots fully allowed. This prevents your content from being used in model training datasets while still letting real-time AI search assistants cite your pages. The Robots.txt & LLMs.txt Generator has a one-click "Block Training" preset that configures all of this correctly.
Where do I put llms.txt on my website?
It must be at the root of your domain, accessible at https://yourdomain.com/llms.txt. For Next.js, put it in the /public folder or create a route handler at app/llms.txt/route.ts. For WordPress, upload it to the same directory as your wp-config.php file using FTP or your hosting file manager. For static hosts (Netlify, Vercel, GitHub Pages), place it in your project root or your static assets folder. Verify it's accessible by opening the URL directly in a browser; you should see raw Markdown text.
Does llms.txt work with Disallow rules in robots.txt?
Yes, but they operate independently. robots.txt controls whether a bot can access a URL at all: if a bot is blocked in robots.txt, it cannot fetch that URL regardless of what llms.txt says. llms.txt only helps bots that already have access understand your content better. Make sure the AI bots you want to read your llms.txt aren't accidentally blocked in your robots.txt first. Check your live robots.txt at /robots.txt and confirm that crawlers like PerplexityBot and ClaudeBot have Allow: / rules (or at least no Disallow rules) before you invest time in your llms.txt content.
Is the llms.txt generator on WebToolsHub free to use?
Yes. The Robots.txt & LLMs.txt Generator is completely free, requires no account, and runs entirely in your browser. No data is sent to any server. You can add your site name, description, important pages with custom descriptions, AI bot presets, and a data usage policy, then download both files instantly. All tools on WebToolsHub work the same way: free, client-side, no sign-up.