Optimizing Your Website for AI Agents and LLMs

By Agnel Nieves · 6 min read
AI
SEO
Web Development
LLMs

Your website has two audiences now. Humans, obviously. But also AI agents — LLMs that crawl, summarize, cite, and recommend your content to millions of people. If your site isn't optimized for both, you're leaving visibility on the table.

I just finished optimizing this site for AI consumption, and the process revealed something interesting: most of what makes a site good for AI also makes it better for humans. Clear structure, machine-readable content, and explicit metadata benefit everyone.

Here's what I did and why it matters.

What Are AI Agents Actually Doing with Your Site?

When someone asks ChatGPT, Claude, Perplexity, or Google's AI Overview a question, those systems don't just generate answers from training data. Increasingly, they fetch and cite live web content. Your site might get:

  • Crawled for training data by bots like GPTBot, ClaudeBot, and Google-Extended
  • Fetched at query time by Perplexity, ChatGPT browsing, and similar agents
  • Cited as a source in AI-generated responses
  • Summarized in featured snippets and AI overviews
  • Navigated by autonomous agents that interact with your APIs

Each of these has different needs, but they all benefit from the same foundation: structured, discoverable, machine-readable content.

The llms.txt Standard

The llms.txt spec is the counterpart to robots.txt for AI agents. Where robots.txt tells crawlers what they may access, llms.txt tells them what your site is — a structured markdown index served at your domain root.

The format is simple:

# Your Name or Site
 
> A one-line summary of what this site is.
 
A longer description paragraph.
 
## Section Name
 
- [Link Title](https://url): Description of what's at this link

I implemented two variants:

  • /llms.txt — the index. A table of contents with links to all pages, blog posts, projects, social profiles, and feeds. Think of it as a menu for AI agents to browse selectively.
  • /llms-full.txt — the full dump. Every blog post's complete markdown content, every project description, biographical context. For agents that want to load everything into context at once.

Both are served as text/plain with markdown formatting. Both are generated dynamically from the same data sources that power the site, so they never go stale.
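As a rough sketch of that dynamic generation, here's one way to build the llms.txt index from site data. The LinkEntry/LlmsSection shapes and the buildLlmsTxt name are my own illustration, not the article's actual implementation:

```typescript
// Build a /llms.txt markdown index from structured site data.
// Shapes here are assumptions; adapt them to whatever powers your site.
interface LinkEntry {
  title: string;
  url: string;
  description: string;
}

interface LlmsSection {
  heading: string;
  entries: LinkEntry[];
}

function buildLlmsTxt(
  siteName: string,
  summary: string,
  sections: LlmsSection[],
): string {
  // Title and one-line summary, per the llms.txt format.
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.heading}`, "");
    for (const e of section.entries) {
      lines.push(`- [${e.title}](${e.url}): ${e.description}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

In a Next.js app, a route handler can return this string with a `Content-Type: text/plain` header so it regenerates from live data on every request (or at build time, if you prefer).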

Inline LLM Instructions in HTML

This one comes from a Vercel proposal and it's clever: embed AI-readable instructions directly in your page's <head> using a script tag browsers ignore.

<script type="text/llms.txt">
# Your Site Name
 
This is the personal website of [name], a [role] based in [location].
 
## Site Structure
- / — Home: Description
- /blog — Blog: Description
- /about — About: Description
 
## Key Facts
- Name: Your Name
- Role: Your Role
- Specialties: Thing 1, Thing 2, Thing 3
</script>

Browsers skip <script> tags with unknown types. LLMs process them. It's a zero-cost way to give every page on your site a machine-readable context block. I added one to my root layout that describes who I am, the site structure, and where to find machine-readable content.
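If you want to generate that block from site data rather than hard-code it, a small helper can assemble the string. The LlmsContext shape is illustrative, not part of the Vercel proposal:

```typescript
// Assemble the browser-ignored <script type="text/llms.txt"> block as a string.
// Field names are assumptions; map them to your own site data.
interface LlmsContext {
  siteName: string;
  intro: string;
  routes: { path: string; label: string }[];
  facts: Record<string, string>;
}

function llmsScriptTag(ctx: LlmsContext): string {
  const routes = ctx.routes.map((r) => `- ${r.path} — ${r.label}`).join("\n");
  const facts = Object.entries(ctx.facts)
    .map(([k, v]) => `- ${k}: ${v}`)
    .join("\n");
  const body = [
    `# ${ctx.siteName}`,
    "",
    ctx.intro,
    "",
    "## Site Structure",
    routes,
    "",
    "## Key Facts",
    facts,
  ].join("\n");
  return `<script type="text/llms.txt">\n${body}\n</script>`;
}
```

In a React layout you'd inject this via `dangerouslySetInnerHTML` (or just write the JSX directly); either way the payload is invisible to browsers.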

Structured Data That AI Engines Actually Use

JSON-LD structured data has always been important for Google. It's now equally important for AI engines. When an LLM encounters schema.org markup, it understands the semantics of your content — not just the text, but what the text represents.

I already had structured data for my blog posts (BlogPosting schema with breadcrumbs). What I added was CreativeWork schema for my portfolio projects, giving each project a machine-readable identity:

{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "name": "Project Name",
  "description": "What this project is",
  "url": "https://project-url.com",
  "creator": {
    "@type": "Person",
    "name": "Your Name"
  }
}

The more schema types you cover, the more AI engines can understand and cite your work with proper attribution.
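A helper can emit that markup for every project from your data source. This is a minimal sketch, assuming a simple Project shape; the schema.org fields match the JSON-LD example above:

```typescript
// Serialize schema.org CreativeWork JSON-LD for a portfolio project.
// The Project interface is an assumption about your data model.
interface Project {
  name: string;
  description: string;
  url: string;
}

function creativeWorkJsonLd(project: Project, creatorName: string): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    name: project.name,
    description: project.description,
    url: project.url,
    creator: { "@type": "Person", name: creatorName },
  });
}
```

Embed the resulting string in a `<script type="application/ld+json">` tag on the project's page so crawlers and AI engines can pick it up.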

Machine-Readable Feeds

RSS is great, but it's XML — not the most natural format for AI agents to parse. I added a JSON Feed endpoint alongside my existing RSS feed:

  • /feed.xml — RSS 2.0 for traditional feed readers
  • /feed.json — JSON Feed 1.1 for programmatic consumption

JSON Feed is cleaner for AI agents to parse and reference. Both are registered in the site's metadata so they're auto-discoverable.
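A JSON Feed endpoint is little more than mapping your posts into the shape the spec defines. A sketch, assuming a simple Post shape (the feed fields themselves follow JSON Feed 1.1):

```typescript
// Assemble a JSON Feed 1.1 document from blog posts.
// The Post interface is an assumption about your content model.
interface Post {
  slug: string;
  title: string;
  html: string;
  publishedAt: string; // ISO 8601 date string
}

function buildJsonFeed(siteUrl: string, siteTitle: string, posts: Post[]) {
  return {
    version: "https://jsonfeed.org/version/1.1",
    title: siteTitle,
    home_page_url: siteUrl,
    feed_url: `${siteUrl}/feed.json`,
    items: posts.map((p) => ({
      id: `${siteUrl}/blog/${p.slug}`, // ids must be unique and stable
      url: `${siteUrl}/blog/${p.slug}`,
      title: p.title,
      content_html: p.html,
      date_published: p.publishedAt,
    })),
  };
}
```

Serve the result as `application/feed+json` (or plain `application/json`) from your `/feed.json` route.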

Making robots.txt AI-Aware

Most sites already have a robots.txt. The key addition is explicitly allowing AI crawlers and pointing them to your llms.txt:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

# AI/LLM Content
# llms.txt: https://yoursite.com/llms.txt
# llms-full.txt: https://yoursite.com/llms-full.txt

Many sites block AI crawlers by default. If you want your content cited and discovered by AI, explicitly allow the major bots: GPTBot, ChatGPT-User, Google-Extended, ClaudeBot, anthropic-ai, PerplexityBot, Applebot-Extended, Bytespider, and cohere-ai.
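Rather than maintaining that list by hand, you can serialize it from a single array. A sketch (Next.js users can return an equivalent rules object from app/robots.ts instead of building the string themselves):

```typescript
// Serialize a robots.txt that explicitly allows the major AI crawlers.
// The bot list comes from the article; extend it as new crawlers appear.
const AI_BOTS = [
  "GPTBot",
  "ChatGPT-User",
  "Google-Extended",
  "ClaudeBot",
  "anthropic-ai",
  "PerplexityBot",
  "Applebot-Extended",
  "Bytespider",
  "cohere-ai",
];

function buildRobotsTxt(siteUrl: string): string {
  // One Allow-all stanza per AI bot, then the sitemap pointer.
  const rules = AI_BOTS.map((bot) => `User-agent: ${bot}\nAllow: /`).join("\n\n");
  return `${rules}\n\nSitemap: ${siteUrl}/sitemap.xml\n`;
}
```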

Why This Matters for Creators

As a design engineer with 15+ years of building products, I've watched SEO evolve from keyword stuffing to semantic web to AI-native discovery. We're at an inflection point. The sites that get cited by AI aren't necessarily the ones with the best domain authority — they're the ones with the clearest, most structured, most machine-readable content.

This is especially important for personal sites and portfolios. When someone asks an AI "who are the best design engineers in Miami?" or "what's a good article about design tokens?", you want your site to be citable. That requires more than good content — it requires content that AI can find, understand, and attribute.

The Full Stack of AI Optimization

Here's the complete checklist of what I now have in place:

| Layer | What | Why |
| --- | --- | --- |
| robots.txt | Explicitly allow AI bots | Let them crawl |
| sitemap.xml | Dynamic sitemap with all content | Let them discover |
| llms.txt | Markdown index of the site | Let them understand structure |
| llms-full.txt | Full content in one file | Let them ingest everything |
| Inline <script> | Page-level LLM instructions | Let them understand context |
| JSON-LD | Structured data on every page | Let them understand semantics |
| RSS + JSON Feed | Machine-readable content feeds | Let them subscribe |
| Meta tags | OpenGraph, Twitter, canonical | Let them cite accurately |

None of these changes affect how the site looks or feels for human visitors. They're invisible additions that make the site dramatically more useful for AI.

What's Next

The AI web is evolving fast. Standards like llms.txt are still emerging, and new patterns will appear. But the fundamentals won't change: structure your content clearly, make it discoverable, and give machines the metadata they need to understand it.

If you want to replicate this setup, I've published a full implementation guide with code examples for Next.js. The approach works for any framework — the concepts are universal.


Building something and want to talk AI optimization? Let's connect.