---
title: "Optimizing Your Website for AI Agents and LLMs"
date: "2026-04-14"
excerpt: "Your website has human visitors and AI visitors. Here's how to serve both — with llms.txt, inline LLM instructions, structured data, and machine-readable feeds."
author: "Agnel Nieves"
tags: ["AI", "SEO", "Web Development", "LLMs"]
status: "published"
lastModified: "2026-04-14"
---

Your website has two audiences now. Humans, obviously. But also AI agents — LLMs that crawl, summarize, cite, and recommend your content to millions of people. If your site isn't optimized for both, you're leaving visibility on the table.

I just finished optimizing [this site](/) for AI consumption, and the process revealed something interesting: most of what makes a site good for AI also makes it better for humans. Clear structure, machine-readable content, and explicit metadata benefit everyone.

Here's what I did and why it matters.

## What Are AI Agents Actually Doing with Your Site?

When someone asks ChatGPT, Claude, Perplexity, or Google's AI Overviews a question, those systems don't just generate answers from training data. Increasingly, they fetch and cite live web content. Your site might get:

- **Crawled for training data** by bots like GPTBot, ClaudeBot, and Google-Extended
- **Fetched at query time** by Perplexity, ChatGPT browsing, and similar agents
- **Cited as a source** in AI-generated responses
- **Summarized in featured snippets** and AI overviews
- **Navigated by autonomous agents** that interact with your APIs

Each of these has different needs, but they all benefit from the same foundation: structured, discoverable, machine-readable content.

## The llms.txt Standard

The [llms.txt spec](https://llmstxt.org) is the AI-agent counterpart to `robots.txt`. Where `robots.txt` tells crawlers what they *can* access, `llms.txt` tells them what your site *is*: a structured markdown index served at your domain root.

The format is simple:

```markdown
# Your Name or Site

> A one-line summary of what this site is.

A longer description paragraph.

## Section Name

- [Link Title](https://url): Description of what's at this link
```

I implemented two variants:

- **`/llms.txt`** — the index. A table of contents with links to all pages, blog posts, projects, social profiles, and feeds. Think of it as a menu for AI agents to browse selectively.
- **`/llms-full.txt`** — the full dump. Every blog post's complete markdown content, every project description, biographical context. For agents that want to load everything into context at once.

Both are served as `text/plain` with markdown formatting. Both are generated dynamically from the same data sources that power the site, so they never go stale.
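
How you generate these depends on your stack. As an illustration, here's roughly what the index endpoint could look like as a Next.js App Router route handler; `getAllPosts()` and `getAllProjects()` are hypothetical stand-ins for whatever data layer powers your site:

```typescript
// app/llms.txt/route.ts — a sketch; getAllPosts/getAllProjects are hypothetical
import { getAllPosts, getAllProjects } from "@/lib/content";

export async function GET() {
  const posts = await getAllPosts();
  const projects = await getAllProjects();

  const body = [
    "# Your Name or Site",
    "",
    "> A one-line summary of what this site is.",
    "",
    "## Blog",
    ...posts.map((p) => `- [${p.title}](https://yoursite.com/blog/${p.slug}): ${p.excerpt}`),
    "",
    "## Projects",
    ...projects.map((p) => `- [${p.name}](${p.url}): ${p.description}`),
  ].join("\n");

  // text/plain keeps it trivially parseable; no HTML wrapper to strip
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```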

## Inline LLM Instructions in HTML

This one comes from a [Vercel proposal](https://vercel.com/blog/a-proposal-for-inline-llm-instructions-in-html) and it's clever: embed AI-readable instructions directly in your page's `<head>` using a script tag browsers ignore.

```html
<script type="text/llms.txt">
# Your Site Name

This is the personal website of [name], a [role] based in [location].

## Site Structure
- / — Home: Description
- /blog — Blog: Description
- /about — About: Description

## Key Facts
- Name: Your Name
- Role: Your Role
- Specialties: Thing 1, Thing 2, Thing 3
</script>
```

Browsers skip `<script>` tags with unknown types. LLMs process them. It's a zero-cost way to give every page on your site a machine-readable context block. I added one to my root layout that describes who I am, the site structure, and where to find machine-readable content.
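
If you're on Next.js (App Router), one way to wire this up is in the root layout. Two caveats in this sketch: React escapes plain string children of `<script>`, so the content goes in through `dangerouslySetInnerHTML`, and because the App Router manages `<head>` through its Metadata API, the tag sits at the top of the body here. An agent reading the raw HTML picks it up either way.

```tsx
// app/layout.tsx (excerpt) — a sketch, not the only placement option
import type { ReactNode } from "react";

// The context block from above, as a plain string
const llmsContext = `
# Your Site Name

This is the personal website of [name], a [role] based in [location].

## Key Facts
- Name: Your Name
- Role: Your Role
`;

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        {/* Unknown script type: browsers skip it, LLMs read it */}
        <script
          type="text/llms.txt"
          dangerouslySetInnerHTML={{ __html: llmsContext }}
        />
        {children}
      </body>
    </html>
  );
}
```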

## Structured Data That AI Engines Actually Use

[JSON-LD](https://json-ld.org/) structured data has long been important for Google search. It's now equally important for AI engines. When an LLM encounters schema.org markup, it understands the *semantics* of your content — not just the text, but what the text represents.

I already had structured data for my blog posts (`BlogPosting` schema with breadcrumbs). What I added was `CreativeWork` schema for my [portfolio projects](/work), giving each project a machine-readable identity:

```json
{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "name": "Project Name",
  "description": "What this project is",
  "url": "https://project-url.com",
  "creator": {
    "@type": "Person",
    "name": "Your Name"
  }
}
```

The more schema types you cover, the more AI engines can understand and cite your work with proper attribution.
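
Generating the markup from the same data that renders the page keeps the two from drifting apart. A minimal sketch, where the `Project` interface is a hypothetical stand-in for your own data model:

```typescript
// lib/schema.ts — a sketch; Project is a stand-in for your own data model
interface Project {
  name: string;
  description: string;
  url: string;
}

export function projectToCreativeWork(project: Project) {
  return {
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    name: project.name,
    description: project.description,
    url: project.url,
    creator: { "@type": "Person", name: "Your Name" },
  };
}
```

Render the output in a `<script type="application/ld+json">` tag, using the same `dangerouslySetInnerHTML` pattern shown above for the inline instructions.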

## Machine-Readable Feeds

RSS is great, but it's XML — not the most natural format for AI agents to parse. I added a [JSON Feed](https://www.jsonfeed.org/) endpoint alongside my existing RSS feed:

- **`/feed.xml`** — RSS 2.0 for traditional feed readers
- **`/feed.json`** — JSON Feed 1.1 for programmatic consumption

JSON Feed is cleaner for AI agents to parse and reference. Both are registered in the site's metadata so they're auto-discoverable.
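
Here's a sketch of the JSON Feed endpoint as another route handler, reusing the hypothetical `getAllPosts()` from earlier:

```typescript
// app/feed.json/route.ts — a sketch, reusing the hypothetical getAllPosts()
import { getAllPosts } from "@/lib/content";

export async function GET() {
  const posts = await getAllPosts();

  // Minimal JSON Feed 1.1 shape: https://www.jsonfeed.org/version/1.1/
  const feed = {
    version: "https://jsonfeed.org/version/1.1",
    title: "Your Site",
    home_page_url: "https://yoursite.com",
    feed_url: "https://yoursite.com/feed.json",
    items: posts.map((post) => ({
      id: `https://yoursite.com/blog/${post.slug}`,
      url: `https://yoursite.com/blog/${post.slug}`,
      title: post.title,
      content_text: post.content,
      date_published: new Date(post.date).toISOString(),
    })),
  };

  return Response.json(feed, {
    headers: { "Content-Type": "application/feed+json; charset=utf-8" },
  });
}
```

For the discovery half, Next.js's Metadata API can register both feeds under `alternates.types`, which emits the matching `<link rel="alternate">` tags; other frameworks have equivalents.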

## Making robots.txt AI-Aware

Most sites already have a `robots.txt`. The key addition is explicitly allowing AI crawlers and pointing them to your `llms.txt`:

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

# AI/LLM Content
# llms.txt: https://yoursite.com/llms.txt
# llms-full.txt: https://yoursite.com/llms-full.txt
```

Many sites block AI crawlers by default. If you *want* your content cited and discovered by AI, explicitly allow the major bots: `GPTBot`, `ChatGPT-User`, `Google-Extended`, `ClaudeBot`, `anthropic-ai`, `PerplexityBot`, `Applebot-Extended`, `Bytespider`, and `cohere-ai`.
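
If you'd rather generate this from code, Next.js can compile an `app/robots.ts` file into `robots.txt`. A sketch covering a few of the bots above:

```typescript
// app/robots.ts — a sketch; extend rules with the other bots listed above.
// Note: this API has no field for comment lines, so the llms.txt pointers
// from the static version would stay in a hand-written file instead.
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: "GPTBot", allow: "/" },
      { userAgent: "ClaudeBot", allow: "/" },
      { userAgent: "PerplexityBot", allow: "/" },
      { userAgent: "*", allow: "/" },
    ],
    sitemap: "https://yoursite.com/sitemap.xml",
  };
}
```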

## Why This Matters for Creators

As a design engineer with 15+ years of building products, I've watched SEO evolve from keyword stuffing to the semantic web to AI-native discovery. We're at an inflection point. The sites that get cited by AI aren't necessarily the ones with the highest domain authority; they're the ones with the clearest, most structured, most machine-readable content.

This is especially important for personal sites and portfolios. When someone asks an AI "who are the best design engineers in Miami?" or "what's a good article about design tokens?", you want your site to be citable. That requires more than good content — it requires content that AI can *find*, *understand*, and *attribute*.

## The Full Stack of AI Optimization

Here's the complete checklist of what I now have in place:

| Layer | What | Why |
|-------|------|-----|
| `robots.txt` | Explicitly allow AI bots | Let them crawl |
| `sitemap.xml` | Dynamic sitemap with all content | Let them discover |
| `llms.txt` | Markdown index of the site | Let them understand structure |
| `llms-full.txt` | Full content in one file | Let them ingest everything |
| Inline `<script>` | Page-level LLM instructions | Let them understand context |
| JSON-LD | Structured data on every page | Let them understand semantics |
| RSS + JSON Feed | Machine-readable content feeds | Let them subscribe |
| Meta tags | OpenGraph, Twitter, canonical | Let them cite accurately |

None of these changes affect how the site looks or feels for human visitors. They're invisible additions that make the site dramatically more useful for AI.

## What's Next

The AI web is evolving fast. Standards like `llms.txt` are still emerging, and new patterns will appear. But the fundamentals won't change: structure your content clearly, make it discoverable, and give machines the metadata they need to understand it.

If you want to replicate this setup, I've published a [full implementation guide](/guides/ai-optimization-guide.md) with code examples for Next.js. The approach works for any framework — the concepts are universal.

---

*Building something and want to talk AI optimization? [Let's connect](/connect).*
