# AI, SEO, GEO & AEO Optimization Guide

A practical guide to optimizing websites for AI agents, LLMs, search engines, and answer engines. Based on the implementation at [agnelnieves.com](https://agnelnieves.com), built with Next.js 15 — but the concepts apply to any framework.

---

## Table of Contents

- [Overview: SEO vs AEO vs GEO](#overview-seo-vs-aeo-vs-geo)
- [1. llms.txt — Machine-Readable Site Index](#1-llmstxt--machine-readable-site-index)
- [2. llms-full.txt — Full Content Endpoint](#2-llms-fulltxt--full-content-endpoint)
- [3. Inline LLM Instructions in HTML](#3-inline-llm-instructions-in-html)
- [4. robots.txt for AI Crawlers](#4-robotstxt-for-ai-crawlers)
- [5. Structured Data (JSON-LD)](#5-structured-data-json-ld)
- [6. JSON Feed](#6-json-feed)
- [7. Sitemap Optimization](#7-sitemap-optimization)
- [8. Content Writing for AI Engines](#8-content-writing-for-ai-engines)
- [9. Meta Tags & OpenGraph](#9-meta-tags--opengraph)
- [10. RSS Feed](#10-rss-feed)
- [Complete Checklist](#complete-checklist)
- [Verification](#verification)

---

## Overview: SEO vs AEO vs GEO

| Term | Full Name | What It Means |
|------|-----------|---------------|
| **SEO** | Search Engine Optimization | Optimize for Google, Bing, and traditional search engines |
| **AEO** | Answer Engine Optimization | Optimize for AI-powered answer engines (Perplexity, Google AI Overviews, Bing Chat) |
| **GEO** | Generative Engine Optimization | Optimize for LLMs that generate responses citing your content (ChatGPT, Claude, Gemini) |

These aren't competing strategies — they're layers. Good SEO is the foundation. AEO adds structured, citable answers. GEO adds machine-readable discovery and context. This guide covers all three.

### Key Differences

**SEO** focuses on keywords, backlinks, and page authority. Your goal: rank in search results.

**AEO** focuses on structured answers, Q&A formatting, and schema markup. Your goal: appear in featured snippets and AI-generated answers.

**GEO** focuses on machine-readable content, discoverability, and explicit metadata. Your goal: get cited when LLMs generate responses about your domain.

---

## 1. llms.txt — Machine-Readable Site Index

### What It Is

The [llms.txt standard](https://llmstxt.org) provides a conventional URL (`/llms.txt`) where AI agents can discover what your site contains — similar to how `/robots.txt` tells crawlers what they can access.

### Spec Format

```markdown
# Site Name

> One-line description of the site.

Optional longer description paragraph.

## Section Name

- [Link Title](https://full-url): Brief description of what's at this URL
```

Key rules:
- Served as `Content-Type: text/plain; charset=utf-8` (not `text/markdown`)
- Uses markdown formatting that agents parse themselves
- H1 is required (site/project name)
- Blockquote summary is recommended
- H2 sections group links logically

### Two Variants

| File | Purpose | Use Case |
|------|---------|----------|
| `llms.txt` | Index / table of contents | Agents that want to browse selectively |
| `llms-full.txt` | Complete content dump | Agents that want everything in one fetch |

### Next.js Implementation

Create `src/app/llms.txt/route.ts`:

```typescript
import { getAllPosts } from "@/lib/blog/utils";

const SITE_URL = "https://yoursite.com";

export async function GET() {
  const posts = await getAllPosts();
  const publishedPosts = posts.filter((p) => p.status === "published");

  const blogEntries = publishedPosts
    .map((post) => `- [${post.title}](${SITE_URL}/blog/${post.slug}): ${post.excerpt}`)
    .join("\n");

  const content = `# Your Name

> One-line description of your site.

## Pages

- [Home](${SITE_URL}/): Description
- [Blog](${SITE_URL}/blog): Description

## Blog Posts

${blogEntries}

## Feeds

- [RSS Feed](${SITE_URL}/feed.xml)
- [JSON Feed](${SITE_URL}/feed.json)
- [Sitemap](${SITE_URL}/sitemap.xml)
- [Full LLM Content](${SITE_URL}/llms-full.txt)
`;

  return new Response(content, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=600",
    },
  });
}
```

### Other Frameworks

- **Static sites (Hugo, Jekyll, Astro):** Generate `llms.txt` as a build artifact using your content data
- **Express/Fastify:** Add a `GET /llms.txt` route that returns `text/plain`
- **Django/Rails:** Add a view mapped to `/llms.txt`

Whatever the framework, the key is to generate the file dynamically from your content so it never goes stale.
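
As a sketch of the framework-agnostic approach (the `PostMeta` shape and the `buildLlmsTxt` name are illustrative, not from any library), the index can be built as a pure function and wired to whatever server you run:

```typescript
// Illustrative builder for llms.txt; Express wiring shown in the comment
// below. PostMeta mirrors whatever your content layer exposes.
type PostMeta = { title: string; slug: string; excerpt: string };

const SITE_URL = "https://yoursite.com";

function buildLlmsTxt(posts: PostMeta[]): string {
  const entries = posts
    .map((p) => `- [${p.title}](${SITE_URL}/blog/${p.slug}): ${p.excerpt}`)
    .join("\n");
  return `# Your Name\n\n> One-line description of your site.\n\n## Blog Posts\n\n${entries}\n`;
}

// Express example:
//   app.get("/llms.txt", (_req, res) =>
//     res.type("text/plain; charset=utf-8").send(buildLlmsTxt(posts)));
```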

---

## 2. llms-full.txt — Full Content Endpoint

### What It Is

While `llms.txt` is an index, `llms-full.txt` bundles all your content into a single response. Agents that prefer loading everything into context at once use this instead of following individual links.

### Next.js Implementation

Create `src/app/llms-full.txt/route.ts`:

```typescript
import { getAllPosts, getPostBySlug } from "@/lib/blog/utils";

const SITE_URL = "https://yoursite.com";

export async function GET() {
  const posts = await getAllPosts();
  const publishedPosts = posts.filter((p) => p.status === "published");

  // Fetch full content for all posts in parallel
  const fullPosts = await Promise.all(
    publishedPosts.map((post) => getPostBySlug(post.slug))
  );

  const blogContent = fullPosts
    // Use a type guard: filter(Boolean) alone doesn't narrow out nulls in TS
    .filter((post): post is NonNullable<typeof post> => post != null)
    .map((post) => `### ${post.title}

- **Date:** ${post.date}
- **Tags:** ${post.tags.join(", ")}
- **URL:** ${SITE_URL}/blog/${post.slug}

${post.content}

---`)
    .join("\n\n");

  const content = `# Your Site - Full Content

> Complete content for LLM consumption.

## About

Your bio and background here.

## Blog Posts

${blogContent}
`;

  return new Response(content, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=600",
    },
  });
}
```

### Considerations

- **Response size:** With a handful of posts, this is fine. If you have 100+ long posts, consider truncating older content or adding pagination hints.
- **Draft filtering:** Always filter to published content only. Never expose drafts.
- **Caching:** Use CDN caching (`s-maxage`) since this is an expensive route to generate.
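
One way to act on the truncation hint above (the `FULL_TEXT_LIMIT` constant and `renderPosts` helper are hypothetical, sized to taste): inline the newest posts in full and reduce older ones to links.

```typescript
// Keep llms-full.txt bounded: newest posts in full, older posts as links.
type FullPost = { title: string; slug: string; date: string; content: string };

const SITE_URL = "https://yoursite.com";
const FULL_TEXT_LIMIT = 20; // newest N posts get their full content inlined

function renderPosts(posts: FullPost[]): string {
  const newestFirst = [...posts].sort(
    (a, b) => new Date(b.date).getTime() - new Date(a.date).getTime()
  );
  return newestFirst
    .map((post, i) =>
      i < FULL_TEXT_LIMIT
        ? `### ${post.title}\n\n${post.content}`
        : `### ${post.title}\n\nFull text: ${SITE_URL}/blog/${post.slug}`
    )
    .join("\n\n---\n\n");
}
```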

---

## 3. Inline LLM Instructions in HTML

### What It Is

A [proposal from Vercel](https://vercel.com/blog/a-proposal-for-inline-llm-instructions-in-html) for embedding AI-readable context directly in your HTML `<head>`. Browsers ignore `<script>` tags with unknown `type` attributes, but LLMs process the content.

### Syntax

```html
<script type="text/llms.txt">
# Site Name

This site belongs to [name], a [role] based in [location].

## Site Structure
- / — Home: About and featured work
- /blog — Blog: Articles on [topics]
- /work — Portfolio: [X] projects

## Key Facts
- Name: Your Name
- Role: Your Role
- Specialties: Topic 1, Topic 2
</script>
```

### Next.js Implementation

In your root `layout.tsx`, add inside `<head>`:

```tsx
<script
  type="text/llms.txt"
  dangerouslySetInnerHTML={{
    __html: `# Your Site

This site belongs to Your Name, a Design Engineer based in Miami.

## Site Structure
- / — Home
- /blog — Blog
- /work — Portfolio

## Machine-Readable Content
- /llms.txt — Site index for LLMs
- /llms-full.txt — Complete content
- /feed.xml — RSS feed
- /feed.json — JSON feed`,
  }}
/>
```

### Why It Works

- Browsers skip `<script type="text/llms.txt">` entirely — zero rendering impact
- No performance cost (it's just text in the DOM)
- LLMs that fetch and parse your HTML will see it
- Works without any server-side changes or API calls
- Follows the llms.txt naming convention for consistency

### Per-Page Instructions

You can add page-specific instructions too. For example, on a blog post page:

```html
<script type="text/llms.txt">
This is a blog post about [topic] by [author].
Published on [date]. This post answers the question: [title as question].
For the full machine-readable version, see /llms-full.txt.
</script>
```

---

## 4. robots.txt for AI Crawlers

### Allowing AI Bots

Many sites inadvertently block AI crawlers. If you want your content discovered and cited by AI, explicitly allow these user agents:

| Bot | Organization | Purpose |
|-----|-------------|---------|
| `GPTBot` | OpenAI | Training data crawling |
| `ChatGPT-User` | OpenAI | Real-time browsing for ChatGPT |
| `Google-Extended` | Google | Gemini / AI training |
| `ClaudeBot` | Anthropic | Training data crawling |
| `anthropic-ai` | Anthropic | Claude training |
| `PerplexityBot` | Perplexity | Real-time answer generation |
| `Applebot-Extended` | Apple | Apple Intelligence training |
| `Bytespider` | ByteDance | AI training |
| `cohere-ai` | Cohere | AI training |

### Next.js Implementation (Route Handler)

Using a route handler instead of the Next.js Metadata API gives you full control, including the ability to reference `llms.txt`:

Create `src/app/robots.txt/route.ts`:

```typescript
const SITE_URL = "https://yoursite.com";

const AI_BOTS = [
  "GPTBot", "Google-Extended", "ChatGPT-User",
  "Applebot-Extended", "anthropic-ai", "ClaudeBot",
  "PerplexityBot", "Bytespider", "cohere-ai",
];

export async function GET() {
  const aiRules = AI_BOTS
    .map((bot) => `User-agent: ${bot}\nAllow: /`)
    .join("\n\n");

  const robotsTxt = `User-agent: *
Allow: /
Disallow: /api/
Disallow: /_next/

${aiRules}

Sitemap: ${SITE_URL}/sitemap.xml

# AI/LLM Content
# llms.txt: ${SITE_URL}/llms.txt
# llms-full.txt: ${SITE_URL}/llms-full.txt
`;

  return new Response(robotsTxt, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "public, s-maxage=86400, stale-while-revalidate=3600",
    },
  });
}
```

### What to Disallow

- `/api/` — Internal API routes shouldn't be crawled
- `/_next/` — Next.js internal assets
- Any authenticated or admin routes
- Draft content endpoints
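
The same mechanism works in reverse. If you would rather opt a specific bot out (for example, blocking a training crawler while still allowing real-time answer bots), emit a `Disallow` block for it:

```
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /
```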

---

## 5. Structured Data (JSON-LD)

### Why It Matters for AI

JSON-LD tells AI engines what your content *means*, not just what it says. When an LLM encounters `"@type": "BlogPosting"`, it knows this is an article with a specific author, date, and topic — enabling proper citation and attribution.

### Schema Types to Implement

| Page Type | Schema | Key Fields |
|-----------|--------|------------|
| Root layout | `WebSite` + `Person` | name, url, description, sameAs |
| Blog posts | `BlogPosting` + `BreadcrumbList` | headline, datePublished, author, keywords |
| Portfolio/projects | `CreativeWork` | name, description, url, creator |
| FAQ pages | `FAQPage` | mainEntity with Question/Answer pairs |
| About page | `Person` or `Organization` | name, jobTitle, knowsAbout, sameAs |
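
### WebSite + Person Example (Root Layout)

A sketch of the root-layout pair from the table above; all field values are placeholders, and `sameAs` should list your real profile URLs so engines can connect your identities:

```tsx
const SITE_URL = "https://yoursite.com";

const siteJsonLd = {
  "@context": "https://schema.org",
  "@type": "WebSite",
  name: "Your Site",
  url: SITE_URL,
  description: "Your site description.",
  author: { "@type": "Person", name: "Your Name" },
};

const personJsonLd = {
  "@context": "https://schema.org",
  "@type": "Person",
  name: "Your Name",
  url: SITE_URL,
  jobTitle: "Design Engineer",
  knowsAbout: ["Design Systems", "Frontend Engineering"],
  sameAs: [
    "https://github.com/yourhandle",
    "https://www.linkedin.com/in/yourhandle",
  ],
};
```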

### BlogPosting Example

```tsx
const blogPostJsonLd = {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  headline: post.title,
  description: post.excerpt,
  datePublished: post.date,
  dateModified: post.lastModified || post.date,
  author: {
    "@type": "Person",
    name: post.author,
    url: SITE_URL,
  },
  publisher: {
    "@type": "Person",
    name: "Your Name",
    url: SITE_URL,
  },
  url: `${SITE_URL}/blog/${post.slug}`,
  mainEntityOfPage: `${SITE_URL}/blog/${post.slug}`,
  keywords: post.tags.join(", "),
  inLanguage: "en-US",
};
```

### CreativeWork Example (for Portfolio Projects)

```tsx
const projectJsonLd = {
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  name: project.title,
  description: project.description,
  url: project.url,
  image: `${SITE_URL}${project.image}`,
  keywords: project.tags.join(", "),
  creator: {
    "@type": "Person",
    name: "Your Name",
    url: SITE_URL,
  },
  mainEntityOfPage: `${SITE_URL}/work/${project.slug}`,
};
```

### BreadcrumbList

Add breadcrumbs to every content page. AI engines use these to understand your site hierarchy:

```tsx
const breadcrumbJsonLd = {
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  itemListElement: [
    { "@type": "ListItem", position: 1, name: "Home", item: SITE_URL },
    { "@type": "ListItem", position: 2, name: "Blog", item: `${SITE_URL}/blog` },
    { "@type": "ListItem", position: 3, name: post.title, item: `${SITE_URL}/blog/${post.slug}` },
  ],
};
```

### Rendering in Next.js

```tsx
<script
  type="application/ld+json"
  dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
/>
```
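
One caveat with `dangerouslySetInnerHTML`: if any field carries content-derived text, a literal `</script>` inside it would terminate the tag early. A common precaution is to escape `<` before embedding (the `safeJsonLd` helper here is a hypothetical wrapper, not a Next.js API):

```typescript
// Escape "<" so embedded content can never close the surrounding <script>.
const safeJsonLd = (data: object): string =>
  JSON.stringify(data).replace(/</g, "\\u003c");

// Then render with:
//   <script
//     type="application/ld+json"
//     dangerouslySetInnerHTML={{ __html: safeJsonLd(jsonLd) }}
//   />
```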

---

## 6. JSON Feed

### What It Is

[JSON Feed](https://www.jsonfeed.org/) (v1.1) is a feed format that uses JSON instead of XML. It's simpler to parse programmatically and more natural for AI agents.

### Next.js Implementation

Create `src/app/feed.json/route.ts`:

```typescript
import { getAllPosts } from "@/lib/blog/utils";

const SITE_URL = "https://yoursite.com";

export async function GET() {
  const posts = await getAllPosts();
  const publishedPosts = posts.filter((p) => p.status === "published");

  const feed = {
    version: "https://jsonfeed.org/version/1.1",
    title: "Your Site - Blog",
    home_page_url: SITE_URL,
    feed_url: `${SITE_URL}/feed.json`,
    description: "Your site description.",
    language: "en-US",
    authors: [{ name: "Your Name", url: SITE_URL }],
    items: publishedPosts.map((post) => ({
      id: `${SITE_URL}/blog/${post.slug}`,
      url: `${SITE_URL}/blog/${post.slug}`,
      title: post.title,
      summary: post.excerpt,
      date_published: new Date(post.date).toISOString(),
      authors: [{ name: post.author }],
      tags: post.tags,
      ...(post.coverImage && { image: post.coverImage }),
    })),
  };

  return new Response(JSON.stringify(feed, null, 2), {
    headers: {
      "Content-Type": "application/feed+json; charset=utf-8",
      "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=600",
    },
  });
}
```

### Register in Layout Metadata

```typescript
alternates: {
  types: {
    "application/rss+xml": [{ url: "/feed.xml", title: "RSS Feed" }],
    "application/feed+json": [{ url: "/feed.json", title: "JSON Feed" }],
  },
},
```

---

## 7. Sitemap Optimization

### Dynamic Generation

Generate your sitemap from actual content data, not a static file:

```typescript
import { MetadataRoute } from "next";
import { getAllPosts } from "@/lib/blog/utils";

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getAllPosts();
  const publishedPosts = posts.filter((p) => p.status === "published");

  return [
    { url: "https://yoursite.com", priority: 1.0, changeFrequency: "weekly" },
    { url: "https://yoursite.com/blog", priority: 0.9, changeFrequency: "weekly" },
    ...publishedPosts.map((post) => ({
      url: `https://yoursite.com/blog/${post.slug}`,
      lastModified: new Date(post.date),
      priority: 0.7,
      changeFrequency: "monthly" as const,
    })),
  ];
}
```

### AEO/GEO Considerations

- Set `lastModified` accurately — AI engines prefer fresh content
- Higher `priority` for pages you want cited most
- Include all public content (don't omit from the sitemap any page you want AI to find)

---

## 8. Content Writing for AI Engines

These writing patterns help your content get cited by AI:

### Lead with the Answer

The first paragraph should directly answer the question implied by the title. AI engines and featured snippets pull from the opening text.

```markdown
## What are design tokens?

Design tokens are the atomic building blocks of a design system — named
values that store visual properties like colors, spacing, and typography.
They bridge the gap between design and code.
```

### Use Q&A Headings

Phrase headings as questions when natural. This directly maps to how users query AI engines:

- "What are design tokens?" instead of "Design Tokens Overview"
- "How do I set up remote development on iPad?" instead of "iPad Setup"
- "Why does motion design matter for AI products?" instead of "Motion Design"

### Include TL;DR Summaries

Add a summary near the top of longer posts. AI engines often cite concise summaries:

```markdown
**TL;DR:** Design tokens are named values (colors, spacing, fonts) that
keep design consistent across platforms. Use them to create a single source
of truth shared between designers and developers.
```

### Use Lists and Tables

Both Google's featured snippets and AI citation engines heavily favor structured content:

```markdown
| Token Type | Example | Purpose |
|-----------|---------|---------|
| Color | `--color-primary: #0066cc` | Brand consistency |
| Spacing | `--space-md: 16px` | Layout rhythm |
| Typography | `--font-body: Inter` | Text consistency |
```

### Establish Author Expertise

Mention relevant experience where natural. This builds E-E-A-T signals for both Google and AI citation:

> "As a design engineer with 15+ years of experience building products for companies like Adobe and UKG..."

### Internal and External Links

- **Internal links** help AI understand your content graph. Link between related posts.
- **External authority links** to well-known sources build trust signals.

---

## 9. Meta Tags & OpenGraph

### Every Page Needs

```typescript
export const metadata: Metadata = {
  title: "Page Title",
  description: "120-160 character description with primary keyword.",
  alternates: { canonical: "/page-path" },
  openGraph: {
    type: "article", // or "website" for non-article pages
    title: "Page Title",
    description: "Description",
    url: "https://yoursite.com/page-path",
    images: [{ url: "/og-image.png", alt: "Description" }],
  },
  twitter: {
    card: "summary_large_image",
    title: "Page Title",
    description: "Description",
  },
};
```

### Blog Posts Additionally Need

```typescript
openGraph: {
  type: "article",
  publishedTime: post.date,
  modifiedTime: post.lastModified,
  authors: [post.author],
  tags: post.tags,
},
```

### Root Layout Robots Directive

```typescript
robots: {
  index: true,
  follow: true,
  googleBot: {
    index: true,
    follow: true,
    "max-video-preview": -1,
    "max-image-preview": "large",
    "max-snippet": -1,
  },
},
```

Setting `max-snippet: -1` allows Google (and AI engines) to use as much of your content as they want in snippets and responses.

---

## 10. RSS Feed

### Implementation

```typescript
// src/app/feed.xml/route.ts
import { getAllPosts } from "@/lib/blog/utils";

const SITE_URL = "https://yoursite.com";

export async function GET() {
  const posts = await getAllPosts();
  const published = posts.filter((p) => p.status === "published");

  const items = published.map((post) => `
    <item>
      <title><![CDATA[${post.title}]]></title>
      <link>${SITE_URL}/blog/${post.slug}</link>
      <guid isPermaLink="true">${SITE_URL}/blog/${post.slug}</guid>
      <description><![CDATA[${post.excerpt}]]></description>
      <pubDate>${new Date(post.date).toUTCString()}</pubDate>
      <author>you@yoursite.com (Your Name)</author>
      ${post.tags.map((tag) => `<category>${tag}</category>`).join("\n      ")}
    </item>`).join("");

  const feed = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Your Site - Blog</title>
    <link>${SITE_URL}/blog</link>
    <description>Your blog description.</description>
    <language>en-US</language>
    <lastBuildDate>${new Date().toUTCString()}</lastBuildDate>
    <atom:link href="${SITE_URL}/feed.xml" rel="self" type="application/rss+xml" />
    ${items}
  </channel>
</rss>`;

  return new Response(feed, {
    headers: {
      "Content-Type": "application/xml",
      "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=600",
    },
  });
}
```
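
### Escaping Category Values

The `<title>` and `<description>` values above are CDATA-wrapped, but the `<category>` values are interpolated raw, so a tag containing `&` or `<` would produce invalid XML. A minimal escaper closes that gap (the `escapeXml` helper is our own, not part of any library):

```typescript
// Minimal XML escaper for values not wrapped in CDATA (e.g. <category>).
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first to avoid double-escaping
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// In the item template:
//   ${post.tags.map((tag) => `<category>${escapeXml(tag)}</category>`).join("\n      ")}
```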

### Auto-Discovery

Register in your layout metadata:

```typescript
alternates: {
  types: {
    "application/rss+xml": [{ url: "/feed.xml", title: "RSS Feed" }],
  },
},
```

---

## Complete Checklist

### Foundation (SEO)

- [ ] Dynamic `sitemap.xml` with all public pages
- [ ] `robots.txt` allowing all crawlers you want
- [ ] Canonical URLs on every page (`alternates.canonical`)
- [ ] `metadataBase` set in root layout
- [ ] OpenGraph tags on every page
- [ ] Twitter Card tags on every page
- [ ] Proper heading hierarchy (one H1, H2s for sections)
- [ ] Descriptive image alt text
- [ ] Internal linking between content

### Structured Data (SEO + AEO)

- [ ] `WebSite` + `Person`/`Organization` schema on root layout
- [ ] `BlogPosting` schema on blog posts
- [ ] `BreadcrumbList` schema on content pages
- [ ] `CreativeWork` schema on portfolio/project pages
- [ ] `FAQPage` schema on FAQ sections (if applicable)

### Machine-Readable Content (GEO)

- [ ] `/llms.txt` endpoint with site index
- [ ] `/llms-full.txt` endpoint with full content
- [ ] Inline `<script type="text/llms.txt">` in root layout
- [ ] RSS feed at `/feed.xml`
- [ ] JSON Feed at `/feed.json`
- [ ] Feed auto-discovery in metadata

### AI Crawler Access (GEO)

- [ ] `GPTBot` allowed in robots.txt
- [ ] `ChatGPT-User` allowed
- [ ] `Google-Extended` allowed
- [ ] `ClaudeBot` / `anthropic-ai` allowed
- [ ] `PerplexityBot` allowed
- [ ] `Applebot-Extended` allowed
- [ ] `Bytespider` allowed
- [ ] `cohere-ai` allowed
- [ ] `llms.txt` referenced in robots.txt

### Content Quality (AEO)

- [ ] First paragraph answers the title's implied question
- [ ] Headings phrased as questions where natural
- [ ] TL;DR / summary near top of long posts
- [ ] Lists and tables for structured information
- [ ] Author expertise context included
- [ ] External authority links for cited facts
- [ ] Evergreen content updated with `lastModified`

---

## Verification

Test your implementation with these commands:

```bash
# llms.txt — should return markdown index
curl https://yoursite.com/llms.txt

# llms-full.txt — should return full content
curl https://yoursite.com/llms-full.txt

# Check content type headers
curl -I https://yoursite.com/llms.txt
# Expected: Content-Type: text/plain; charset=utf-8

# robots.txt — should include AI bot rules and llms.txt references
curl https://yoursite.com/robots.txt

# JSON Feed — should return valid JSON
curl https://yoursite.com/feed.json | python3 -m json.tool

# Validate structured data
# Visit: https://validator.schema.org/
# Or: https://search.google.com/test/rich-results

# Check that inline LLM instructions appear in HTML source
curl -s https://yoursite.com | grep 'text/llms.txt'
```

---

## Resources

- [llms.txt spec](https://llmstxt.org) — The standard for machine-readable site indexes
- [Vercel: Add llms.txt](https://vercel.com/academy/agent-friendly-apis/add-llms-txt) — Implementation guide
- [Vercel: Inline LLM Instructions](https://vercel.com/blog/a-proposal-for-inline-llm-instructions-in-html) — The `<script type="text/llms.txt">` proposal
- [JSON Feed spec](https://www.jsonfeed.org/version/1.1/) — JSON Feed 1.1 specification
- [Schema.org](https://schema.org) — Structured data vocabulary
- [Google Rich Results Test](https://search.google.com/test/rich-results) — Validate your structured data
- [Google E-E-A-T guidelines](https://developers.google.com/search/docs/fundamentals/creating-helpful-content) — Content quality signals
