sitemapReadTool

@tpmjs/tools-sitemap-read

Parse XML sitemaps (sitemap.xml) and extract URLs. Handles both regular sitemaps (urlset) and sitemap indexes (sitemapindex). Returns URL locations with optional metadata such as lastmod, changefreq, and priority. Useful for discovering pages on a website, SEO analysis, and crawl planning.

Official
web
v0.2.0
MIT

Installation & Usage

Install this tool and use it with the AI SDK

1. Install the package

npm install @tpmjs/tools-sitemap-read
pnpm add @tpmjs/tools-sitemap-read
yarn add @tpmjs/tools-sitemap-read
bun add @tpmjs/tools-sitemap-read
deno add npm:@tpmjs/tools-sitemap-read

2. Import the tool

import { sitemapReadTool } from '@tpmjs/tools-sitemap-read';

3. Use with AI SDK

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { sitemapReadTool } from '@tpmjs/tools-sitemap-read';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { sitemapReadTool },
  prompt: 'Your prompt here...',
});

console.log(result.text);

Parameters

Available configuration options

Auto-extracted
url
Required
Type: string

The sitemap.xml URL to parse (must be http or https)
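
The http/https requirement can be checked before the tool is ever called. The following is a minimal sketch of such a pre-check; `isHttpUrl` is a hypothetical helper written for illustration and is not exported by @tpmjs/tools-sitemap-read, whose own validation may differ.

```typescript
// Hypothetical helper (not part of @tpmjs/tools-sitemap-read):
// returns true only for absolute URLs that use the http or https scheme.
function isHttpUrl(candidate: string): boolean {
  try {
    const { protocol } = new URL(candidate);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    // new URL() throws on strings that are not parseable absolute URLs
    return false;
  }
}
```

Filtering user-supplied input through a check like this avoids spending a tool call on a URL that would be rejected anyway.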

Schema extracted: 1/1/2026, 8:18:36 AM

README

@tpmjs/tools-sitemap-read

Parse XML sitemaps and extract URLs from sitemap.xml files.

Installation

npm install @tpmjs/tools-sitemap-read

Usage

import { sitemapReadTool } from '@tpmjs/tools-sitemap-read';
import { generateText } from 'ai';

const result = await generateText({
  model: yourModel, // any AI SDK language model, e.g. openai('gpt-4o') from '@ai-sdk/openai'
  tools: { sitemapReadTool },
  prompt: 'Get all URLs from https://example.com/sitemap.xml',
});

Tool Parameters

  • url (string, required): The sitemap.xml URL to parse (must be http or https)

Returns

{
  urls: Array<{
    loc: string;
    lastmod?: string;
    changefreq?: string;
    priority?: string;
  }>;
  isSitemapIndex: boolean;
  urlCount: number;
  sitemapIndexUrls?: Array<{
    loc: string;
    lastmod?: string;
  }>;
  metadata: {
    fetchedAt: string;
    sourceUrl: string;
    type: 'urlset' | 'sitemapindex';
  };
}
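
As a rough illustration of how the return shape above might be consumed, the sketch below restates it as TypeScript interfaces and picks the locations to visit next. The interface names and the `nextLocations` helper are hypothetical. The sketch also assumes that, for a sitemap index, the child sitemaps are listed in `sitemapIndexUrls`; the README does not say what `urls` contains in that case.

```typescript
// Shape restated from the Returns section; the interface names are illustrative.
interface SitemapUrl {
  loc: string;
  lastmod?: string;
  changefreq?: string;
  priority?: string;
}

interface SitemapReadResult {
  urls: SitemapUrl[];
  isSitemapIndex: boolean;
  urlCount: number;
  sitemapIndexUrls?: Array<{ loc: string; lastmod?: string }>;
  metadata: {
    fetchedAt: string;
    sourceUrl: string;
    type: 'urlset' | 'sitemapindex';
  };
}

// Hypothetical helper: for an index, return the child sitemap locations
// (assumed to live in sitemapIndexUrls); otherwise return the page locations.
function nextLocations(result: SitemapReadResult): string[] {
  if (result.isSitemapIndex) {
    return (result.sitemapIndexUrls ?? []).map((sitemap) => sitemap.loc);
  }
  return result.urls.map((url) => url.loc);
}
```

Branching on `isSitemapIndex` keeps the caller from treating child sitemap files as if they were pages.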

Features

  • Supports both regular sitemaps (urlset) and sitemap indexes (sitemapindex)
  • Extracts URL locations with optional metadata (lastmod, changefreq, priority)
  • Handles sitemap index files that reference other sitemaps
  • Comprehensive error handling
  • 30-second timeout protection
  • Validates XML structure

Sitemap Types

Regular Sitemap (urlset)

Contains direct page URLs with optional metadata:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
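
To make the mapping from this XML to the `urls` array concrete, here is a minimal sketch that pulls the same four fields out of a urlset document. It is not the package's parser: `tagText` and `parseUrlset` are hypothetical names, and the regex approach assumes unprefixed tags with no CDATA sections or entity decoding.

```typescript
interface UrlEntry {
  loc: string;
  lastmod?: string;
  changefreq?: string;
  priority?: string;
}

// Return the trimmed text of the first <tag>...</tag> pair inside `xml`, if any.
function tagText(xml: string, tag: string): string | undefined {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(xml);
  return match ? match[1].trim() : undefined;
}

// Collect one entry per <url> block; blocks without a <loc> are skipped.
function parseUrlset(xml: string): UrlEntry[] {
  const entries: UrlEntry[] = [];
  for (const block of xml.matchAll(/<url>([\s\S]*?)<\/url>/g)) {
    const loc = tagText(block[1], 'loc');
    if (!loc) continue;
    entries.push({
      loc,
      lastmod: tagText(block[1], 'lastmod'),
      changefreq: tagText(block[1], 'changefreq'),
      priority: tagText(block[1], 'priority'),
    });
  }
  return entries;
}
```

Every value is kept as a string, which matches the `priority?: string` field in the documented return type.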

Sitemap Index (sitemapindex)

Contains references to other sitemap files:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap1.xml</loc>
    <lastmod>2024-01-01</lastmod>
  </sitemap>
</sitemapindex>
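
The two document types differ in their root element, so one plausible way to tell them apart is to read the name of the first element. The sketch below does that with a regex; `detectSitemapType` is a hypothetical helper, not the package's implementation, and it assumes the root element carries no namespace prefix.

```typescript
type SitemapType = 'urlset' | 'sitemapindex';

// Hypothetical helper: classify a sitemap document by its root element name.
// Returns undefined when the root is neither <urlset> nor <sitemapindex>.
function detectSitemapType(xml: string): SitemapType | undefined {
  // Drop the XML declaration / processing instructions and comments first.
  const body = xml
    .replace(/<\?[\s\S]*?\?>/g, '')
    .replace(/<!--[\s\S]*?-->/g, '');
  const root = /<([A-Za-z_][\w.-]*)/.exec(body);
  if (!root) return undefined;
  const name = root[1];
  if (name === 'urlset' || name === 'sitemapindex') return name;
  return undefined;
}
```

The returned value lines up with the `metadata.type` field in the documented result, so a check like this could serve as a quick sanity test on cached sitemap files.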

Requirements

  • Node.js 18+ (uses native fetch API)

License

MIT

Statistics

Downloads/month

47

Quality Score

78%

NPM Keywords

tpmjs
sitemap
xml
seo
crawler
web

Maintainers

thomasdavis (thomasalwyndavis@gmail.com)

Frameworks

vercel-ai