
robotsPolicyTool

@tpmjs/tools-robots-policy

Parse a robots.txt file and check if a specific URL is allowed to be crawled by a given user agent. Returns whether the URL is allowed, the matching rule, all parsed rules, crawl delay, and sitemap URLs. Useful for respecting website crawling policies and understanding access restrictions.

Official
web
v0.2.0
MIT

Installation & Usage

Install this tool and use it with the AI SDK

1. Install the package

npm install @tpmjs/tools-robots-policy
pnpm add @tpmjs/tools-robots-policy
yarn add @tpmjs/tools-robots-policy
bun add @tpmjs/tools-robots-policy
deno add npm:@tpmjs/tools-robots-policy

2. Import the tool

import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';

3. Use with AI SDK

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { robotsPolicyTool },
  prompt: 'Your prompt here...',
});

console.log(result.text);
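
To inspect the robots check itself (not just the model's final text), you can also look at the tool call records on the result. The exact item shapes vary slightly between AI SDK versions, so treat this as a sketch and log the objects to explore them:

// Structured tool inputs and outputs from the run
// (item shape depends on your AI SDK version).
console.log(result.toolCalls);
console.log(result.toolResults);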

Parameters

Available configuration options

robotsUrl
Required
Type: string

The URL of the robots.txt file to parse (usually at the site root, e.g. https://example.com/robots.txt)

testUrl
Required
Type: string

The full URL to test for crawl permission

userAgent
Optional
Type: string

The user agent to check permissions for (default: "*" for all bots). Common values: "googlebot", "bingbot", "*"

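For reference, a call that asks whether Googlebot may crawl a specific page would pass arguments shaped like this (the URLs below are placeholders):

const exampleArgs = {
  robotsUrl: 'https://example.com/robots.txt', // where the site's policy file lives
  testUrl: 'https://example.com/admin/settings', // the page to check
  userAgent: 'googlebot', // omit to fall back to the default "*" group
};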

README

@tpmjs/tools-robots-policy

Parse robots.txt files and check if URLs are allowed for crawling.

Installation

npm install @tpmjs/tools-robots-policy

Usage

import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';
import { generateText } from 'ai';

const result = await generateText({
  model: yourModel,
  tools: { robotsPolicyTool },
  prompt: 'Check if https://example.com/admin is allowed to be crawled',
});

Tool Parameters

  • robotsUrl (string, required): The robots.txt URL to parse (e.g., https://example.com/robots.txt)
  • testUrl (string, required): The full URL to test for crawl permission
  • userAgent (string, optional): The user agent to check (default: "*" for all bots)

Returns

{
  allowed: boolean;
  userAgent: string;
  testUrl: string;
  matchedRule?: {
    directive: 'allow' | 'disallow';
    path: string;
  };
  rules: Array<{
    userAgent: string;
    rules: Array<{
      directive: 'allow' | 'disallow';
      path: string;
    }>;
    crawlDelay?: number;
    sitemaps?: string[];
  }>;
  crawlDelay?: number;
  sitemaps: string[];
  metadata: {
    fetchedAt: string;
    robotsUrl: string;
    hasRules: boolean;
  };
}
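
A downstream crawler can use these fields to decide whether, and how politely, to fetch the page. A minimal consumption sketch, typed against only the subset of fields it uses (politeFetch is a hypothetical helper, not part of this package):

interface RobotsPolicyResult {
  allowed: boolean;
  testUrl: string;
  crawlDelay?: number; // seconds, per the Crawl-delay convention
}

async function politeFetch(policy: RobotsPolicyResult): Promise<Response | null> {
  if (!policy.allowed) {
    console.warn(`Skipping ${policy.testUrl}: disallowed by robots.txt`);
    return null;
  }
  const delaySeconds = policy.crawlDelay ?? 0;
  if (delaySeconds > 0) {
    // Wait out the crawl delay before requesting the page.
    await new Promise((resolve) => setTimeout(resolve, delaySeconds * 1000));
  }
  return fetch(policy.testUrl);
}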

Features

  • Parses robots.txt files according to the robots exclusion standard
  • Checks if specific URLs are allowed for crawling
  • Supports user-agent specific rules
  • Extracts crawl-delay directives
  • Extracts sitemap URLs
  • Handles wildcards (*) and end-of-URL markers ($)
  • Returns the most specific matching rule
  • 30-second timeout protection
  • Handles a missing robots.txt file by treating it as "allow all" (see the sketch after this list)
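
The last two points can be pictured with Node's native fetch and AbortSignal.timeout. This is only a sketch of the idea, not necessarily how the package implements it, and fetchRobots is a hypothetical helper name:

// Fetch robots.txt with a 30-second timeout; a missing file (404) means
// there is no policy, so every URL is treated as allowed.
async function fetchRobots(robotsUrl: string): Promise<string | null> {
  const response = await fetch(robotsUrl, { signal: AbortSignal.timeout(30_000) });
  if (response.status === 404) return null; // no robots.txt => allow all
  if (!response.ok) throw new Error(`Failed to fetch ${robotsUrl}: HTTP ${response.status}`);
  return response.text();
}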

Pattern Matching

The tool supports standard robots.txt patterns:

  • * - Matches any sequence of characters
  • $ - Matches the end of the URL
  • Patterns are case-sensitive for paths

Examples

Disallow: /admin          # Blocks /admin and /admin/*
Disallow: /admin/         # Blocks /admin/* but not /admin
Disallow: /*.pdf$         # Blocks all PDF files
Disallow: /private*.html  # Blocks paths starting with /private and containing .html
Allow: /public            # Explicitly allows /public and /public/*
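
To illustrate how these patterns behave, the sketch below compiles a robots.txt path pattern into a regular expression. It mirrors the matching rules described above; it is not necessarily the package's internal implementation, and patternToRegExp is a hypothetical helper name:

// "*" matches any sequence of characters; a trailing "$" anchors the end of the URL.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith('$');
  const body = (anchored ? pattern.slice(0, -1) : pattern)
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
    .join('.*');
  return new RegExp('^' + body + (anchored ? '$' : ''));
}

patternToRegExp('/*.pdf$').test('/docs/report.pdf'); // true
patternToRegExp('/admin').test('/admin/settings');   // true (prefix match)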

User Agents

Common user agent strings:

  • * - All robots (default)
  • googlebot - Google's crawler
  • bingbot - Bing's crawler
  • slurp - Yahoo's crawler
  • Custom user agents for specific bots

The tool matches user agents case-insensitively and uses the most specific rule available (exact match > wildcard).
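
A sketch of that selection logic, as an illustration of the documented behavior rather than the package's exact code (RuleGroup and selectGroup are hypothetical names):

interface RuleGroup {
  userAgent: string;
  // ...parsed directives for this group
}

// An exact, case-insensitive user-agent match wins; otherwise fall back
// to the wildcard "*" group if the file defines one.
function selectGroup(groups: RuleGroup[], userAgent: string): RuleGroup | undefined {
  const ua = userAgent.toLowerCase();
  return (
    groups.find((group) => group.userAgent.toLowerCase() === ua) ??
    groups.find((group) => group.userAgent === '*')
  );
}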

Requirements

  • Node.js 18+ (uses native fetch API)

License

MIT

Statistics

Downloads/month

42

Quality Score

78%

Bundle Size

NPM Keywords

tpmjs
robots
robots.txt
crawler
seo
web

Maintainers

thomasdavis (thomasalwyndavis@gmail.com)

Frameworks

vercel-ai