robotsPolicyTool

@tpmjs/tools-robots-policy

Parse robots.txt and check if a URL is allowed for crawling

Official
web
v0.2.0
MIT

Installation & Usage

Install this tool and use it with the AI SDK

1. Install the package

npm install @tpmjs/tools-robots-policy
pnpm add @tpmjs/tools-robots-policy
yarn add @tpmjs/tools-robots-policy
bun add @tpmjs/tools-robots-policy
deno add npm:@tpmjs/tools-robots-policy

2. Import the tool

import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';

3. Use with AI SDK

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { robotsPolicyTool },
  prompt: 'Your prompt here...',
});

console.log(result.text);

Signature

(testUrl: string, robotsUrl: string, userAgent?: string) => Promise<unknown>

Tags

allowed
check
crawler
crawling
parse
policy
robots
robots.txt
seo
tpmjs
txt
url
web

Parameters

Available configuration options

robotsUrl
Required
Type: string

The robots.txt URL to parse

testUrl
Required
Type: string

The full URL to test for crawl permission

userAgent
Optional
Type: string

The user agent to check (default: '*')

README

@tpmjs/tools-robots-policy

Parse robots.txt files and check if URLs are allowed for crawling.

Installation

npm install @tpmjs/tools-robots-policy

Usage

import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';
import { generateText } from 'ai';

const result = await generateText({
  model: yourModel,
  tools: { robotsPolicyTool },
  prompt: 'Check if https://example.com/admin is allowed to be crawled',
});
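When the model invokes the tool, the AI SDK surfaces the structured output alongside the generated text. A minimal sketch of reading it (note that the payload field is named result in some AI SDK major versions and output in others):

// Inspect the structured tool output after generateText resolves.
// The payload property name varies across AI SDK versions.
for (const toolResult of result.toolResults) {
  console.log(toolResult.toolName, toolResult.result);
}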

Tool Parameters

  • robotsUrl (string, required): The robots.txt URL to parse (e.g., https://example.com/robots.txt)
  • testUrl (string, required): The full URL to test for crawl permission
  • userAgent (string, optional): The user agent to check (default: "*" for all bots)
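Tools built with the AI SDK's tool() helper expose an execute function, so the tool can also be invoked directly, for example in tests. A sketch, assuming the package defines the tool that way (the second argument follows the AI SDK's tool-execution options):

// Hypothetical direct invocation outside the model loop.
const policy = await robotsPolicyTool.execute!(
  {
    robotsUrl: 'https://example.com/robots.txt',
    testUrl: 'https://example.com/admin',
    userAgent: 'googlebot',
  },
  { toolCallId: 'manual-check', messages: [] }
);
console.log(policy);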

Returns

{
  allowed: boolean;
  userAgent: string;
  testUrl: string;
  matchedRule?: {
    directive: 'allow' | 'disallow';
    path: string;
  };
  rules: Array<{
    userAgent: string;
    rules: Array<{
      directive: 'allow' | 'disallow';
      path: string;
    }>;
    crawlDelay?: number;
    sitemaps?: string[];
  }>;
  crawlDelay?: number;
  sitemaps: string[];
  metadata: {
    fetchedAt: string;
    robotsUrl: string;
    hasRules: boolean;
  };
}
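A caller can branch on the allowed flag and surface the optional matchedRule. A small consumer sketch; RobotsPolicyResult is a local type name mirroring the shape above, not a type exported by the package:

interface RobotsPolicyResult {
  allowed: boolean;
  userAgent: string;
  testUrl: string;
  matchedRule?: { directive: 'allow' | 'disallow'; path: string };
}

// Summarize a policy check in one line.
function summarize(r: RobotsPolicyResult): string {
  const verdict = r.allowed ? 'allowed' : 'disallowed';
  const rule = r.matchedRule
    ? ` (matched "${r.matchedRule.directive}: ${r.matchedRule.path}")`
    : '';
  return `${r.testUrl} is ${verdict} for ${r.userAgent}${rule}`;
}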

Features

  • Parses robots.txt files according to the robots exclusion standard
  • Checks if specific URLs are allowed for crawling
  • Supports user-agent specific rules
  • Extracts crawl-delay directives
  • Extracts sitemap URLs
  • Handles wildcards (*) and end-of-URL markers ($)
  • Returns the most specific matching rule
  • 30-second timeout protection
  • Handles missing robots.txt (treats as "allow all")

Pattern Matching

The tool supports standard robots.txt patterns:

  • * - Matches any sequence of characters
  • $ - Matches the end of the URL
  • Patterns are case-sensitive for paths

Examples

Disallow: /admin          # Blocks /admin and /admin/*
Disallow: /admin/         # Blocks /admin/* but not /admin
Disallow: /*.pdf$         # Blocks all PDF files
Disallow: /private*.html  # Blocks paths starting with "/private" and containing ".html"
Allow: /public            # Explicitly allows /public and /public/*
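These semantics can be implemented by compiling each pattern into a regular expression. A minimal sketch (illustrative only, not the package's internals):

// Compile a robots.txt path pattern: escape regex metacharacters,
// map * to .*, and honor a trailing $ as an end-of-string anchor.
// Patterns without $ are prefix matches.
function matchesRobotsPattern(pattern: string, path: string): boolean {
  const anchored = pattern.endsWith('$');
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp('^' + escaped + (anchored ? '$' : '')).test(path);
}

matchesRobotsPattern('/*.pdf$', '/docs/report.pdf'); // true
matchesRobotsPattern('/admin/', '/admin');           // false (prefix match only)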

User Agents

Common user agent strings:

  • * - All robots (default)
  • googlebot - Google's crawler
  • bingbot - Bing's crawler
  • slurp - Yahoo's crawler
  • Custom user agents for specific bots

The tool matches user agents case-insensitively and uses the most specific rule available (exact match > wildcard).
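A sketch of that selection order, assuming rule groups shaped like the rules entries documented above (illustrative, not the package's internals):

interface RuleGroup {
  userAgent: string;
  // ...directives for this group
}

// Prefer a case-insensitive exact match; fall back to the '*' group.
function selectGroup(groups: RuleGroup[], userAgent: string): RuleGroup | undefined {
  const ua = userAgent.toLowerCase();
  return (
    groups.find((g) => g.userAgent.toLowerCase() === ua) ??
    groups.find((g) => g.userAgent === '*')
  );
}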

Requirements

  • Node.js 18+ (uses native fetch API)

License

MIT

Statistics

  • Downloads/month: 9
  • GitHub Stars: 0
  • Quality Score: 74%

NPM Keywords

tpmjs
robots
robots.txt
crawler
seo
web

Maintainers

thomasdavis (thomasalwyndavis@gmail.com)

Frameworks

vercel-ai