@tpmjs/tools-robots-policy
Parse a robots.txt file and check if a specific URL is allowed to be crawled by a given user agent. Returns whether the URL is allowed, the matching rule, all parsed rules, crawl delay, and sitemap URLs. Useful for respecting website crawling policies and understanding access restrictions.
Install this tool and use it with the AI SDK
npm install @tpmjs/tools-robots-policy
pnpm add @tpmjs/tools-robots-policy
yarn add @tpmjs/tools-robots-policy
bun add @tpmjs/tools-robots-policy
deno add npm:@tpmjs/tools-robots-policy

import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateText({
model: openai('gpt-4o'),
tools: { robotsPolicyTool },
prompt: 'Your prompt here...',
});
console.log(result.text);

Available configuration options
robotsUrl (string): The robots.txt URL to parse (usually https://example.com/robots.txt)
testUrl (string): The full URL to test for crawl permission
userAgent (string): The user agent to check permissions for (default: "*" for all bots). Common values: "googlebot", "bingbot", "*"
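For example, a prompt can name all three options explicitly and the structured tool output can be read back from the generateText result. This is a minimal sketch: the example.com URLs are placeholders, and the exact shape of result.toolResults depends on your AI SDK version.

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { robotsPolicyTool },
  // The model is expected to map these details onto robotsUrl, testUrl, and userAgent.
  prompt:
    'Using https://example.com/robots.txt, check whether ' +
    'https://example.com/admin/settings may be crawled by "googlebot".',
});

// Structured output from robotsPolicyTool (shape documented in the output schema below).
console.log(result.toolResults);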
Parse robots.txt files and check if URLs are allowed for crawling.
npm install @tpmjs/tools-robots-policy
import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';
import { generateText } from 'ai';

const result = await generateText({
  model: yourModel,
  tools: { robotsPolicyTool },
  prompt: 'Check if https://example.com/admin is allowed to be crawled',
});
robotsUrl (string, required): The robots.txt URL to parse (e.g., https://example.com/robots.txt)
testUrl (string, required): The full URL to test for crawl permission
userAgent (string, optional): The user agent to check (default: "*" for all bots)

Returns:

{
  allowed: boolean;
  userAgent: string;
  testUrl: string;
  matchedRule?: {
    directive: 'allow' | 'disallow';
    path: string;
  };
  rules: Array<{
    userAgent: string;
    rules: Array<{
      directive: 'allow' | 'disallow';
      path: string;
    }>;
    crawlDelay?: number;
    sitemaps?: string[];
  }>;
  crawlDelay?: number;
  sitemaps: string[];
  metadata: {
    fetchedAt: string;
    robotsUrl: string;
    hasRules: boolean;
  };
}
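To make that shape concrete, the helper below restates the fields it uses as a local type and turns a result into a one-line crawl decision. Both the RobotsPolicyResult alias and describeDecision are illustrative, not part of the package.

// Illustrative type restating the documented fields this helper uses.
type RobotsPolicyResult = {
  allowed: boolean;
  userAgent: string;
  testUrl: string;
  matchedRule?: { directive: 'allow' | 'disallow'; path: string };
  crawlDelay?: number;
  sitemaps: string[];
  metadata: { fetchedAt: string; robotsUrl: string; hasRules: boolean };
};

// Summarize a result in one line, e.g. for logging a crawl decision.
function describeDecision(r: RobotsPolicyResult): string {
  const verdict = r.allowed ? 'ALLOWED' : 'BLOCKED';
  const rule = r.matchedRule
    ? `${r.matchedRule.directive}: ${r.matchedRule.path}`
    : 'no matching rule (allowed by default)';
  const delay = r.crawlDelay !== undefined ? `, crawl-delay ${r.crawlDelay}s` : '';
  return `${verdict} ${r.testUrl} for ${r.userAgent} (${rule}${delay})`;
}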
The tool supports standard robots.txt patterns, including wildcards (*) and end-of-URL markers ($):
* - Matches any sequence of characters
$ - Matches the end of the URL

Disallow: /admin # Blocks /admin and /admin/*
Disallow: /admin/ # Blocks /admin/* but not /admin
Disallow: /*.pdf$ # Blocks all PDF files
Disallow: /private*.html # Blocks files starting with "private" and ending in .html
Allow: /public # Explicitly allows /public and /public/*
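One common way to evaluate patterns like these is to translate each path into a regular expression: * becomes .*, a trailing $ becomes an end-of-string anchor, and everything else is matched literally as a prefix. The sketch below shows that translation only; it is not the package's actual implementation, and real matchers additionally pick the longest matching Allow/Disallow rule.

// Translate a robots.txt path pattern into a RegExp (illustrative sketch only).
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
    .join('.*'); // * matches any sequence of characters
  // A trailing $ anchors the match at the end of the URL; otherwise match as a prefix.
  return escaped.endsWith('\\$')
    ? new RegExp('^' + escaped.slice(0, -2) + '$')
    : new RegExp('^' + escaped);
}

patternToRegExp('/*.pdf$').test('/docs/report.pdf'); // true
patternToRegExp('/admin/').test('/admin');           // false (prefix /admin/ does not match /admin)
patternToRegExp('/admin').test('/admin/login');      // true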
Common user agent strings:
* - All robots (default)
googlebot - Google's crawler
bingbot - Bing's crawler
slurp - Yahoo's crawler

The tool matches user agents case-insensitively and uses the most specific rule available (exact match > wildcard).
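That precedence can be pictured as a two-pass lookup over the parsed rule groups: first look for a group whose user agent equals the requested one (compared case-insensitively), and only then fall back to the * group. A minimal sketch, assuming groups shaped like the rules array in the output schema:

type RuleGroup = {
  userAgent: string;
  rules: Array<{ directive: 'allow' | 'disallow'; path: string }>;
};

// Exact user-agent match wins; the "*" wildcard group is the fallback.
function selectGroup(groups: RuleGroup[], userAgent: string): RuleGroup | undefined {
  const ua = userAgent.toLowerCase();
  return (
    groups.find((g) => g.userAgent.toLowerCase() === ua) ??
    groups.find((g) => g.userAgent === '*')
  );
}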
License: MIT