@tpmjs/tools-robots-policy
Parse robots.txt and check if a URL is allowed for crawling
Install this tool and use it with the AI SDK:

```bash
npm install @tpmjs/tools-robots-policy
pnpm add @tpmjs/tools-robots-policy
yarn add @tpmjs/tools-robots-policy
bun add @tpmjs/tools-robots-policy
deno add npm:@tpmjs/tools-robots-policy
```

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { robotsPolicyTool },
  prompt: 'Check if https://example.com/admin is allowed to be crawled',
});

console.log(result.text);
```

Tool signature:

```ts
(testUrl: string, robotsUrl: string, userAgent?: string) => Promise<unknown>
```

Available configuration options:
- `robotsUrl` (string, required): The robots.txt URL to parse (e.g., https://example.com/robots.txt)
- `testUrl` (string, required): The full URL to test for crawl permission
- `userAgent` (string, optional): The user agent to check (default: `'*'` for all bots)
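To run a check outside of a model loop, you can invoke the tool directly. A minimal sketch, assuming the tool follows the AI SDK convention of an `execute(args, options)` method taking a single arguments object (the extracted signature above shows positional parameters, so treat this shape as an assumption):

```ts
import { robotsPolicyTool } from '@tpmjs/tools-robots-policy';

// Hypothetical direct invocation; assumes the standard AI SDK
// `execute(args, options)` convention rather than positional arguments.
const policy = await robotsPolicyTool.execute(
  {
    robotsUrl: 'https://example.com/robots.txt',
    testUrl: 'https://example.com/admin',
    userAgent: 'googlebot',
  },
  { toolCallId: 'manual-check', messages: [] }
);

console.log(policy);
```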
Returns:

```ts
{
  allowed: boolean;
  userAgent: string;
  testUrl: string;
  matchedRule?: {
    directive: 'allow' | 'disallow';
    path: string;
  };
  rules: Array<{
    userAgent: string;
    rules: Array<{
      directive: 'allow' | 'disallow';
      path: string;
    }>;
    crawlDelay?: number;
    sitemaps?: string[];
  }>;
  crawlDelay?: number;
  sitemaps: string[];
  metadata: {
    fetchedAt: string;
    robotsUrl: string;
    hasRules: boolean;
  };
}
```
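A short sketch of acting on the result, using field names from the return type above (the gating logic itself is illustrative, not part of the package):

```ts
// Minimal shape for illustration, mirroring the documented return type.
interface RobotsPolicyResult {
  allowed: boolean;
  userAgent: string;
  matchedRule?: { directive: 'allow' | 'disallow'; path: string };
  crawlDelay?: number;
  sitemaps: string[];
}

function reportPolicy(policy: RobotsPolicyResult): void {
  if (!policy.allowed) {
    // Surface which rule blocked the URL, if one was matched.
    console.log(
      `Blocked for ${policy.userAgent}: ${policy.matchedRule?.directive} ${policy.matchedRule?.path}`
    );
    return;
  }
  if (policy.crawlDelay !== undefined) {
    // Crawl-delay is conventionally expressed in seconds.
    console.log(`Allowed; wait ${policy.crawlDelay}s between requests`);
  }
  console.log(`Sitemaps: ${policy.sitemaps.join(', ')}`);
}
```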
The tool supports standard robots.txt patterns, including wildcards (`*`) and end-of-URL markers (`$`):

- `*` matches any sequence of characters
- `$` matches the end of the URL

```
Disallow: /admin          # Blocks /admin and /admin/*
Disallow: /admin/         # Blocks /admin/* but not /admin
Disallow: /*.pdf$         # Blocks all PDF files
Disallow: /private*.html  # Blocks files starting with "private" and ending in .html
Allow: /public            # Explicitly allows /public and /public/*
```
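To make those semantics concrete, here is an illustrative matcher (not the package's implementation) that translates a robots.txt path pattern into a regular expression:

```ts
// Illustrative only: `*` becomes `.*`, a trailing `$` anchors the end,
// and everything else is a literal prefix match.
function matchesPattern(pattern: string, path: string): boolean {
  const endAnchored = pattern.endsWith('$');
  const body = endAnchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\\\*/g, '.*'); // restore `*` as a wildcard
  return new RegExp('^' + escaped + (endAnchored ? '$' : '')).test(path);
}

console.log(matchesPattern('/admin', '/admin/users')); // true: prefix match
console.log(matchesPattern('/*.pdf$', '/docs/a.pdf')); // true: ends in .pdf
console.log(matchesPattern('/*.pdf$', '/a.pdf?x=1')); // false: `$` anchors the end
```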
Common user agent strings:

- `*`: all robots (default)
- `googlebot`: Google's crawler
- `bingbot`: Bing's crawler
- `slurp`: Yahoo's crawler

The tool matches user agents case-insensitively and uses the most specific rule available (exact match > wildcard).
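That "most specific wins" selection can be sketched as well, assuming the `rules` array shape from the return type (illustrative, not the package's code):

```ts
interface RuleGroup {
  userAgent: string;
  rules: Array<{ directive: 'allow' | 'disallow'; path: string }>;
}

// Prefer an exact (case-insensitive) user-agent match; otherwise fall
// back to the `*` wildcard group.
function selectGroup(
  groups: RuleGroup[],
  userAgent: string
): RuleGroup | undefined {
  const ua = userAgent.toLowerCase();
  return (
    groups.find((g) => g.userAgent.toLowerCase() === ua) ??
    groups.find((g) => g.userAgent === '*')
  );
}
```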
License: MIT