@tpmjs/tools-judge
Evaluate an AI conversation across 10 quality metrics to ensure the AI is making real progress and completing user intent.

USE THIS TOOL FREQUENTLY in agentic loops to:
- Verify the AI is on track before moving to the next step
- Catch loops, stuck states, or regressions early
- Ensure task completion before declaring success
- Get actionable must-dos and improvement suggestions

The 10 metrics evaluated:
1. Task Completion - Did the AI complete what was asked?
2. Accuracy - Are responses correct and error-free?
3. Relevance - Are responses on-topic?
4. Clarity - Are responses clear and understandable?
5. Efficiency - Is the AI being concise?
6. User Intent Alignment - Does the AI understand the user?
7. Actionability - Are outputs usable?
8. Progress - Is the conversation moving forward?
9. Error Handling - Are errors handled gracefully?
10. Completeness - Are all aspects addressed?

Returns a verdict (pass/retry/fail) with specific must-dos for any issues.
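One way an agentic loop might act on the verdict is sketched below. The listing only states that the tool returns a verdict (pass/retry/fail) with must-dos; the `JudgeResult` shape, its field names (`verdict`, `mustDos`), the `decideNextStep` helper, and the attempt limit are all assumptions made for illustration, not the package's documented API.

```typescript
// Hypothetical result shape: the field names below are assumed, not taken
// from the package. The listing only mentions a verdict and must-dos.
type Verdict = 'pass' | 'retry' | 'fail';

interface JudgeResult {
  verdict: Verdict;
  mustDos: string[];
}

type NextStep =
  | { kind: 'finish' }
  | { kind: 'revise'; instructions: string }
  | { kind: 'abort'; reason: string };

// Hypothetical helper: maps a verdict to the loop's next action.
// `maxAttempts` is an illustrative guard against endless retries.
function decideNextStep(
  result: JudgeResult,
  attempt: number,
  maxAttempts = 3,
): NextStep {
  if (result.verdict === 'pass') {
    return { kind: 'finish' };
  }
  if (result.verdict === 'fail' || attempt >= maxAttempts) {
    return {
      kind: 'abort',
      reason: `verdict=${result.verdict} after ${attempt} attempt(s)`,
    };
  }
  // On 'retry', turn the must-dos into a numbered instruction list
  // that can be fed back to the model on the next iteration.
  const instructions = result.mustDos
    .map((item, index) => `${index + 1}. ${item}`)
    .join('\n');
  return { kind: 'revise', instructions };
}
```

Keeping the decision in a small pure function like this makes the loop's stopping behavior easy to test separately from any model calls.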
Install this tool and use it with the AI SDK
npm install @tpmjs/tools-judge
pnpm add @tpmjs/tools-judge
yarn add @tpmjs/tools-judge
bun add @tpmjs/tools-judge
deno add npm:@tpmjs/tools-judge

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { judgeConversation } from '@tpmjs/tools-judge';

// Register the judge as a tool so the model can call it during generation.
const result = await generateText({
  model: openai('gpt-4o'),
  tools: { judgeConversation },
  prompt: 'Your prompt here...',
});

console.log(result.text);

Available configuration options
messages (array): Array of AI SDK messages to evaluate. Each message should have a role and content.
originalUserRequest (string, optional): The original user request, if different from the first message.
context (string, optional): Additional context about what the conversation should accomplish.
strictMode (boolean, optional): If true, requires higher scores to pass (default: false).
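The four options above could be assembled into a tool input along these lines. This is a sketch under assumptions: `ChatMessage` simplifies AI SDK messages to string content, and `buildJudgeInput` is a hypothetical helper, not something exported by the package. It sets `originalUserRequest` only when it differs from the first user message, following the option's description.

```typescript
// Simplified message shape (assumption): real AI SDK messages may carry
// structured content parts rather than a plain string.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Mirrors the four documented options; only `messages` is not marked optional.
interface JudgeInput {
  messages: ChatMessage[];
  originalUserRequest?: string;
  context?: string;
  strictMode?: boolean;
}

// Hypothetical helper, not part of @tpmjs/tools-judge.
function buildJudgeInput(
  messages: ChatMessage[],
  options: Omit<JudgeInput, 'messages'> = {},
): JudgeInput {
  const firstUserMessage = messages.find((message) => message.role === 'user');
  const input: JudgeInput = {
    messages,
    // The listing gives false as the default for strictMode.
    strictMode: options.strictMode ?? false,
  };
  // Only pass originalUserRequest when it differs from the first user message.
  if (
    options.originalUserRequest !== undefined &&
    options.originalUserRequest !== firstUserMessage?.content
  ) {
    input.originalUserRequest = options.originalUserRequest;
  }
  if (options.context !== undefined) {
    input.context = options.context;
  }
  return input;
}

const exampleInput = buildJudgeInput(
  [
    { role: 'user', content: 'Add a dark mode toggle' },
    { role: 'assistant', content: 'I added the toggle to the settings page.' },
  ],
  { context: 'Frontend feature request', strictMode: true },
);
```

Enabling `strictMode` for the final check before declaring success, and leaving it off for intermediate checks, is one plausible way to use the stricter threshold.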
Schema extracted: 1/17/2026, 4:51:37 AM
Downloads/month
0
Quality Score