judgeConversation

@tpmjs/tools-judge

Evaluate an AI conversation across 10 quality metrics to ensure the AI is making real progress and completing user intent.

USE THIS TOOL FREQUENTLY in agentic loops to:
- Verify the AI is on track before moving to the next step
- Catch loops, stuck states, or regressions early
- Ensure task completion before declaring success
- Get actionable must-dos and improvement suggestions

The 10 metrics evaluated:
1. Task Completion - Did the AI complete what was asked?
2. Accuracy - Are responses correct and error-free?
3. Relevance - Are responses on-topic?
4. Clarity - Are responses clear and understandable?
5. Efficiency - Is the AI being concise?
6. User Intent Alignment - Does the AI understand the user?
7. Actionability - Are outputs usable?
8. Progress - Is the conversation moving forward?
9. Error Handling - Are errors handled gracefully?
10. Completeness - Are all aspects addressed?

Returns a verdict (pass/retry/fail) with specific must-dos for any issues.

Official
agent
v0.1.0
MIT
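
The description says the tool returns a verdict (pass/retry/fail) together with must-dos, but this page does not document the exact result shape. The following is a minimal sketch of how an agent loop might branch on such a verdict. The `JudgeResult` interface, the `mustDos` field name, the `decideNextStep` helper, and the retry cap are all assumptions made for illustration, not the package's documented API.

```typescript
// Hypothetical result shape: the page only states that the tool returns a
// verdict (pass/retry/fail) with must-dos. The field names are assumptions.
type Verdict = 'pass' | 'retry' | 'fail';

interface JudgeResult {
  verdict: Verdict;
  mustDos: string[];
}

type NextStep =
  | { kind: 'continue' }
  | { kind: 'retry'; instructions: string }
  | { kind: 'abort'; reason: string };

// Decide what an agent loop does next. Retries are capped (assumed default
// of 3) so a stuck conversation cannot loop forever.
function decideNextStep(
  result: JudgeResult,
  attempt: number,
  maxAttempts = 3,
): NextStep {
  switch (result.verdict) {
    case 'pass':
      return { kind: 'continue' };
    case 'retry':
      if (attempt >= maxAttempts) {
        return { kind: 'abort', reason: `still failing after ${attempt} attempts` };
      }
      // Turn the must-dos into a numbered list to feed back to the model.
      return {
        kind: 'retry',
        instructions: result.mustDos.map((m, i) => `${i + 1}. ${m}`).join('\n'),
      };
    case 'fail':
      return {
        kind: 'abort',
        reason: result.mustDos.join('; ') || 'judge returned fail',
      };
  }
}

const step = decideNextStep(
  { verdict: 'retry', mustDos: ['Run the tests', 'Fix the failing import'] },
  1,
);
console.log(step);
```

Capping retries is a design choice rather than something the tool requires; it keeps a "retry" verdict from becoming the very loop the judge is meant to catch.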

Installation & Usage

Install this tool and use it with the AI SDK

1. Install the package

npm install @tpmjs/tools-judge
pnpm add @tpmjs/tools-judge
yarn add @tpmjs/tools-judge
bun add @tpmjs/tools-judge
deno add npm:@tpmjs/tools-judge

2. Import the tool

import { judgeConversation } from '@tpmjs/tools-judge';

3. Use with AI SDK

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { judgeConversation } from '@tpmjs/tools-judge';

// Register the tool so the model can call it while answering the prompt.
const result = await generateText({
  model: openai('gpt-4o'),
  tools: { judgeConversation },
  prompt: 'Your prompt here...',
});

// Final text produced by the model.
console.log(result.text);

Parameters

Available configuration options

Auto-extracted
messages
Required
Type: array

Array of AI SDK messages to evaluate. Each message should have a role and content.

originalUserRequest
Optional
Type: string

The original user request, if it differs from the first message.

context
Optional
Type: string

Additional context about what the conversation should accomplish.

strictMode
Optional
Type: boolean

If true, requires higher scores to pass (default: false).
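
The four parameters above can be written down as a TypeScript type, which makes the required and optional fields explicit. This is a sketch only: the type names `JudgeMessage` and `JudgeConversationInput` and the `buildJudgeInput` helper are hypothetical, message content is simplified to a plain string, and the non-empty check is an assumption rather than documented behavior.

```typescript
// Hypothetical type names. Only the four field names, their types, and
// their required/optional status come from the parameter list.
type JudgeMessage = { role: string; content: string };

interface JudgeConversationInput {
  messages: JudgeMessage[];      // required
  originalUserRequest?: string;  // optional
  context?: string;              // optional
  strictMode?: boolean;          // optional, default false
}

// Assemble an input object. Rejecting an empty conversation is an
// assumption: there would be nothing to evaluate.
function buildJudgeInput(
  messages: JudgeMessage[],
  options: Omit<JudgeConversationInput, 'messages'> = {},
): JudgeConversationInput {
  if (messages.length === 0) {
    throw new Error('messages must contain at least one message');
  }
  return { messages, ...options };
}

const input = buildJudgeInput(
  [
    { role: 'user', content: 'Rename foo to bar in utils.ts' },
    { role: 'assistant', content: 'Renamed foo to bar in utils.ts.' },
  ],
  { context: 'Single-file refactor', strictMode: true },
);
console.log(input);
```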

Schema extracted: 1/17/2026, 4:51:37 AM

Statistics

Downloads/month

0

Quality Score

0%

NPM Keywords

tpmjs
judge
ai
evaluation
quality
metrics
agent
conversation

Maintainers

thomasdavis (thomasalwyndavis@gmail.com)

Frameworks

vercel-ai