@tpmjs/tools-evals-blah
Get the model leaderboard ranked by average eval score on evals.blah.dev.
Install this tool and use it with the AI SDK
npm install @tpmjs/tools-evals-blah
pnpm add @tpmjs/tools-evals-blah
yarn add @tpmjs/tools-evals-blah
bun add @tpmjs/tools-evals-blah
deno add npm:@tpmjs/tools-evals-blah

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { getLeaderboard } from '@tpmjs/tools-evals-blah';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { getLeaderboard },
  prompt: 'Your prompt here...',
});

console.log(result.text);

() => Promise<unknown>

Available configuration options
No schema available for this tool.
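Because the tool is typed as returning `Promise<unknown>` and publishes no schema, callers may want to narrow the result before reading fields from it. The following is a minimal sketch under a stated assumption: that the leaderboard arrives as an array of row objects. That shape is not documented on this page, and `Row`, `isRow`, and `toRows` are hypothetical helpers, not part of `@tpmjs/tools-evals-blah`.

```typescript
// Hypothetical helpers: not exported by @tpmjs/tools-evals-blah.
// Assumption: the leaderboard is an array of plain objects. The real
// response shape is not documented here, so nothing is read by field name.
type Row = Record<string, unknown>;

// True for plain (non-null, non-array) objects.
function isRow(value: unknown): value is Row {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns the object entries of an array result, or [] for any other input.
function toRows(result: unknown): Row[] {
  return Array.isArray(result) ? result.filter(isRow) : [];
}
```

With the package installed, this could be applied as `toRows(await getLeaderboard.execute({}))`, after which each row can be inspected key by key without assuming particular field names.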
AI SDK tools for evals.blah.dev — the open LLM evaluation platform. Register models, create evals, trigger runs, and check the leaderboard.
npm install @tpmjs/tools-evals-blah
Read-only tools (list, get, leaderboard) require no authentication.
For write operations (create model, create eval, trigger run), set your API key:
export EVALS_BLAH_API_KEY=blah_your_api_key_here
Get an API key at https://evals.blah.dev/settings/api-keys
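Since only write operations need the key, it can help to fail early with a clear message before calling one of them. This is a small sketch, assuming the key is read from an environment-like object; `requireApiKey` and `Env` are hypothetical names and not part of the package, which may perform its own checks.

```typescript
// Hypothetical guard: not part of @tpmjs/tools-evals-blah.
type Env = Record<string, string | undefined>;

// Returns the trimmed API key, or throws if it is missing or blank.
function requireApiKey(env: Env): string {
  const key = env['EVALS_BLAH_API_KEY']?.trim();
  if (!key) {
    throw new Error(
      'EVALS_BLAH_API_KEY is not set; createModel, createEval, and triggerRun need it.',
    );
  }
  return key;
}
```

In Node, calling `requireApiKey(process.env)` before `createModel.execute(...)` would surface a missing key before any request is attempted.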
import {
  listModels,
  getLeaderboard,
  createModel,
  createEval,
  triggerRun,
} from '@tpmjs/tools-evals-blah';

// List all models (no auth needed)
const models = await listModels.execute({});

// Check the leaderboard (no auth needed)
const leaderboard = await getLeaderboard.execute({});

// Register a model (requires API key)
const model = await createModel.execute({
  name: 'My Model',
  inference_uri: 'openai/gpt-4.1-mini',
});

// Create an eval (requires API key)
// Named newEval because `eval` cannot be used as a binding name in an ES module (strict mode).
const newEval = await createEval.execute({
  name: 'Code Clarity',
  prompt: 'Write a function to reverse a string',
  eval_type: 'rubric',
  eval_criteria: '{"rubric": "Rate code clarity 0-1", "max_score": 1}',
});

// Trigger a run (requires API key)
const run = await triggerRun.execute({});
| Tool | Auth | Description |
|---|---|---|
| listModels | No | List all registered LLM models |
| getModel | No | Get a model by ID |
| createModel | Yes | Register a new model |
| getModelResults | No | Get all eval results for a model |
| listEvals | No | List all evaluation definitions |
| getEval | No | Get an eval by ID |
| createEval | Yes | Create a new evaluation |
| listRuns | No | List all eval runs |
| getRun | No | Get a run by ID |
| getRunResults | No | Get all results for a run |
| triggerRun | Yes | Trigger a new eval run |
| getResult | No | Get a single result by ID |
| getLeaderboard | No | Get model rankings |
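The Auth column above can be mirrored as a lookup, for example to decide which tools to expose when no API key is configured. This is only a sketch that transcribes the table; `TOOL_REQUIRES_AUTH` and `toolsRequiringAuth` are hypothetical names, and the package itself may not export anything like them.

```typescript
// Transcribed from the tool table; hypothetical, not exported by the package.
const TOOL_REQUIRES_AUTH = {
  listModels: false,
  getModel: false,
  createModel: true,
  getModelResults: false,
  listEvals: false,
  getEval: false,
  createEval: true,
  listRuns: false,
  getRun: false,
  getRunResults: false,
  triggerRun: true,
  getResult: false,
  getLeaderboard: false,
} as const;

type ToolName = keyof typeof TOOL_REQUIRES_AUTH;

// Names of the tools that need EVALS_BLAH_API_KEY, in table order.
function toolsRequiringAuth(): ToolName[] {
  return (Object.keys(TOOL_REQUIRES_AUTH) as ToolName[]).filter(
    (name) => TOOL_REQUIRES_AUTH[name],
  );
}
```

Keeping the flags in one `as const` object means `ToolName` is derived from the same data as the lookup, so the two cannot drift apart.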
MIT