
listEvals

@tpmjs/tools-evals-blah

List all evaluation definitions on evals.blah.dev.

Official
research
v0.1.0
MIT


Installation & Usage

Install this tool and use it with the AI SDK

1. Install the package

npm install @tpmjs/tools-evals-blah
pnpm add @tpmjs/tools-evals-blah
yarn add @tpmjs/tools-evals-blah
bun add @tpmjs/tools-evals-blah
deno add npm:@tpmjs/tools-evals-blah

2. Import the tool

import { listEvals } from '@tpmjs/tools-evals-blah';

3. Use with AI SDK

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { listEvals } from '@tpmjs/tools-evals-blah';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { listEvals },
  prompt: 'Your prompt here...',
});

console.log(result.text);

Signature

() => Promise<unknown>
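Because the tool resolves to Promise<unknown>, callers should narrow the result before using it. A minimal sketch of a type guard — the EvalSummary shape (id, name) is an assumption, since no schema is published for this tool:

```typescript
// Hypothetical response-item shape; the real API schema is not published,
// so we treat the resolved value as unknown and narrow it at runtime.
interface EvalSummary {
  id: string;
  name: string;
}

// Type guard: true only if the value is an array of objects that all
// carry string `id` and `name` fields.
function isEvalList(value: unknown): value is EvalSummary[] {
  return (
    Array.isArray(value) &&
    value.every(
      (item) =>
        typeof item === 'object' &&
        item !== null &&
        typeof (item as Record<string, unknown>).id === 'string' &&
        typeof (item as Record<string, unknown>).name === 'string'
    )
  );
}
```

With a guard like this, `const evals = await listEvals.execute({})` can be checked with `isEvalList(evals)` before iterating, instead of casting the unknown result.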

Tags

ai
benchmarks
blah
definitions
dev
evals
evaluation
leaderboard
list
llm
research
tpmjs

Parameters

Available configuration options

No schema available for this tool.

README

@tpmjs/tools-evals-blah

AI SDK tools for evals.blah.dev — the open LLM evaluation platform. Register models, create evals, trigger runs, and check the leaderboard.

Installation

npm install @tpmjs/tools-evals-blah

Setup

Read-only tools (list, get, leaderboard) require no authentication.

For write operations (create model, create eval, trigger run), set your API key:

export EVALS_BLAH_API_KEY=blah_your_api_key_here

Get an API key at https://evals.blah.dev/settings/api-keys
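Since only write operations need the key, it can help to fail fast with a clear message before calling them. A minimal sketch, assuming the tools read EVALS_BLAH_API_KEY from the environment as described above; `requireApiKey` is a hypothetical helper, not part of the package:

```typescript
// Hypothetical guard: verify the API key is present (and uses the
// documented `blah_` prefix) before attempting a write operation.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.EVALS_BLAH_API_KEY;
  if (!key || !key.startsWith('blah_')) {
    throw new Error(
      'Set EVALS_BLAH_API_KEY (get a key at https://evals.blah.dev/settings/api-keys)'
    );
  }
  return key;
}
```

Typical call site: `requireApiKey(process.env)` before invoking createModel, createEval, or triggerRun.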

Usage

import {
  listModels,
  getLeaderboard,
  createModel,
  createEval,
  triggerRun,
} from '@tpmjs/tools-evals-blah';

// List all models (no auth needed)
const models = await listModels.execute({});

// Check the leaderboard (no auth needed)
const leaderboard = await getLeaderboard.execute({});

// Register a model (requires API key)
const model = await createModel.execute({
  name: 'My Model',
  inference_uri: 'openai/gpt-4.1-mini',
});

// Create an eval (requires API key)
const evalDef = await createEval.execute({
  name: 'Code Clarity',
  prompt: 'Write a function to reverse a string',
  eval_type: 'rubric',
  eval_criteria: '{"rubric": "Rate code clarity 0-1", "max_score": 1}',
});

// Trigger a run (requires API key)
const run = await triggerRun.execute({});
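Triggered runs presumably complete asynchronously, so a caller may want to poll getRun until it finishes. A generic polling sketch — the interval and attempt defaults are arbitrary, and any `status` field you check for is an assumption about the API's response shape:

```typescript
// Generic polling helper: repeatedly fetch a value until a predicate
// says it is done, sleeping between attempts.
async function pollUntil<T>(
  fetchOnce: () => Promise<T>,
  isDone: (value: T) => boolean,
  { intervalMs = 2000, maxAttempts = 30 } = {}
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const value = await fetchOnce();
    if (isDone(value)) return value;
    // Wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Run did not finish within the polling window');
}
```

A hypothetical call, assuming getRun returns an object with a status field: `await pollUntil(() => getRun.execute({ id }), (r: any) => r.status === 'completed')` — check the actual getRun response for the real field names.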

Tools

Tool             Auth  Description
listModels       No    List all registered LLM models
getModel         No    Get a model by ID
createModel      Yes   Register a new model
getModelResults  No    Get all eval results for a model
listEvals        No    List all evaluation definitions
getEval          No    Get an eval by ID
createEval       Yes   Create a new evaluation
listRuns         No    List all eval runs
getRun           No    Get a run by ID
getRunResults    No    Get all results for a run
triggerRun       Yes   Trigger a new eval run
getResult        No    Get a single result by ID
getLeaderboard   No    Get model rankings

License

MIT

Maintainers

thomasdavis (thomasalwyndavis@gmail.com)

Frameworks

vercel-ai