
createEval

@tpmjs/tools-evals-blah

Create a new evaluation definition on evals.blah.dev. Requires API key.

Official
research
v0.1.0
MIT
⚠️ This tool is currently broken

Execution failed: runtime error with test parameters: eval_type must be "rubric" or "semantic"

Last checked: 3/1/2026, 4:27:41 AM


Installation & Usage

Install this tool and use it with the AI SDK

1. Install the package

npm install @tpmjs/tools-evals-blah
pnpm add @tpmjs/tools-evals-blah
yarn add @tpmjs/tools-evals-blah
bun add @tpmjs/tools-evals-blah
deno add npm:@tpmjs/tools-evals-blah

2. Import the tool

import { createEval } from '@tpmjs/tools-evals-blah';

3. Use with AI SDK

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { createEval } from '@tpmjs/tools-evals-blah';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { createEval },
  prompt: 'Your prompt here...',
});

console.log(result.text);

Signature

(name: string, prompt: string, eval_type: string, eval_criteria: string, description?: string, expected_behavior?: string) => Promise<unknown>

Tags

ai
api
benchmarks
blah
create
definition
dev
eval
evals
evaluation
key
leaderboard
llm
new
requires
research
tpmjs

Parameters

Available configuration options

Auto-extracted
name
Required
Type: string

Eval name (1-100 characters)

prompt
Required
Type: string

The prompt that will be sent to models

eval_type
Required
Type: string

"rubric" scores against a detailed rubric; "semantic" compares to an ideal response

eval_criteria
Required
Type: string

JSON string of scoring criteria. For rubric: {"rubric": "...", "max_score": 1}. For semantic: {"ideal_response": "...", "rubric": "..."}.

description
Optional
Type: string

Optional eval description (max 1000 characters)

expected_behavior
Optional
Type: string

Optional human-readable description of expected behavior (max 2000 characters)
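Because eval_criteria is a JSON string, it is safer to build it with JSON.stringify than to hand-write it. A minimal sketch in TypeScript: the helper names are illustrative; only the field names (rubric, max_score, ideal_response) come from the parameter docs above.

```typescript
// Build the eval_criteria JSON string for a "rubric" eval.
// Field names (rubric, max_score) follow the parameter docs above.
function rubricCriteria(rubric: string, maxScore: number): string {
  return JSON.stringify({ rubric, max_score: maxScore });
}

// Build the eval_criteria JSON string for a "semantic" eval.
// Field names (ideal_response, rubric) follow the parameter docs above.
function semanticCriteria(idealResponse: string, rubric: string): string {
  return JSON.stringify({ ideal_response: idealResponse, rubric });
}

// Example: criteria string for a rubric-scored eval.
const criteria = rubricCriteria('Rate code clarity 0-1', 1);
```

The resulting string can be passed directly as the eval_criteria argument.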

Schema extracted: 3/1/2026, 4:27:40 AM

README

@tpmjs/tools-evals-blah

AI SDK tools for evals.blah.dev — the open LLM evaluation platform. Register models, create evals, trigger runs, and check the leaderboard.

Installation

npm install @tpmjs/tools-evals-blah

Setup

Read-only tools (list, get, leaderboard) require no authentication.

For write operations (create model, create eval, trigger run), set your API key:

export EVALS_BLAH_API_KEY=blah_your_api_key_here

Get an API key at https://evals.blah.dev/settings/api-keys
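Since write operations fail without the key, it can help to check for it at startup rather than at the first tool call. A small sketch: the environment variable name comes from the export line above; the function name and error text are illustrative.

```typescript
// Throw early if the write-capable API key is missing from the environment.
// EVALS_BLAH_API_KEY is the variable named in the setup instructions above.
function requireApiKey(): string {
  const key = process.env.EVALS_BLAH_API_KEY;
  if (!key) {
    throw new Error(
      'EVALS_BLAH_API_KEY is not set; createModel, createEval, and triggerRun will fail'
    );
  }
  return key;
}
```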

Usage

import {
  listModels,
  getLeaderboard,
  createModel,
  createEval,
  triggerRun,
} from '@tpmjs/tools-evals-blah';

// List all models (no auth needed)
const models = await listModels.execute({});

// Check the leaderboard (no auth needed)
const leaderboard = await getLeaderboard.execute({});

// Register a model (requires API key)
const model = await createModel.execute({
  name: 'My Model',
  inference_uri: 'openai/gpt-4.1-mini',
});

// Create an eval (requires API key).
// Note: `eval` is a reserved word in ES modules, so use a different name.
const evalDef = await createEval.execute({
  name: 'Code Clarity',
  prompt: 'Write a function to reverse a string',
  eval_type: 'rubric',
  eval_criteria: '{"rubric": "Rate code clarity 0-1", "max_score": 1}',
});

// Trigger a run (requires API key)
const run = await triggerRun.execute({});

Tools

Tool             Auth  Description
listModels       No    List all registered LLM models
getModel         No    Get a model by ID
createModel      Yes   Register a new model
getModelResults  No    Get all eval results for a model
listEvals        No    List all evaluation definitions
getEval          No    Get an eval by ID
createEval       Yes   Create a new evaluation
listRuns         No    List all eval runs
getRun           No    Get a run by ID
getRunResults    No    Get all results for a run
triggerRun       Yes   Trigger a new eval run
getResult        No    Get a single result by ID
getLeaderboard   No    Get model rankings

License

MIT

Statistics

Downloads/month

0

Quality Score

0%

Bundle Size

NPM Keywords

tpmjs
research
ai
evals
llm
leaderboard
benchmarks

Maintainers

thomasdavis (thomasalwyndavis@gmail.com)

Frameworks

vercel-ai