Scenarios

Scenarios are AI-generated test cases for your tool collections. They automatically verify that your tools work correctly and provide quality metrics over time.

What are Scenarios?

Automated testing for tool collections

A Scenario is a test case that exercises your tool collection with a realistic user prompt. When you run a scenario:

  1. An ephemeral agent is created with your collection's tools
  2. The agent executes the scenario's prompt
  3. An LLM evaluates whether the task was completed successfully
  4. Results are recorded and quality scores are updated

This enables continuous testing of your tool collections, similar to unit tests for code.
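
The four steps above can be sketched as a small function. This is only an illustration of the flow, not how TPMJS implements it: `run_scenario`, `Verdict`, and the three callables (`create_agent`, `evaluate`, `record`) are hypothetical names introduced here, and the stubs below stand in for a real agent and LLM judge.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    passed: bool
    reason: str


def run_scenario(prompt, create_agent, evaluate, record):
    """Run one scenario through the four steps listed above.

    All three callables are hypothetical stand-ins, not TPMJS APIs.
    """
    agent = create_agent()                  # 1. ephemeral agent with the collection's tools
    transcript = agent(prompt)              # 2. the agent executes the scenario's prompt
    verdict = evaluate(prompt, transcript)  # 3. an LLM judge decides pass or fail
    record(verdict)                         # 4. the result is stored for quality scoring
    return verdict


# Usage with stubs in place of a real agent and judge.
history: List[Verdict] = []
verdict = run_scenario(
    "Extract all links from a news article page",
    create_agent=lambda: (lambda p: f"agent output for: {p}"),
    evaluate=lambda p, t: Verdict(passed=p in t, reason="prompt echoed"),
    record=history.append,
)
```

Passing the agent, judge, and recorder in as arguments keeps the flow testable without network access.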

Common Archetypes

Typical scenarios for different tool types

Web Scraping Tools

Test data extraction from various webpage structures:

"Scrape the main heading and first paragraph from https://example.com"
"Extract all links from a news article page"
"Get the price and description from an e-commerce product page"

Search Tools

Verify search accuracy and relevance:

"Search for recent news about climate change and summarize the top 3 results"
"Find documentation for the React useState hook"
"Search for restaurants near Times Square, New York"

Data Processing Tools

Test transformations and calculations:

"Convert 100 USD to EUR using current exchange rates"
"Parse this JSON and extract the user email addresses"
"Calculate the average of these numbers: 10, 20, 30, 40, 50"

File/Document Tools

Verify file operations and content extraction:

"Generate a PDF report with the title 'Monthly Summary'"
"Extract text content from a markdown document"
"Create a CSV file with sample user data"

Getting Started

Install the CLI and authenticate

Install the TPMJS CLI

npm install -g @tpmjs/cli

Authenticate

You need a TPMJS API key to run scenarios. Get one from your dashboard settings.

tpm auth login

Generate Scenarios

tpm scenario generate

AI-generate test scenarios based on your collection's tools. The generator analyzes your tools and creates realistic prompts.

Basic Usage

# Generate 1 scenario (default)
tpm scenario generate my-collection

# Generate multiple scenarios
tpm scenario generate my-collection --count 5

# Skip similarity check (allow duplicates)
tpm scenario generate my-collection --count 3 --skip-similarity

Example Output

Generating 3 scenarios for "Web Scraping Toolkit"...

✓ Generated 3 scenarios:

  1. "Scrape the main article content from a news website and extract..."
     Similarity: 0% (unique)
     Tags: web-scraping, content-extraction

  2. "Extract all image URLs from an e-commerce product gallery..."
     Similarity: 15% (unique)
     Tags: web-scraping, images, e-commerce

  3. "Get the current weather data from a weather service page..."
     Similarity: 8% (unique)
     Tags: web-scraping, weather, data-extraction

Use 'tpm scenario list my-collection' to view all scenarios.

Similarity Detection

Each generated scenario is compared against existing ones using vector similarity. If it is more than 70% similar to an existing scenario, you'll see a warning. This helps maintain diverse test coverage. Pass --skip-similarity to bypass the check.
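
As a rough illustration of a vector-similarity check, the sketch below uses cosine similarity over embedding vectors. The source only says "vector similarity", so cosine similarity, the function names, and the toy two-dimensional vectors are all assumptions; only the 70% threshold comes from the text.

```python
import math

SIMILARITY_THRESHOLD = 0.70  # from the text: warn above 70% similarity


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(candidate, existing):
    """Highest similarity between the candidate and any existing vector."""
    return max((cosine_similarity(candidate, e) for e in existing), default=0.0)


def is_near_duplicate(candidate, existing):
    """True when the candidate would trigger the similarity warning."""
    return most_similar(candidate, existing) > SIMILARITY_THRESHOLD
```

With real embeddings the vectors would have hundreds of dimensions, but the comparison works the same way.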

List Scenarios

tpm scenario list

View all scenarios for a collection or browse public scenarios.

Usage

# List scenarios for a specific collection
tpm scenario list my-collection

# List all public scenarios
tpm scenario list

# With pagination
tpm scenario list --limit 50 --offset 20

# Filter by tags
tpm scenario list --tags web-scraping,api

# Output as JSON
tpm scenario list my-collection --json

Example Output

Scenarios for Web Scraping Toolkit

Name                                Quality   Runs   Status   Tags
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scrape main article content...       85%      12     pass     web-scraping
Extract image URLs from gallery      92%       8     pass     images, e-commerce
Get weather data from service...     45%       5     fail     weather, data

Showing 3 scenario(s)

Run All Scenarios

tpm scenario run

Execute all scenarios for a collection. This is ideal for CI/CD pipelines or batch testing.

Usage

# Run all scenarios for a collection
tpm scenario run my-collection

# Verbose output with detailed progress
tpm scenario run my-collection --verbose

# JSON output for CI integration
tpm scenario run my-collection --json

Example Output

Running 3 scenarios for "Web Scraping Toolkit"...

[1/3] Scrape main article content...
      ✓ PASSED (2.3s) - Successfully extracted article content

[2/3] Extract image URLs from gallery
      ✓ PASSED (1.8s) - Found and returned 12 image URLs

[3/3] Get weather data from service...
      ✗ FAILED (3.1s) - Weather service returned 403 error

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Results: 2 passed, 1 failed, 0 errors
Total time: 7.2s
Quota remaining: 47 runs/day

Exit Codes

  • 0 - All scenarios passed
  • 1 - One or more scenarios failed

Test Single Scenario

tpm scenario test

Run a single scenario by its ID. Useful for debugging or re-testing specific failures.

Usage

# Run a single scenario
tpm scenario test clu123abc456

# With verbose output
tpm scenario test clu123abc456 --verbose

# JSON output
tpm scenario test clu123abc456 --json

Example Output

Scenario: Scrape main article content from a news website
Collection: Web Scraping Toolkit

✓ Scenario PASSED

Results
  Status:    completed
  Verdict:   pass
  Reason:    The agent successfully extracted the main article heading and...

Usage
  Duration:  2,341ms
  Tokens:    1,245 (in: 892, out: 353)

Run ID: run_abc123def456
Quota remaining: 46 runs/day

View Scenario Details

tpm scenario info

View detailed information about a scenario including its run history and quality metrics.

Usage

# View scenario details
tpm scenario info clu123abc456

# Include run history
tpm scenario info clu123abc456 --runs 10

# JSON output
tpm scenario info clu123abc456 --json

Example Output

Scenario: Scrape main article content from a news website

ID:           clu123abc456
Collection:   Web Scraping Toolkit
Created:      2 weeks ago

Prompt:
  Scrape the main article content from a news website and extract
  the headline, author, publication date, and body text.

Quality Metrics:
  Score:              85%
  Total Runs:         12
  Consecutive Passes: 5
  Last Run:           2 hours ago (pass)

Tags: web-scraping, content-extraction, news

Recent Runs:
  #12  pass   2h ago    2,341ms   "Successfully extracted article content"
  #11  pass   1d ago    2,156ms   "Successfully extracted article content"
  #10  pass   2d ago    2,892ms   "Successfully extracted article content"
  #9   fail   3d ago    4,521ms   "Timeout waiting for page load"
  #8   pass   4d ago    2,234ms   "Successfully extracted article content"

Quality Scoring

How scenario quality is calculated

Quality scores help identify reliable scenarios and track improvement over time. Scores range from 0% to 100%.

Streak-Based Scoring

Scenarios earn bonus points for consecutive passes and lose points for consecutive failures:

On Pass
  • +5% base score
  • +1% per consecutive pass
  • Maximum score: 100%

On Failure
  • -10% base penalty
  • -2% per consecutive failure
  • Minimum score: 0%

Example

Five consecutive passes add 40 percentage points in total: 5% × 5 = 25% from the base bonus, plus 1% × (1+2+3+4+5) = 15% from the streak bonus. For example, a scenario that started at 45% would reach 85% after five consecutive passes. High-quality scenarios are featured on the TPMJS homepage showcase.
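
The streak rules can be sketched as a single update function. This is one reading of the rules above, not the actual TPMJS implementation: `apply_run` is a hypothetical name, and the sketch assumes that the streak bonus or penalty counts the current run (so the first pass in a streak earns +1%, and the first failure costs an extra 2%).

```python
def apply_run(score, streak, passed):
    """Return (new_score, new_streak) after one scenario run.

    streak > 0 counts consecutive passes, streak < 0 consecutive failures.
    Assumption: the streak includes the current run.
    """
    if passed:
        streak = streak + 1 if streak > 0 else 1
        score += 5 + 1 * streak        # +5% base, +1% per consecutive pass
    else:
        streak = streak - 1 if streak < 0 else -1
        score -= 10 + 2 * abs(streak)  # -10% base, -2% per consecutive failure
    return max(0, min(100, score)), streak


# Five consecutive passes from a starting score of 45%.
score, streak = 45, 0
for _ in range(5):
    score, streak = apply_run(score, streak, passed=True)
```

The clamp on the last line of `apply_run` enforces the 0% minimum and 100% maximum.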

CI/CD Integration

Automate scenario testing in your pipeline

Integrate scenario testing into your CI/CD pipeline to catch regressions early.

GitHub Actions Example

name: TPMJS Scenario Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test-scenarios:
    runs-on: ubuntu-latest
    steps:
      - name: Install TPMJS CLI
        run: npm install -g @tpmjs/cli

      - name: Configure API Key
        run: |
          mkdir -p ~/.config/tpmjs
          echo '{"apiKey":"${{ secrets.TPMJS_API_KEY }}"}' > ~/.config/tpmjs/config.json

      - name: Run Scenarios
        run: tpm scenario run my-collection --json > results.json

      - name: Check Results
        # The CLI exits with 1 on failure, which would skip this step by default.
        if: always()
        run: |
          FAILED=$(jq '.failed' results.json)
          if [ "$FAILED" -gt 0 ]; then
            echo "❌ $FAILED scenario(s) failed"
            exit 1
          fi
          echo "✅ All scenarios passed"

JSON Output Format

{
  "collection": "my-collection",
  "total": 5,
  "passed": 4,
  "failed": 1,
  "errors": 0,
  "duration": 12345,
  "results": [
    {
      "scenarioId": "clu123abc456",
      "name": "Scrape main article content",
      "status": "pass",
      "duration": 2341,
      "verdict": "pass",
      "reason": "Successfully extracted article content"
    }
  ]
}
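
For custom reporting, a script can read the JSON output directly. The sketch below assumes the field names shown above; `summarize` is a hypothetical helper, the second result entry (including its ID and the "fail" values) is made up for illustration, and treating a non-zero `errors` count as a failure is an assumption rather than documented CLI behavior.

```python
import json


def summarize(report_text):
    """Parse `tpm scenario run --json` output and list non-passing scenarios."""
    report = json.loads(report_text)
    failing = [r for r in report["results"] if r["verdict"] != "pass"]
    # Assumption: errors should fail the build just like failed scenarios.
    ok = report["failed"] == 0 and report["errors"] == 0
    return ok, failing


# Illustrative report; the second entry is invented for this example.
sample = """
{
  "collection": "my-collection",
  "total": 2, "passed": 1, "failed": 1, "errors": 0, "duration": 5441,
  "results": [
    {"scenarioId": "clu123abc456", "name": "Scrape main article content",
     "status": "pass", "duration": 2341, "verdict": "pass",
     "reason": "Successfully extracted article content"},
    {"scenarioId": "clu789xyz000", "name": "Get weather data from service",
     "status": "fail", "duration": 3100, "verdict": "fail",
     "reason": "Weather service returned 403 error"}
  ]
}
"""

ok, failing = summarize(sample)
for result in failing:
    print(f"FAILED {result['scenarioId']}: {result['reason']}")
```

In a CI script, reading results.json into `summarize` and finishing with `sys.exit(0 if ok else 1)` would turn the summary into a build gate.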

Rate Limits

Daily quota and usage tracking

Scenario execution is subject to daily rate limits to ensure fair usage:

  • Free tier: 50 scenario runs per day
  • Pro tier: 500 scenario runs per day
  • Quotas reset at midnight UTC
  • Failed runs count toward the quota

The remaining quota is shown after each scenario run. Plan your CI/CD schedules accordingly to stay within limits.
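
A little arithmetic helps when planning schedules. The sketch below assumes that each scenario in a collection run consumes one run from the daily quota, which matches the example output earlier (50 runs on the free tier, 47 remaining after a 3-scenario run) but is not stated explicitly. Both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def full_suite_runs_per_day(daily_quota, scenarios_in_collection):
    """How many times a whole collection can run before the quota is spent."""
    if scenarios_in_collection <= 0:
        raise ValueError("collection must contain at least one scenario")
    return daily_quota // scenarios_in_collection


def next_quota_reset(now):
    """Next midnight UTC after `now`, which must be a timezone-aware datetime."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


free_tier_runs = full_suite_runs_per_day(50, 3)   # 3-scenario collection on the free tier
reset_at = next_quota_reset(datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc))
```

Under that assumption, a 3-scenario collection on the free tier fits 16 full runs per day, so running on every push to a busy repository could exhaust the quota quickly.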
