
dataClassificationHeuristic

@tpmjs/tools-data-classification-heuristic

Classifies data sensitivity using heuristics to detect PII (personally identifiable information), financial data, health data, and other sensitive patterns. Returns a classification level (public/internal/confidential/restricted), the detected signals, and a confidence score.

Official
compliance
v0.2.0
MIT

Installation & Usage

Install this tool and use it with the AI SDK

1. Install the package

npm install @tpmjs/tools-data-classification-heuristic
pnpm add @tpmjs/tools-data-classification-heuristic
yarn add @tpmjs/tools-data-classification-heuristic
bun add @tpmjs/tools-data-classification-heuristic
deno add npm:@tpmjs/tools-data-classification-heuristic

2. Import the tool

import { dataClassificationHeuristic } from '@tpmjs/tools-data-classification-heuristic';

3. Use with AI SDK

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { dataClassificationHeuristic } from '@tpmjs/tools-data-classification-heuristic';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { dataClassificationHeuristic },
  prompt: 'Your prompt here...',
});

console.log(result.text);

Parameters

Available configuration options

text (string, required): The text content to analyze for sensitive data patterns.

Schema extracted: 1/1/2026, 8:17:38 AM

README

Data Classification Heuristic Tool

Classifies data sensitivity using pattern-based heuristics to detect PII, financial data, health data, and other sensitive information.

Installation

npm install @tpmjs/tools-data-classification-heuristic

Usage

import { dataClassificationHeuristic } from '@tpmjs/tools-data-classification-heuristic';

const result = await dataClassificationHeuristic.execute({
  text: "Contact John Doe at john.doe@example.com or call 555-123-4567. SSN: 123-45-6789"
});

console.log(result);
// {
//   classification: 'restricted',
//   signals: [
//     { type: 'Email', severity: 'medium', description: 'Email address detected', matches: 1 },
//     { type: 'Phone', severity: 'medium', description: 'Phone number detected', matches: 1 },
//     { type: 'SSN', severity: 'critical', description: 'Social Security Number detected', matches: 1 }
//   ],
//   confidence: 0.8,
//   summary: {
//     totalSignals: 3,
//     highestSeverity: 'critical',
//     categories: ['PII']
//   }
// }
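
Each entry in `signals` pairs a pattern type with a severity and a match count. The sketch below shows how such entries could be produced with plain regular expressions. The `Rule` type, the `detectSignals` helper, and every regex are hypothetical illustrations, not the package's actual patterns; the shape only mirrors the documented output schema.

```typescript
// Hypothetical signal shape, mirroring the documented output schema.
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface Signal {
  type: string;
  pattern: string;
  severity: Severity;
  description: string;
  matches?: number;
}

interface Rule {
  type: string;
  regex: RegExp; // must carry the 'g' flag so every occurrence is counted
  severity: Severity;
  description: string;
}

// Illustrative rules only; the real package's regexes are not documented here.
const rules: Rule[] = [
  { type: 'Email', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, severity: 'medium', description: 'Email address detected' },
  { type: 'Phone', regex: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g, severity: 'medium', description: 'Phone number detected' },
  { type: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g, severity: 'critical', description: 'Social Security Number detected' },
];

function detectSignals(text: string): Signal[] {
  const signals: Signal[] = [];
  for (const rule of rules) {
    // String.match with a global regex returns every match, or null if none.
    const found = text.match(rule.regex);
    if (found) {
      signals.push({
        type: rule.type,
        pattern: rule.regex.source,
        severity: rule.severity,
        description: rule.description,
        matches: found.length,
      });
    }
  }
  return signals;
}

const sample = 'Contact John Doe at john.doe@example.com or call 555-123-4567. SSN: 123-45-6789';
console.log(detectSignals(sample).map((s) => `${s.type}:${s.matches}`));
```

Run against the README's sample text, this sketch reports Email, Phone, and SSN with one match each, in the same order as the example output.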

Classification Levels

  • public - No sensitive data detected; safe for public distribution
  • internal - Low- to medium-sensitivity data; internal use only
  • confidential - High-sensitivity data; restricted distribution
  • restricted - Critical data (SSNs, credentials, financial data); highly restricted
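
The level descriptions line up with signal severities (low/medium, high, critical). Here is a minimal sketch of that mapping, assuming the level is driven only by the single most severe signal. The real tool may also weigh the number of signals or their categories, and the `classify` helper is hypothetical.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';
type Classification = 'public' | 'internal' | 'confidential' | 'restricted';

// Numeric rank so severities can be compared.
const rank: Record<Severity, number> = { low: 1, medium: 2, high: 3, critical: 4 };

// Assumption: classification depends only on the highest severity present.
function classify(severities: Severity[]): Classification {
  if (severities.length === 0) return 'public'; // nothing detected

  // Find the most severe signal.
  let highest: Severity = severities[0];
  for (const s of severities) {
    if (rank[s] > rank[highest]) highest = s;
  }

  if (highest === 'critical') return 'restricted';
  if (highest === 'high') return 'confidential';
  return 'internal'; // low or medium
}
```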

Detected Patterns

PII (Personally Identifiable Information)

  • Social Security Numbers (SSN)
  • Email addresses
  • Phone numbers
  • Dates of birth
  • Physical addresses

Financial Data

  • Credit card numbers
  • Bank account numbers
  • Routing numbers
  • Salary information

Health Data (HIPAA)

  • Medical record numbers
  • Diagnoses
  • Prescriptions

Government IDs

  • Passport numbers
  • Driver license numbers

Authentication

  • API keys
  • Passwords
  • Access tokens
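
Credentials often appear as `key=value` or `key: value` assignments in configs and logs. The sketch below flags such assignments when the key name suggests a secret. It returns only the key names so the secret itself is never echoed back. The key list, regex, and `findCredentialKeys` helper are hypothetical.

```typescript
// Hypothetical heuristic: a credential-like key, then ':' or '=', then a value of 4+ characters.
const credentialPattern = /\b(password|passwd|pwd|api[_-]?key|secret|access[_-]?token)\s*[:=]\s*["']?[^\s"']{4,}/gi;

function findCredentialKeys(text: string): string[] {
  const keys: string[] = [];
  let m: RegExpExecArray | null;
  credentialPattern.lastIndex = 0; // reset state on the shared global regex
  while ((m = credentialPattern.exec(text)) !== null) {
    keys.push(m[1].toLowerCase()); // report the key name, never the value
  }
  return keys;
}
```

Requiring the `:` or `=` separator keeps ordinary prose such as "the password policy requires 12 characters" from being flagged.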

Technical

  • IP addresses
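
A bare dotted-quad regex also matches strings like `999.1.1.1`. In the sketch below, the regex finds candidates and a separate check keeps only those whose four octets fall in 0 to 255. The `findIPv4` helper is hypothetical, and IPv6 is out of scope here.

```typescript
// Dotted-quad candidates; octet ranges are checked separately below.
const ipv4Candidate = /\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/g;

function findIPv4(text: string): string[] {
  const found: string[] = [];
  let m: RegExpExecArray | null;
  ipv4Candidate.lastIndex = 0; // reset state on the shared global regex
  while ((m = ipv4Candidate.exec(text)) !== null) {
    const octets = [m[1], m[2], m[3], m[4]].map(Number);
    // Keep only addresses whose four octets are all in 0..255.
    if (octets.every((o) => o <= 255)) found.push(m[0]);
  }
  return found;
}
```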

Output Schema

interface DataClassification {
  classification: 'public' | 'internal' | 'confidential' | 'restricted';
  signals: Array<{
    type: string;
    pattern: string;
    severity: 'low' | 'medium' | 'high' | 'critical';
    description: string;
    matches?: number;
  }>;
  confidence: number; // 0-1 scale
  summary: {
    totalSignals: number;
    highestSeverity: string;
    categories: string[];
  };
}
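
The `summary` object can be derived entirely from `signals`. The sketch below shows one way to compute it, assuming a type-to-category lookup table. The README only shows the `PII` category, so the other category names, the `'none'` placeholder, and the `summarize` helper are hypothetical.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface Signal {
  type: string;
  severity: Severity;
}

// Hypothetical type-to-category table; only 'PII' is confirmed by the README example.
const categoryOf: Record<string, string> = {
  SSN: 'PII',
  Email: 'PII',
  Phone: 'PII',
  CreditCard: 'Financial',
  APIKey: 'Authentication',
  IPAddress: 'Technical',
};

const order: Severity[] = ['low', 'medium', 'high', 'critical'];

function summarize(signals: Signal[]) {
  let highestSeverity = 'none'; // assumption: placeholder when nothing matched
  let best = -1;
  const categories: string[] = [];
  for (const s of signals) {
    // Track the most severe signal seen so far.
    const r = order.indexOf(s.severity);
    if (r > best) {
      best = r;
      highestSeverity = s.severity;
    }
    // Record each category once, in first-seen order.
    const cat = categoryOf[s.type] ?? 'Other';
    if (categories.indexOf(cat) === -1) categories.push(cat);
  }
  return { totalSignals: signals.length, highestSeverity, categories };
}
```

Feeding in the three signals from the usage example yields `totalSignals: 3`, `highestSeverity: 'critical'`, and `categories: ['PII']`, matching the documented output.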

Use Cases

  • Data Loss Prevention (DLP) - Scan documents before sharing
  • Compliance Auditing - Identify sensitive data in databases
  • Email Filtering - Classify email content sensitivity
  • Document Review - Automatically classify documents for access control
  • Privacy Impact Assessment - Detect PII in data processing activities
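
For the DLP use case, a caller would typically compare the returned `classification` against what a given audience may receive. The sketch below assumes a simple two-audience policy. The `Audience` type, the `maxAllowed` table, and `canShare` are all hypothetical and not part of the package.

```typescript
type Classification = 'public' | 'internal' | 'confidential' | 'restricted';
type Audience = 'public' | 'internal'; // hypothetical audiences

// Order levels so they can be compared numerically.
const levelRank: Record<Classification, number> = {
  public: 0,
  internal: 1,
  confidential: 2,
  restricted: 3,
};

// Highest level each audience may receive (assumed policy).
const maxAllowed: Record<Audience, Classification> = {
  public: 'public',
  internal: 'internal',
};

function canShare(level: Classification, audience: Audience): boolean {
  return levelRank[level] <= levelRank[maxAllowed[audience]];
}
```

Keeping the policy in a table rather than in branching logic makes it easy to add an audience without touching `canShare`.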

Limitations

  • Heuristic-based detection (pattern matching only)
  • May produce false positives (e.g., random number sequences)
  • Does not understand context or semantic meaning
  • Should be used as a first-pass filter, not as a definitive classification
  • Cannot detect all types of sensitive data (e.g., trade secrets require domain knowledge)
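
False positives arise because a shape like `###-##-####` also fits part numbers and invoice IDs. One way to narrow SSN matches uses Social Security Administration numbering rules, which never issue area 000, 666, or 900 to 999, group 00, or serial 0000. The package is not documented as applying this check, and `isPlausibleSSN` is a hypothetical helper. A structurally valid number can still be something other than an SSN.

```typescript
// Structural SSN check based on SSA numbering rules.
function isPlausibleSSN(candidate: string): boolean {
  const m = /^(\d{3})-(\d{2})-(\d{4})$/.exec(candidate);
  if (!m) return false;
  const area = Number(m[1]);
  const group = Number(m[2]);
  const serial = Number(m[3]);
  // Area 000, 666 and 900-999 are never issued.
  if (area === 0 || area === 666 || area >= 900) return false;
  // Group 00 and serial 0000 are never issued.
  if (group === 0 || serial === 0) return false;
  return true;
}

// A bare pattern match flags all three; the structural check keeps only the first.
const candidates = ['123-45-6789', '000-12-3456', '987-65-4321'];
console.log(candidates.filter(isPlausibleSSN));
```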

License

MIT

Statistics

Downloads/month

38

Quality Score

78%

NPM Keywords

tpmjs
compliance
ai
privacy
pii
data-classification

Maintainers

thomasdavis (thomasalwyndavis@gmail.com)

Frameworks

vercel-ai