textChunkTool

@tpmjs/tools-text-chunk

Split text into chunks by size with sentence boundary awareness. Useful for processing large documents, preparing text for embeddings, or breaking content into manageable pieces. Uses intelligent sentence detection to avoid breaking mid-sentence when possible.

Official
data
v0.2.0
MIT

Installation & Usage

Install this tool and use it with the AI SDK

1. Install the package (run one of the following, depending on your package manager)

npm install @tpmjs/tools-text-chunk
pnpm add @tpmjs/tools-text-chunk
yarn add @tpmjs/tools-text-chunk
bun add @tpmjs/tools-text-chunk
deno add npm:@tpmjs/tools-text-chunk

2. Import the tool

import { textChunkTool } from '@tpmjs/tools-text-chunk';

3. Use with AI SDK

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { textChunkTool } from '@tpmjs/tools-text-chunk';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: { textChunkTool },
  prompt: 'Your prompt here...',
});

console.log(result.text);

Parameters

Available configuration options

text
Required
Type: string

The text to split into chunks

maxChunkSize
Required
Type: number

Maximum size of each chunk in characters (must be > 0)

overlap
Optional
Type: number

Number of characters to overlap between chunks (default: 0, must be < maxChunkSize)
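
The constraints listed above can be checked before calling the tool. The following is a minimal sketch, not the package's own validation code: `ChunkOptions` and `validateChunkOptions` are hypothetical names, and rejecting a negative overlap is an assumption that the parameter descriptions do not state.

```typescript
// Hypothetical helper that mirrors the documented parameter constraints.
// It is NOT part of @tpmjs/tools-text-chunk.
interface ChunkOptions {
  text: string;
  maxChunkSize: number;
  overlap?: number;
}

function validateChunkOptions(opts: ChunkOptions): string[] {
  const errors: string[] = [];
  const overlap = opts.overlap ?? 0; // documented default: 0

  if (typeof opts.text !== 'string') {
    errors.push('text must be a string');
  }
  if (!(opts.maxChunkSize > 0)) {
    errors.push('maxChunkSize must be > 0');
  }
  if (overlap < 0) {
    // Assumption: the docs only say "must be < maxChunkSize".
    errors.push('overlap must not be negative');
  }
  if (!(overlap < opts.maxChunkSize)) {
    errors.push('overlap must be < maxChunkSize');
  }
  return errors;
}
```

An empty array means the options satisfy the documented constraints; otherwise each entry names the rule that was violated.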

Schema extracted: 1/1/2026, 8:18:37 AM

README

@tpmjs/tools-text-chunk

Split text into chunks of a bounded size, preferring sentence boundaries, with optional overlap between chunks.

Installation

npm install @tpmjs/tools-text-chunk

Usage

import { textChunkTool } from '@tpmjs/tools-text-chunk';

// Call the tool directly, without going through a model
const result = await textChunkTool.execute({
  text: 'Your long text here...',
  maxChunkSize: 500,
  overlap: 50,
});

console.log(result.chunks);
// Illustrative output; exact indices depend on where sentence boundaries fall:
// [
//   { text: '...', startIndex: 0, endIndex: 500, chunkIndex: 0 },
//   { text: '...', startIndex: 450, endIndex: 950, chunkIndex: 1 },
//   ...
// ]

Features

  • Sentence-aware chunking: Uses the sbd library for intelligent sentence detection
  • Configurable overlap: Control how much text overlaps between chunks
  • Boundary preservation: Tries to avoid breaking mid-sentence when possible
  • Detailed metadata: Returns chunk indices, positions, and statistics
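
To make the interaction between sentence awareness and overlap concrete, here is a simplified sketch of the general technique. It is not the package's implementation: it swaps the sbd library for a naive regular expression (`.`, `!`, or `?` followed by whitespace), and `chunkText` and `Chunk` are hypothetical names.

```typescript
// Simplified illustration of sentence-aware chunking with overlap.
// Assumption: a naive regex stands in for real sentence detection (sbd).
interface Chunk {
  text: string;
  startIndex: number;
  endIndex: number;
  chunkIndex: number;
}

function chunkText(text: string, maxChunkSize: number, overlap = 0): Chunk[] {
  if (!(maxChunkSize > 0) || overlap < 0 || overlap >= maxChunkSize) {
    throw new RangeError('require maxChunkSize > 0 and 0 <= overlap < maxChunkSize');
  }

  const chunks: Chunk[] = [];
  let start = 0;

  while (start < text.length) {
    const hardEnd = Math.min(start + maxChunkSize, text.length);
    let end = hardEnd;

    if (hardEnd < text.length) {
      // Find the last sentence terminator followed by whitespace in the window.
      const candidate = text.slice(start, hardEnd);
      const boundary = /[.!?]\s/g;
      let lastBoundary = -1;
      let match: RegExpExecArray | null;
      while ((match = boundary.exec(candidate)) !== null) {
        lastBoundary = match.index + 1; // position just after the punctuation
      }
      // Use the boundary only if the next chunk would still move forward.
      if (lastBoundary > overlap) {
        end = start + lastBoundary;
      }
    }

    chunks.push({
      text: text.slice(start, end),
      startIndex: start,
      endIndex: end,
      chunkIndex: chunks.length,
    });

    if (end >= text.length) break;
    start = end - overlap; // step back so consecutive chunks share `overlap` characters
  }

  return chunks;
}
```

When no sentence boundary is found inside a window, the sketch falls back to a hard cut at `maxChunkSize`, so each new chunk starts `maxChunkSize - overlap` characters after the previous one.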

Parameters

  • text (string, required): The text to split into chunks
  • maxChunkSize (number, required): Maximum size of each chunk in characters (must be > 0)
  • overlap (number, optional): Number of characters to overlap between chunks (default: 0, must be < maxChunkSize)

Returns

{
  chunks: Array<{
    text: string;
    startIndex: number;
    endIndex: number;
    chunkIndex: number;
  }>;
  chunkCount: number;
  totalLength: number;
  metadata: {
    averageChunkSize: number;
    maxChunkSize: number;
    overlap: number;
  };
}
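
As a sketch of how this shape might be consumed, the helpers below recompute two figures from the `chunks` array alone. The interface names and the helpers `observedOverlaps` and `meanChunkLength` are hypothetical, and how the package itself derives `averageChunkSize` or `totalLength` is not documented here, so treat the results as independent checks rather than as a restatement of the tool's metadata.

```typescript
// Types transcribed from the documented return shape; names are assumptions.
interface TextChunk {
  text: string;
  startIndex: number;
  endIndex: number;
  chunkIndex: number;
}

interface TextChunkResult {
  chunks: TextChunk[];
  chunkCount: number;
  totalLength: number;
  metadata: {
    averageChunkSize: number;
    maxChunkSize: number;
    overlap: number;
  };
}

// Characters shared between each chunk and the one after it, by index.
function observedOverlaps(result: TextChunkResult): number[] {
  return result.chunks
    .slice(1)
    .map((chunk, i) => Math.max(0, result.chunks[i].endIndex - chunk.startIndex));
}

// Mean length of the chunk texts; 0 for an empty result.
function meanChunkLength(result: TextChunkResult): number {
  if (result.chunks.length === 0) return 0;
  const total = result.chunks.reduce((sum, chunk) => sum + chunk.text.length, 0);
  return total / result.chunks.length;
}
```

Because chunks may end early at a sentence boundary, the observed overlap and mean length can differ from the requested `overlap` and `maxChunkSize`.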

Use Cases

  • Preparing text for embeddings (RAG systems)
  • Processing large documents in smaller pieces
  • Creating sliding window text analysis
  • Breaking content for API rate limits
  • Generating text previews
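
For the embedding and rate-limit use cases, chunk texts are often sent downstream in fixed-size groups. A minimal sketch of that grouping step follows; `toBatches` is a hypothetical helper, not part of this package, and the batch size is whatever limit the downstream API imposes.

```typescript
// Hypothetical helper: group items (e.g. chunk texts) into batches of at most `batchSize`.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch can then be passed to a single downstream request, with any delay or retry logic applied between batches.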

Example with AI Agent

import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
import { textChunkTool } from '@tpmjs/tools-text-chunk';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: {
    textChunk: textChunkTool,
  },
  prompt: 'Split this document into 500-character chunks with 50 character overlap: ...',
});

License

MIT

Statistics

Downloads/month

0

Quality Score

0%

NPM Keywords

tpmjs
data
text
chunking
nlp

Maintainers

thomasdavis (thomasalwyndavis@gmail.com)

Frameworks

vercel-ai