
Class: SentenceSplitter

SentenceSplitter is our default text splitter. It can split text into sentences, paragraphs, or fixed-length chunks with overlap.

One advantage of SentenceSplitter is that, even when producing fixed-length chunks, it tries to keep sentences together.
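Keeping sentences together inside fixed-length chunks can be sketched as a greedy packer. This is a minimal illustration, not the library's implementation. It assumes the text has already been split into sentences, measures size with a hypothetical `countTokens` helper that counts whitespace-separated words (the real splitter uses its tokenizer), and carries trailing whole sentences into the next chunk as overlap.

```typescript
// Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
function countTokens(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Greedily pack whole sentences into chunks of at most `chunkSize` tokens,
// carrying trailing sentences (up to `chunkOverlap` tokens) into the next chunk.
function packSentences(
  sentences: string[],
  chunkSize: number,
  chunkOverlap: number,
): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const sentence of sentences) {
    const tokens = countTokens(sentence);
    if (current.length > 0 && currentTokens + tokens > chunkSize) {
      chunks.push(current.join(" "));

      // Collect whole trailing sentences that fit in the overlap budget.
      const overlap: string[] = [];
      let overlapTokens = 0;
      for (let i = current.length - 1; i >= 0; i--) {
        const t = countTokens(current[i]);
        if (overlapTokens + t > chunkOverlap) break;
        overlap.unshift(current[i]);
        overlapTokens += t;
      }

      // Drop overlap sentences if they would leave no room for the new one.
      while (overlap.length > 0 && overlapTokens + tokens > chunkSize) {
        overlapTokens -= countTokens(overlap.shift()!);
      }

      current = overlap;
      currentTokens = overlapTokens;
    }
    current.push(sentence);
    currentTokens += tokens;
  }

  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

For example, `packSentences(["One two three.", "Four five.", "Six seven eight.", "Nine."], 5, 2)` yields `["One two three. Four five.", "Four five. Six seven eight.", "Nine."]`: the two-word sentence "Four five." fits the overlap budget and is repeated, while no sentence is ever cut. In this sketch, a single sentence longer than `chunkSize` ends up oversized in a chunk of its own rather than being split.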

Constructors

constructor

new SentenceSplitter(options?)

Parameters

Name                            Type
options?                        Object
options.chunkOverlap?           number
options.chunkSize?              number
options.chunkingTokenizerFn?    (text: string) => null | RegExpMatchArray
options.paragraphSeparator?     string
options.splitLongSentences?     boolean
options.tokenizer?              any
options.tokenizerDecoder?       any

Defined in

TextSplitter.ts:67

Properties

chunkOverlap

Private chunkOverlap: number

Defined in

TextSplitter.ts:60


chunkSize

Private chunkSize: number

Defined in

TextSplitter.ts:59


chunkingTokenizerFn

Private chunkingTokenizerFn: (text: string) => null | RegExpMatchArray

Type declaration

▸ (text): null | RegExpMatchArray

Parameters
Name    Type
text    string
Returns

null | RegExpMatchArray

Defined in

TextSplitter.ts:64
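A function matching the `chunkingTokenizerFn` type takes the text and returns its regex matches, or `null` when nothing matches, which is exactly what `String.prototype.match` does with a global regex. The sentence-matching regex below is a hypothetical, English-centric example; it is not the splitter's default tokenizer.

```typescript
// Same shape as chunkingTokenizerFn: (text: string) => null | RegExpMatchArray.
// Each match is a run of non-terminator characters, the terminator(s),
// optional closing quotes/brackets and trailing whitespace; a final fragment
// without a terminator is matched by the second alternative.
const simpleSentenceTokenizer = (text: string): null | RegExpMatchArray =>
  text.match(/[^.!?]+[.!?]+["')\]]*\s*|[^.!?]+$/g);

// "Hello world. How are you? Fine" -> ["Hello world. ", "How are you? ", "Fine"]
// ""                                -> null (no matches)
```

Because the constructor accepts `options.chunkingTokenizerFn` with this signature, a function like this could be supplied there. A regex this simple will also break on abbreviations such as "Dr." or "e.g.", and how the splitter handles a `null` result is not documented on this page.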


paragraphSeparator

Private paragraphSeparator: string

Defined in

TextSplitter.ts:63


splitLongSentences

Private splitLongSentences: boolean

Defined in

TextSplitter.ts:65


tokenizer

Private tokenizer: any

Defined in

TextSplitter.ts:61


tokenizerDecoder

Private tokenizerDecoder: any

Defined in

TextSplitter.ts:62

Methods

combineTextSplits

combineTextSplits(newSentenceSplits, effectiveChunkSize): TextSplit[]

Parameters

Name                  Type
newSentenceSplits     SplitRep[]
effectiveChunkSize    number

Returns

TextSplit[]

Defined in

TextSplitter.ts:205


getEffectiveChunkSize

Private getEffectiveChunkSize(extraInfoStr?): number

Parameters

Name             Type
extraInfoStr?    string

Returns

number

Defined in

TextSplitter.ts:104
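The reference gives only the signature of `getEffectiveChunkSize`. A plausible reading, offered here as an assumption, is that the tokens taken up by `extraInfoStr` (metadata attached to each chunk) are subtracted from `chunkSize`, so that text plus metadata still fits in one chunk. The sketch implements that reading with a hypothetical whitespace-based `countTokens` helper; the non-positive check and its error message are also assumptions.

```typescript
// Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
function countTokens(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Assumed behavior: reserve room for the metadata string attached to every
// chunk, and fail if nothing is left for the text itself.
function effectiveChunkSize(chunkSize: number, extraInfoStr?: string): number {
  if (!extraInfoStr) return chunkSize;
  const remaining = chunkSize - countTokens(extraInfoStr);
  if (remaining <= 0) {
    throw new Error("Metadata leaves no room for text in each chunk");
  }
  return remaining;
}

// effectiveChunkSize(100)                         -> 100
// effectiveChunkSize(100, "title: Annual report") -> 97 (3 words reserved)
```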


getParagraphSplits

getParagraphSplits(text, effectiveChunkSize?): string[]

Parameters

Name                   Type
text                   string
effectiveChunkSize?    number

Returns

string[]

Defined in

TextSplitter.ts:121
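Only the signature of `getParagraphSplits` is documented. The sketch below assumes the text is split on `paragraphSeparator` and that, when `effectiveChunkSize` is supplied, neighboring paragraphs are merged while the merged text stays within that size. Measuring size in characters, dropping blank paragraphs, and joining with a single newline are all illustrative assumptions, and `paragraphSplits` is a hypothetical name.

```typescript
// Split on a separator, then (optionally) merge neighbors so that each
// merged piece stays within `maxSize` characters.
function paragraphSplits(
  text: string,
  separator: string,
  maxSize?: number,
): string[] {
  // Blank paragraphs are dropped (an assumption of this sketch).
  const paragraphs = text.split(separator).filter((p) => p.trim().length > 0);
  if (maxSize === undefined) return paragraphs;

  const merged: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current ? `${current}\n${p}` : p;
    if (current && candidate.length > maxSize) {
      merged.push(current); // adding p would overflow; close this piece
      current = p;
    } else {
      current = candidate;
    }
  }
  if (current) merged.push(current);
  return merged;
}
```

With the text `"Intro.\n\n\nShort.\n\n\nA much longer closing paragraph."`, separator `"\n\n\n"` and a limit of 20, the two short paragraphs are merged into `"Intro.\nShort."` and the long one stays on its own.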


getSentenceSplits

getSentenceSplits(text, effectiveChunkSize?): string[]

Parameters

Name                   Type
text                   string
effectiveChunkSize?    number

Returns

string[]

Defined in

TextSplitter.ts:147


processSentenceSplits

Private processSentenceSplits(sentenceSplits, effectiveChunkSize): SplitRep[]

Splits sentences into chunks if necessary.

This behavior is not ideal: it can split a word in half or, in non-English text, split a Unicode code point in the middle. For that reason, splitting long sentences is turned off by default. If you need it, set the splitLongSentences option to true.
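The code-point problem can be reproduced with plain JavaScript strings, as an analogy: the splitter's own cut points come from its tokenizer, which this sketch does not model. Cutting a string every N UTF-16 code units can land between the two halves of a surrogate pair, while cutting on code points (via `Array.from`, which iterates a string by code point) keeps each character whole. All function names here are hypothetical.

```typescript
// Naive split: fixed-size pieces of UTF-16 code units.
function splitByCodeUnits(text: string, size: number): string[] {
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    pieces.push(text.slice(i, i + size));
  }
  return pieces;
}

// Safer split: fixed-size pieces of code points, so surrogate pairs stay intact.
// Multi-code-point graphemes (flags, emoji with modifiers) can still be split.
function splitByCodePoints(text: string, size: number): string[] {
  const codePoints = Array.from(text);
  const pieces: string[] = [];
  for (let i = 0; i < codePoints.length; i += size) {
    pieces.push(codePoints.slice(i, i + size).join(""));
  }
  return pieces;
}

// True if the string contains a high or low surrogate without its partner.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = s.charCodeAt(i + 1); // NaN past the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) return true;
      i++; // skip the matching low surrogate
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return true;
    }
  }
  return false;
}
```

For `"ab\uD83D\uDE00cd"` (the emoji U+1F600 between "ab" and "cd"), a code-unit split with size 3 produces `"ab\uD83D"` and `"\uDE00cd"`, both containing a lone surrogate, while the code-point split produces `"ab\uD83D\uDE00"` and `"cd"`.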

Parameters

Name                  Type
sentenceSplits        string[]
effectiveChunkSize    number

Returns

SplitRep[]

Defined in

TextSplitter.ts:176


splitText

splitText(text, extraInfoStr?): string[]

Parameters

Name             Type
text             string
extraInfoStr?    string

Returns

string[]

Defined in

TextSplitter.ts:297


splitTextWithOverlaps

splitTextWithOverlaps(text, extraInfoStr?): TextSplit[]

Parameters

Name             Type
text             string
extraInfoStr?    string

Returns

TextSplit[]

Defined in

TextSplitter.ts:269