DocumentSplitter Parameters

Use these parameters to control how DocumentSplitter divides long text documents into shorter ones.

YAML Init Parameters

These are the parameters you can pass to this component in the pipeline YAML configuration:

| Parameter | Type | Possible values | Description |
|---|---|---|---|
| split_by | Literal | word, sentence, page, passage. Default: word | The unit by which the document should be split. Choose from word (splitting by " "), sentence (splitting by "."), page (splitting by "\f"), or passage (splitting by "\n\n"). Required. |
| split_length | Integer | Default: 200 | The maximum number of units in each split. For example, if you set split_by: word and split_length: 20, each document will be no longer than 20 words. Required. |
| split_overlap | Integer | Default: 0 | The number of units that each split should overlap. For example, if you set split_overlap: 3 and split_by: word, each document will share three words with the previous document. Required. |
| split_threshold | Integer | Default: 0 | The minimum number of units that the split should have. If the split has fewer units than the threshold, it's attached to the previous split. Required. |
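
To make the interaction between split_length, split_overlap, and split_threshold concrete, here is a minimal Python sketch of word-based splitting. It is an illustration written for this page, not the component's actual implementation: the function name split_words is hypothetical, and it assumes that consecutive splits start split_length minus split_overlap words apart and that a too-short final split is merged into the previous one without repeating the overlapping words.

```python
def split_words(text, split_length=200, split_overlap=0, split_threshold=0):
    """Illustrative word splitter (hypothetical helper, not the real component)."""
    if split_length <= 0:
        raise ValueError("split_length must be greater than 0")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and smaller than split_length")

    words = text.split(" ")  # split_by: word splits on single spaces
    step = split_length - split_overlap  # assumed distance between split starts
    chunks = []

    for start in range(0, len(words), step):
        chunk = words[start:start + split_length]
        if chunks and len(chunk) < split_threshold:
            # Too short: attach only the words the previous split lacks.
            chunks[-1].extend(chunk[split_overlap:])
        else:
            chunks.append(chunk)
        if start + split_length >= len(words):
            break  # this window already reached the end of the text

    return [" ".join(chunk) for chunk in chunks]


text = "a b c d e f g h i j"

# Plain splitting: the last split holds the two leftover words.
print(split_words(text, split_length=4))

# With a threshold of 3, the two leftover words join the previous split.
print(split_words(text, split_length=4, split_threshold=3))

# With an overlap of 1, each split repeats the last word of the previous one.
print(split_words(text, split_length=4, split_overlap=1))
```

In this sketch, a larger split_overlap produces more splits for the same text, while split_threshold only ever affects the final split.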

REST API Runtime Parameters

There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.