Configure DocumentSplitter to split long text documents into shorter ones.
YAML Init Parameters
These are the parameters you can pass to this component in the pipeline YAML configuration:
Parameter | Type | Possible values | Description |
---|---|---|---|
split_by | Literal | word, sentence, page, passage. Default: word | The unit by which the document is split. Choose from word (splits on " "), sentence (splits on "."), page (splits on "\f"), or passage (splits on "\n\n"). Required. |
split_length | Integer | Default: 200 | The maximum number of units in each split. For example, if you set split_by: word and split_length: 20, each document will be no longer than 20 words. Required. |
split_overlap | Integer | Default: 0 | The number of units that consecutive splits share. For example, if you set split_overlap: 3 and split_by: word, each document will share three words with the previous document. Required. |
split_threshold | Integer | Default: 0 | The minimum number of units that the split should have. If the split has fewer units than the threshold, it's attached to the previous split. Required. |
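As an illustration, the parameters above might appear in a pipeline YAML roughly as follows. This is a minimal sketch, not a complete pipeline: the component name splitter is hypothetical, and the type path and the init_parameters key assume the Haystack 2.x component layout, so check them against your own pipeline before copying. The values are chosen only to show how the four parameters fit together.

```yaml
components:
  splitter:  # hypothetical component name, pick your own
    # Assumed type path (Haystack 2.x layout); verify against your pipeline
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word        # split on " "
      split_length: 200     # at most 200 words per split
      split_overlap: 20     # each split shares 20 words with the previous one
      split_threshold: 10   # a split shorter than 10 words is attached to the previous split
```

With these example values, a trailing fragment of fewer than 10 words would not become a separate document. It would be attached to the split before it.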
REST API Runtime Parameters
There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.