DocumentLengthRouter
Categorize documents based on the length of their content field and route them to the appropriate output.
Key Features
- Routes documents to short_documents or long_documents outputs based on content length.
- Configurable character threshold to define what counts as "short".
- Routes documents with None content to short_documents automatically.
- Set the threshold to a negative number to route only None-content documents to short_documents.
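The routing rule described above can be sketched in plain Python. This is a minimal illustration, not the component's actual source: the Doc class and the route_by_length function are hypothetical stand-ins, and only the comparison logic (None content or length less than or equal to the threshold counts as short) comes from this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document; only `content` matters here."""
    content: Optional[str]


def route_by_length(documents: List[Doc], threshold: int = 10) -> Dict[str, List[Doc]]:
    """Illustrative restatement of the routing rule, not the library's code."""
    short_documents: List[Doc] = []
    long_documents: List[Doc] = []
    for doc in documents:
        # None content is always short; otherwise compare the character count.
        if doc.content is None or len(doc.content) <= threshold:
            short_documents.append(doc)
        else:
            long_documents.append(doc)
    return {"short_documents": short_documents, "long_documents": long_documents}


# None, an empty string, exactly 10 characters, and 11 characters.
docs = [Doc(None), Doc(""), Doc("0123456789"), Doc("01234567890")]

default_split = route_by_length(docs)                 # default threshold of 10
negative_split = route_by_length(docs, threshold=-1)  # only None content is short
```

With the default threshold, only the 11-character document lands in long_documents. With a negative threshold, even the empty string is routed to long_documents, because its length of 0 is greater than the threshold.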
Configuration
- Drag the DocumentLengthRouter component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Set the Threshold to define the maximum number of characters for a document to be considered short. Documents with content length less than or equal to this value (or None content) go to short_documents. All others go to long_documents. The default is 10.
Connections
DocumentLengthRouter receives a List[Document] from upstream components such as DocumentSplitter. It outputs two streams:
- short_documents: Documents where content is None or whose content length is less than or equal to the threshold. Connect this to components that handle short or empty content, such as LLMDocumentContentExtractor or SentenceTransformersDocumentImageEmbedder.
- long_documents: Documents with content longer than the threshold. Connect this to standard processing components.
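As a rough sketch of how the two streams fan out, the snippet below maps each output name to its own handler. The handlers handle_short and handle_long are hypothetical placeholders for downstream components, and router_output is a hand-written dictionary that only mimics the shape of the router's output, using plain strings in place of Document objects.

```python
from typing import Callable, Dict, List, Optional

Contents = List[Optional[str]]


def handle_short(items: Contents) -> List[str]:
    # Placeholder for a component that recovers content from short or empty documents.
    return [f"needs-extraction:{item!r}" for item in items]


def handle_long(items: Contents) -> List[str]:
    # Placeholder for a standard processing component.
    return [f"standard:{len(item or '')} chars" for item in items]


# Hand-written example shaped like the router's two output streams.
router_output: Dict[str, Contents] = {
    "short_documents": [None, "Hi"],
    "long_documents": ["A considerably longer passage of text."],
}

# One handler per output stream, keyed by the output name.
handlers: Dict[str, Callable[[Contents], List[str]]] = {
    "short_documents": handle_short,
    "long_documents": handle_long,
}

processed = {name: handlers[name](items) for name, items in router_output.items()}
```

In Pipeline Builder, this mapping corresponds to drawing one connection from each output to the component that should receive it.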
Source Code
To check this component's source code, open document_length_router.py in the Haystack repository.
Usage Examples
Basic Configuration
DocumentLengthRouter:
  type: haystack.components.routers.document_length_router.DocumentLengthRouter
  init_parameters:
    threshold: 10
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | A list of documents to categorize based on their content length. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| short_documents | List[Document] | Documents where content is None or whose character count is less than or equal to the threshold. |
| long_documents | List[Document] | Documents where the character count of content is greater than the threshold. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| threshold | int | 10 | The maximum number of characters for a document to be considered short. Documents where content is None or whose character count is less than or equal to this value are routed to short_documents. All others go to long_documents. Set to a negative number to route only None-content documents to short_documents. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to categorize based on their content length. |