DocumentLengthRouter
Use DocumentLengthRouter to categorize documents based on the length of their content field and route them to different pipeline branches for separate processing.
Key Features
- Routes documents to short_documents or long_documents outputs based on character count
- Treats documents with None content as short documents
- Configurable character threshold for the short/long boundary
- Useful for handling PDFs with scanned pages or image-only content alongside text documents
Configuration
- Drag the DocumentLengthRouter component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- Configure the parameters as needed.
Connections
DocumentLengthRouter accepts a list of documents — typically from a DocumentSplitter or a converter. It sends documents whose content is None or whose character count is at or below the threshold to short_documents, and the rest to long_documents. Connect short_documents to components like LLMDocumentContentExtractor or image embedders, and long_documents to standard text processing components.
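The routing rule described above fits in a few lines of plain Python. This is a minimal sketch for illustration only, not the component's actual implementation: Doc and route_by_length are hypothetical names, and Doc stands in for a Haystack Document with only a content field.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; only the content field matters here."""
    content: Optional[str]


def route_by_length(documents: list[Doc], threshold: int = 10) -> dict[str, list[Doc]]:
    """Hypothetical helper that splits documents into short and long lists."""
    short_documents: list[Doc] = []
    long_documents: list[Doc] = []
    for doc in documents:
        # None content always counts as short; otherwise compare the character count.
        if doc.content is None or len(doc.content) <= threshold:
            short_documents.append(doc)
        else:
            long_documents.append(doc)
    return {"short_documents": short_documents, "long_documents": long_documents}


docs = [Doc(None), Doc(""), Doc("Page 3"), Doc("A full paragraph of extracted text.")]
result = route_by_length(docs)
print([d.content for d in result["short_documents"]])  # → [None, '', 'Page 3']
print([d.content for d in result["long_documents"]])   # → ['A full paragraph of extracted text.']
```

In a pipeline, the short list would typically feed an LLMDocumentContentExtractor or an image embedder, and the long list would continue to standard text processing.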
Usage Example
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to categorize based on their content length. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| short_documents | List[Document] | | Documents where content is None or whose character count is less than or equal to the threshold. |
| long_documents | List[Document] | | Documents where the character count of content is greater than the threshold. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| threshold | int | 10 | The character count threshold for the document content field. Documents where content is None or whose character count is less than or equal to this value are routed to short_documents. All others go to long_documents. To route only documents with None content to short_documents, set the threshold to a negative number. |
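A negative threshold isolates documents with None content because the comparison is inclusive and a character count is never negative: with a threshold of -1, even an empty string counts as long. The predicate below is a minimal sketch of that rule; is_short is a hypothetical helper, not part of the component's API.

```python
from typing import Optional


def is_short(content: Optional[str], threshold: int) -> bool:
    """Hypothetical helper mirroring the documented threshold rule."""
    # None content is always short, regardless of the threshold.
    if content is None:
        return True
    # Inclusive comparison: a count equal to the threshold is still short.
    return len(content) <= threshold


# Boundary cases with the default threshold of 10
print(is_short("0123456789", 10))   # → True (exactly 10 characters)
print(is_short("0123456789A", 10))  # → False (11 characters)

# Negative threshold: len() is never negative, so only None content is short
print(is_short("", -1))    # → False
print(is_short(None, -1))  # → True
```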
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to categorize based on their content length. |