DocumentLengthRouter

Categorize documents based on the length of their content field and route them to the appropriate output.

Key Features

  • Routes documents to short_documents or long_documents outputs based on content length.
  • Configurable character threshold to define what counts as "short".
  • Routes documents with None content to short_documents automatically.
  • Set the threshold to a negative number to route only None-content documents to short_documents.
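The routing rule described above can be sketched in a few lines of plain Python. This is an illustrative approximation, not the component's actual source: `Doc` is a hypothetical stand-in for a Haystack Document, and `route_by_length` is a made-up helper name. It only shows how the threshold, None content, and a negative threshold interact.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document; only `content` matters here."""
    content: Optional[str] = None


def route_by_length(documents: List[Doc], threshold: int = 10) -> Dict[str, List[Doc]]:
    """Split documents into short and long lists using the rule described above."""
    short_docs: List[Doc] = []
    long_docs: List[Doc] = []
    for doc in documents:
        # None content is always short; otherwise compare the character count.
        if doc.content is None or len(doc.content) <= threshold:
            short_docs.append(doc)
        else:
            long_docs.append(doc)
    return {"short_documents": short_docs, "long_documents": long_docs}


docs = [Doc(None), Doc(""), Doc("0123456789"), Doc("a longer piece of text")]

# Default threshold of 10: None, the empty string, and the 10-character string are short.
default_result = route_by_length(docs)

# Negative threshold: no string length can be <= -1, so only None content is short.
negative_result = route_by_length(docs, threshold=-1)
```

Note that an empty string has length 0, so it counts as short for any non-negative threshold but is routed to long_documents when the threshold is negative.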

Configuration

  1. Drag the DocumentLengthRouter component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Set the Threshold to define the maximum number of characters for a document to be considered short. Documents with content length less than or equal to this value (or None content) go to short_documents. All others go to long_documents. The default is 10.

Connections

DocumentLengthRouter receives a List[Document] from upstream components such as DocumentSplitter. It outputs two streams:

  • short_documents: Documents where content is None or whose content length is less than or equal to the threshold. Connect this to components that handle short or empty content, such as LLMDocumentContentExtractor or SentenceTransformersDocumentImageEmbedder.
  • long_documents: Documents with content longer than the threshold. Connect this to standard processing components.

Source Code

To check this component's source code, open document_length_router.py in the Haystack repository.

Usage Examples

Basic Configuration

  DocumentLengthRouter:
    type: haystack.components.routers.document_length_router.DocumentLengthRouter
    init_parameters:
      threshold: 10

Parameters

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | A list of documents to categorize based on their content length. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| short_documents | List[Document] | Documents where content is None or whose character count is less than or equal to the threshold. |
| long_documents | List[Document] | Documents where the character count of content is greater than the threshold. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| threshold | int | 10 | The maximum number of characters for a document to be considered short. Documents where content is None or whose character count is less than or equal to this value are routed to short_documents. All others go to long_documents. Set to a negative number to route only None-content documents to short_documents. |

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents to categorize based on their content length. |