DocumentLengthRouter

Use DocumentLengthRouter to categorize documents based on the length of their content field and route them to different pipeline branches for separate processing.

Key Features

  • Routes documents to short_documents or long_documents outputs based on character count
  • Treats documents with None content as short documents
  • Configurable character threshold for the short/long boundary
  • Useful for separating scanned or image-only PDF pages, which yield little or no text, from regular text documents

Configuration

  1. Drag the DocumentLengthRouter component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. Configure the parameters as needed.

Connections

DocumentLengthRouter accepts a list of documents, typically from a DocumentSplitter or a converter. It sends documents whose content is None or whose character count is at or below the threshold to short_documents, and all other documents to long_documents. Connect short_documents to components such as LLMDocumentContentExtractor or image embedders, and long_documents to standard text processing components.
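The routing rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the component's actual implementation: the `Doc` dataclass and the `route_by_length` function are hypothetical stand-ins introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a pipeline Document; only `content` matters here."""
    content: Optional[str] = None


def route_by_length(documents: List[Doc], threshold: int = 10) -> Dict[str, List[Doc]]:
    """Split documents into short and long lists using the documented rule."""
    short_documents: List[Doc] = []
    long_documents: List[Doc] = []
    for doc in documents:
        # None content, or a character count at or below the threshold, counts as short.
        if doc.content is None or len(doc.content) <= threshold:
            short_documents.append(doc)
        else:
            long_documents.append(doc)
    return {"short_documents": short_documents, "long_documents": long_documents}


docs = [
    Doc(content=None),           # e.g. a scanned page with no extracted text
    Doc(content="0123456789"),   # exactly 10 characters, at the threshold
    Doc(content="A longer passage of extracted text."),
]
result = route_by_length(docs, threshold=10)
print(len(result["short_documents"]), len(result["long_documents"]))
```

With the default threshold of 10, the first two documents land in short_documents and the third in long_documents, because a document exactly at the threshold still counts as short.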

Usage Example

Parameters

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to categorize based on their content length. |

Outputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| short_documents | List[Document] | | Documents where content is None or whose character count is less than or equal to the threshold. |
| long_documents | List[Document] | | Documents where the character count of content is greater than the threshold. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| threshold | int | 10 | The character count threshold for the document content field. Documents where content is None or whose character count is less than or equal to this value are routed to short_documents. All others go to long_documents. To route only documents with None content to short_documents, set the threshold to a negative number. |
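To make the effect of a negative threshold concrete, here is a minimal sketch of the documented rule applied to single content values. The `is_short` helper is hypothetical and exists only for this illustration; it is not part of the component's API.

```python
from typing import Optional


def is_short(content: Optional[str], threshold: int) -> bool:
    """Hypothetical helper mirroring the documented rule for a single document."""
    return content is None or len(content) <= threshold


samples = [None, "", "abc"]

# threshold=0: None and the empty string are short; any non-empty text is long.
at_zero = [is_short(c, threshold=0) for c in samples]

# threshold=-1: no string has a negative length, so only None content is short.
negative = [is_short(c, threshold=-1) for c in samples]

print(at_zero)   # [True, True, False]
print(negative)  # [True, False, False]
```

The difference matters for empty strings: a threshold of 0 still treats empty content as short, while a negative threshold sends it to long_documents.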

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to categorize based on their content length. |