DocumentTypeRouter

Use DocumentTypeRouter to route documents to different pipeline branches based on their MIME type. It creates one output per MIME type you specify, plus an unclassified output for documents that don't match any type.

Key Features

  • Routes documents by MIME type using exact matches or regex patterns
  • Extracts MIME types from document metadata or infers them from file paths
  • Creates one named output per MIME type for easy downstream connections
  • Routes non-matching documents to a dedicated unclassified output
  • Supports custom MIME type mappings for uncommon file types

Configuration

  1. Drag the DocumentTypeRouter component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Set mime_types to the list of MIME type strings or regex patterns you want to route (for example, ["text/plain", "application/pdf", "image/jpeg"]).
  4. Go to the Advanced tab to configure mime_type_meta_field, file_path_meta_field, and additional_mimetypes.

Connections

DocumentTypeRouter accepts a list of documents as input. It creates one output per MIME type listed in mime_types, named after the MIME type (for example, text/plain), plus an unclassified output for documents that don't match. Connect each output to the appropriate converter or processing component.
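
The routing behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not the component's implementation: plain dictionaries with a meta dictionary stand in for Document objects, route_by_mime_type is a hypothetical helper, the MIME type is assumed to already be stored in meta["mime_type"], and the first matching entry in mime_types is assumed to win.

```python
import re


def route_by_mime_type(docs, mime_types):
    """Group documents into one output per entry in mime_types, plus 'unclassified'.

    Hypothetical helper for illustration only. Each document is a dict with a
    'meta' dict; the MIME type is read from meta['mime_type'].
    """
    # One named output per configured MIME type, plus the fallback output.
    outputs = {mime_type: [] for mime_type in mime_types}
    outputs["unclassified"] = []

    # Compile each entry once so it can also be used as a regex pattern.
    patterns = [(mime_type, re.compile(mime_type)) for mime_type in mime_types]

    for doc in docs:
        doc_mime = doc.get("meta", {}).get("mime_type")
        target = "unclassified"
        if doc_mime is not None:
            for name, pattern in patterns:
                # Accept a literal match first (covers types such as
                # "image/svg+xml", where "+" is a regex quantifier), then
                # fall back to a full regex match.
                if doc_mime == name or pattern.fullmatch(doc_mime):
                    target = name  # assumption: first matching entry wins
                    break
        outputs[target].append(doc)

    return outputs


docs = [
    {"content": "Plain text", "meta": {"mime_type": "text/plain"}},
    {"content": "A picture", "meta": {"mime_type": "image/png"}},
    {"content": "No MIME type", "meta": {}},
]
routed = route_by_mime_type(docs, ["text/plain", r"image/.*"])
```

In this sketch, each key of routed corresponds to one output you would connect to a downstream component, and the document without a MIME type lands in unclassified.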

The component requires at least one of these parameters to determine each document's MIME type: mime_type_meta_field (a metadata field holding the MIME type) or file_path_meta_field (a metadata field holding the file path, from which the MIME type is inferred).
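
As a rough sketch of how these two parameters could interact, the snippet below resolves a MIME type from a metadata dictionary using only Python's standard mimetypes module. The resolve_mime_type helper and the field names mime_type and file_path are hypothetical, and the lookup order (metadata field first, then file path) is an assumption based on the parameter descriptions on this page rather than the component's actual code.

```python
import mimetypes


def resolve_mime_type(meta, mime_type_meta_field=None, file_path_meta_field=None):
    """Return the MIME type for one document's metadata, or None if unknown.

    Hypothetical helper for illustration only.
    """
    if mime_type_meta_field is None and file_path_meta_field is None:
        # At least one of the two parameters must be configured.
        raise ValueError(
            "Set mime_type_meta_field, file_path_meta_field, or both."
        )

    # Prefer an explicit MIME type stored in the metadata.
    if mime_type_meta_field and meta.get(mime_type_meta_field):
        return meta[mime_type_meta_field]

    # Otherwise, infer the MIME type from the file path's extension.
    if file_path_meta_field and meta.get(file_path_meta_field):
        guessed, _encoding = mimetypes.guess_type(str(meta[file_path_meta_field]))
        return guessed

    return None


explicit = resolve_mime_type(
    {"mime_type": "application/pdf"}, "mime_type", "file_path"
)
inferred = resolve_mime_type({"file_path": "notes.txt"}, "mime_type", "file_path")
unknown = resolve_mime_type({}, "mime_type", "file_path")
```

A document for which this lookup returns None has no MIME type to compare against mime_types, so it would end up in the unclassified output.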

Usage Example

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents to categorize based on their MIME type. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| unclassified | List[Document] | | Documents that don't match any of the specified MIME types. |
| [mime_type] | List[Document] | | Documents matching each specified MIME type (for example, text/plain, application/pdf). |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mime_types | List[str] | | A list of MIME types or regex patterns used to classify the input documents (for example: ["text/plain", "audio/x-wav", "image/jpeg"]). |
| mime_type_meta_field | Optional[str] | None | The name of the metadata field that holds the MIME type. |
| file_path_meta_field | Optional[str] | None | The name of the metadata field that holds the file path. Used to infer the MIME type when mime_type_meta_field is not set or the field is missing from a document. |
| additional_mimetypes | Optional[Dict[str, str]] | None | A dictionary mapping MIME types to file extensions, used to extend or override the standard mimetypes module. Useful for uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}. |
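
To illustrate what a mapping like additional_mimetypes does, the sketch below registers the example entry with Python's standard mimetypes module and then guesses the type of a file with that extension. The register_additional_mimetypes helper and the file name report.custom are hypothetical; whether the component registers mappings in exactly this way is an assumption.

```python
import mimetypes


def register_additional_mimetypes(additional_mimetypes):
    """Register each MIME type -> extension pair with the mimetypes module.

    Hypothetical helper for illustration only.
    """
    for mime_type, extension in additional_mimetypes.items():
        # add_type makes guess_type recognize the extension from now on.
        mimetypes.add_type(mime_type, extension)


register_additional_mimetypes({"application/vnd.custom-type": ".custom"})

# After registration, a path ending in .custom resolves to the custom type.
guessed, _encoding = mimetypes.guess_type("report.custom")
```

With this mapping in place, you could list application/vnd.custom-type in mime_types to give files with the .custom extension their own output.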

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents to categorize based on their MIME type. |