DocumentTypeRouter

Route documents to different pipeline branches based on their MIME types.

Key Features

  • Routes documents by MIME type to separate output connections.
  • Supports exact MIME type matching and regex patterns.
  • Reads the MIME type from a document metadata field, or infers it from the file path stored in the document's metadata.
  • Routes unclassified documents (no matching MIME type) to a dedicated unclassified output.
  • Supports custom MIME type mappings for uncommon or proprietary file types.

Configuration

  1. Drag the DocumentTypeRouter component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Set MIME Types to the list of MIME types or regex patterns to classify documents (for example, ["text/plain", "application/pdf"]).
    • Set MIME Type Meta Field to the name of the document metadata field that contains the MIME type, if available.
    • Set File Path Meta Field to the name of the document metadata field that contains the file path, if you want the component to infer the MIME type from the file extension.
  4. Go to the Advanced tab to set Additional MIME Types, a dictionary mapping MIME types to file extensions for uncommon or custom file types (for example, {"application/vnd.custom-type": ".custom"}).
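
The lookup order these settings describe can be sketched in plain Python. This is only an illustration built on the standard mimetypes module, not the component's actual code: resolve_mime_type is a hypothetical helper, and the metadata is assumed to be a plain dictionary. It checks the MIME type field first, then falls back to guessing from the file extension, with the additional mappings registered beforehand.

```python
import mimetypes
from typing import Dict, Optional


def resolve_mime_type(
    meta: Dict[str, str],
    mime_type_meta_field: Optional[str] = None,
    file_path_meta_field: Optional[str] = None,
    additional_mimetypes: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: work out the MIME type from one document's metadata."""
    # Prefer an explicit MIME type if the field is configured and present.
    if mime_type_meta_field and meta.get(mime_type_meta_field):
        return meta[mime_type_meta_field]

    # Otherwise guess from the file extension, if a path field is configured.
    if file_path_meta_field and meta.get(file_path_meta_field):
        # A private registry keeps the custom mappings out of global state.
        registry = mimetypes.MimeTypes()
        for mime_type, extension in (additional_mimetypes or {}).items():
            registry.add_type(mime_type, extension)
        guessed, _encoding = registry.guess_type(meta[file_path_meta_field])
        return guessed

    # Nothing to go on: the caller would treat this document as unclassified.
    return None


# Example call with a custom mapping for a proprietary extension.
custom = {"application/vnd.custom-type": ".custom"}
mime = resolve_mime_type({"file_path": "export.custom"}, None, "file_path", custom)
```

Without the custom mapping, a path ending in .custom would not resolve to a MIME type in this sketch, which is the situation Additional MIME Types is meant to cover.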

Connections

DocumentTypeRouter receives a List[Document] from any upstream component. For each MIME type you specify, it creates a named output connection (for example, text/plain or application/pdf). Documents that don't match any MIME type go to the unclassified output. Connect each named output to the appropriate downstream component, such as a specific converter for that file type.
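
To make the output naming concrete, here is a minimal sketch of the routing idea in plain Python. It is not the component's implementation: route_by_mime_type is a hypothetical function, documents are stand-in dictionaries rather than Document objects, and it assumes each configured entry is treated as a regex that must match the whole MIME type, with the first matching entry winning.

```python
import re
from collections import defaultdict
from typing import Dict, List


def route_by_mime_type(
    documents: List[dict], mime_types: List[str], mime_type_meta_field: str
) -> Dict[str, List[dict]]:
    """Hypothetical sketch: group documents under the first matching entry."""
    # Each configured entry doubles as the output name and as a regex pattern.
    patterns = [(name, re.compile(name)) for name in mime_types]
    outputs = defaultdict(list)

    for doc in documents:
        mime_type = doc.get("meta", {}).get(mime_type_meta_field)
        target = "unclassified"  # default when nothing matches
        if mime_type:
            for name, pattern in patterns:
                # Assumption: the pattern must match the entire MIME type.
                if pattern.fullmatch(mime_type):
                    target = name
                    break
        outputs[target].append(doc)

    return dict(outputs)


docs = [
    {"content": "notes", "meta": {"mime_type": "text/plain"}},
    {"content": "recording", "meta": {"mime_type": "audio/x-wav"}},
    {"content": "photo", "meta": {"mime_type": "image/jpeg"}},
]
routed = route_by_mime_type(docs, ["text/plain", r"audio/.*"], "mime_type")
```

In this sketch, routed has one key per entry that received documents plus unclassified for the image, mirroring how each named output would feed a different downstream component.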

Source Code

To check this component's source code, open document_type_router.py in the Haystack repository.

Usage Examples

Basic Configuration

  DocumentTypeRouter:
    type: haystack.components.routers.document_type_router.DocumentTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf

Parameters

Inputs

Parameter | Type | Description
documents | List[Document] | A list of documents to categorize based on their MIME type.

Outputs

Parameter | Type | Description
unclassified | List[Document] | Documents that don't match any of the specified MIME types.
[mime_type] | List[Document] | Documents matching each specified MIME type (for example, text/plain or application/pdf).

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter | Type | Default | Description
mime_types | List[str] | | A list of MIME types or regex patterns to classify the input documents (for example, ["text/plain", "audio/x-wav", "image/jpeg"]).
mime_type_meta_field | Optional[str] | None | The name of the metadata field that holds the MIME type.
file_path_meta_field | Optional[str] | None | The name of the metadata field that holds the file path. Used to infer the MIME type if mime_type_meta_field is not provided or missing in a document.
additional_mimetypes | Optional[Dict[str, str]] | None | A dictionary mapping MIME types to file extensions to enhance or override the standard mimetypes module. Useful for uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}.

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter | Type | Default | Description
documents | List[Document] | | A list of documents to categorize based on their MIME type.