DocumentTypeRouter

Routes documents by their MIME types.

Basic Information

  • Type: haystack.components.routers.DocumentTypeRouter
  • Components it can connect with:
    • Converters: DocumentTypeRouter can send documents to specific converters based on their MIME type.

Inputs

Parameter | Type | Default | Description
documents | List[Document] | | A list of documents to be categorized based on their MIME type.

Outputs

Parameter | Type | Default | Description
unclassified | List[Document] | | Documents that don't match any of the specified MIME types.
[mime_type] | List[Document] | | Documents matching each specified MIME type (e.g., "text/plain", "application/pdf", etc.).

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types. It supports exact MIME type matches and regex patterns.
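The difference between an exact match and a regex pattern can be illustrated with a minimal sketch. It assumes full-match semantics (the pattern must cover the whole MIME type), which this page does not confirm; `matches` is a hypothetical helper, not part of the component's API.

```python
import re


def matches(mime_type: str, pattern: str) -> bool:
    """Hypothetical helper: True if the pattern covers the whole MIME type."""
    return re.fullmatch(pattern, mime_type) is not None


# An exact MIME type is simply a pattern without wildcards.
print(matches("text/plain", "text/plain"))
# A regex pattern can cover a whole family of types.
print(matches("audio/x-wav", r"audio/.*"))
print(matches("image/png", r"audio/.*"))
# Characters such as "+" are regex operators, so escape them for literal matches.
print(matches("image/svg+xml", "image/svg+xml"))
print(matches("image/svg+xml", re.escape("image/svg+xml")))
```

If patterns are treated as regular expressions in this way, a MIME type containing `+` needs escaping to be matched literally.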

You can extract MIME types directly from document metadata or infer them from file paths using standard or user-supplied MIME type mappings. The component categorizes input documents into groups based on their MIME type and routes them to the appropriate output. DocumentTypeRouter requires at least one of these parameters to determine the MIME type:

  • mime_type_meta_field: Name of the metadata field containing the MIME type.
  • file_path_meta_field: Name of the metadata field containing the file path (MIME type is inferred from the file extension).

Documents that don't match any of the specified MIME types are routed to the unclassified output. Documents that match a specific MIME type are routed to an output named after that MIME type.
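The routing behavior described above can be sketched in plain Python. This is an illustration of the idea only, not the component's implementation: `Doc`, `resolve_mime_type`, and `route` are hypothetical names, and the sketch assumes that the metadata MIME type takes precedence over the file path and that the first matching pattern wins.

```python
import mimetypes
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document: just content and metadata."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def resolve_mime_type(doc: Doc, mime_type_meta_field: Optional[str],
                      file_path_meta_field: Optional[str]) -> Optional[str]:
    # Assumption: an explicit MIME type in the metadata is preferred.
    if mime_type_meta_field and doc.meta.get(mime_type_meta_field):
        return doc.meta[mime_type_meta_field]
    # Otherwise, infer the MIME type from the file extension.
    if file_path_meta_field and doc.meta.get(file_path_meta_field):
        guessed, _ = mimetypes.guess_type(doc.meta[file_path_meta_field])
        return guessed
    return None


def route(docs: List[Doc], mime_types: List[str],
          mime_type_meta_field: Optional[str] = None,
          file_path_meta_field: Optional[str] = None) -> Dict[str, List[Doc]]:
    if not mime_type_meta_field and not file_path_meta_field:
        raise ValueError("Set mime_type_meta_field, file_path_meta_field, or both.")
    # One group per configured MIME type, plus the catch-all group.
    groups: Dict[str, List[Doc]] = {name: [] for name in [*mime_types, "unclassified"]}
    for doc in docs:
        mime = resolve_mime_type(doc, mime_type_meta_field, file_path_meta_field)
        target = "unclassified"
        if mime is not None:
            for pattern in mime_types:
                if re.fullmatch(pattern, mime):
                    target = pattern
                    break
        groups[target].append(doc)
    return groups


docs = [
    Doc("Example text", meta={"file_path": "example.txt"}),
    Doc("A PDF", meta={"mime_type": "application/pdf"}),
    Doc("Unknown type"),
]
result = route(docs, ["text/plain", "application/pdf"], "mime_type", "file_path")
for name, group in result.items():
    print(name, [d.content for d in group])
```

In this sketch, the document with no usable metadata ends up in the unclassified group, while the other two land in groups named after their MIME types.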

Usage Example

Initializing the Component

components:
  DocumentTypeRouter:
    type: haystack.components.routers.document_type_router.DocumentTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - image/jpeg
        - image/png
      mime_type_meta_field: mime_type
      file_path_meta_field: file_path
      additional_mimetypes:
        application/vnd.custom-type: .custom

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter | Type | Default | Description
mime_types | List[str] | | A list of MIME types or regex patterns to classify the input documents (for example: ["text/plain", "audio/x-wav", "image/jpeg"]).
mime_type_meta_field | Optional[str] | None | Optional name of the metadata field that holds the MIME type.
file_path_meta_field | Optional[str] | None | Optional name of the metadata field that holds the file path. Used to infer the MIME type if mime_type_meta_field is not provided or missing in a document.
additional_mimetypes | Optional[Dict[str, str]] | None | Optional dictionary mapping MIME types to file extensions to enhance or override the standard mimetypes module. Useful when working with uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}.
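
To show what an additional_mimetypes mapping can achieve, here is a minimal sketch using Python's standard mimetypes module. It assumes each entry is registered as a MIME type and extension pair; how the component applies the mapping internally is not described on this page. The file name `data.custom` is made up for illustration.

```python
import mimetypes

additional_mimetypes = {"application/vnd.custom-type": ".custom"}

# A private registry keeps this example from changing the process-wide tables.
registry = mimetypes.MimeTypes()
before, _ = registry.guess_type("data.custom")

# Register each custom MIME type for its file extension.
for mime_type, extension in additional_mimetypes.items():
    registry.add_type(mime_type, extension)

after, _ = registry.guess_type("data.custom")
print("before:", before)
print("after:", after)
```

Once the extension is registered, a file path ending in `.custom` resolves to the custom MIME type, so documents with such paths can be routed like any other type.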

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter | Type | Default | Description
documents | List[Document] | | A list of documents to be categorized based on their MIME type.