DocumentTypeRouter
Routes documents by their MIME types.
Basic Information
- Type: haystack.components.routers.DocumentTypeRouter
- Components it can connect with:
  - Converters: DocumentTypeRouter can send documents to specific converters based on their MIME type.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to be categorized based on their MIME type. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| unclassified | List[Document] | | Documents that don't match any of the specified MIME types. |
| [mime_type] | List[Document] | | Documents matching each specified MIME type (for example, "text/plain" or "application/pdf"). |
Overview
Bear with us while we work on adding pipeline examples and the most common component connections.
DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types. It supports exact MIME type matches and regex patterns.
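The difference between an exact MIME type and a regex pattern can be illustrated with a small standalone sketch. It does not use the component itself and is not its implementation: it assumes each configured entry is compiled as a regular expression and must match the whole MIME type string, and the `first_match` helper and the sample entries are hypothetical.

```python
import re

# Hypothetical configuration: one exact MIME type and one regex pattern.
mime_types = ["application/pdf", r"image/.*"]

# Assumption: every entry is treated as a regex that must match the full string.
patterns = [re.compile(entry) for entry in mime_types]


def first_match(mime_type):
    """Return the first configured entry that fully matches mime_type, or None."""
    for entry, pattern in zip(mime_types, patterns):
        if pattern.fullmatch(mime_type):
            return entry
    return None


print(first_match("application/pdf"))  # application/pdf
print(first_match("image/png"))        # image/.*
print(first_match("text/plain"))       # None
```

Under this assumption, a single pattern such as `image/.*` covers both `image/png` and `image/jpeg`, so you don't need to list every image type separately.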
You can extract MIME types directly from document metadata or infer them from file paths using standard or user-supplied MIME type mappings. The component categorizes input documents into groups based on their MIME type and routes them to the appropriate output. DocumentTypeRouter requires at least one of these parameters to determine the MIME type:
- mime_type_meta_field: Name of the metadata field containing the MIME type.
- file_path_meta_field: Name of the metadata field containing the file path (the MIME type is inferred from the file extension).
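As a rough illustration of how these two parameters could work together, the sketch below resolves a MIME type from a plain metadata dictionary. It is not the component's code: `resolve_mime_type` is a hypothetical helper, and it assumes the explicit metadata field takes precedence, with the file path used only as a fallback through Python's standard `mimetypes` module.

```python
import mimetypes


def resolve_mime_type(meta, mime_type_meta_field=None, file_path_meta_field=None):
    """Return the MIME type for a metadata dict, or None if it cannot be determined."""
    # Prefer an explicit MIME type stored in the metadata (assumed precedence).
    if mime_type_meta_field and meta.get(mime_type_meta_field):
        return meta[mime_type_meta_field]
    # Otherwise fall back to inferring the type from the file extension.
    if file_path_meta_field and meta.get(file_path_meta_field):
        guessed, _encoding = mimetypes.guess_type(str(meta[file_path_meta_field]))
        return guessed
    return None


# The explicit field wins even when a file path is also present.
print(resolve_mime_type(
    {"mime_type": "application/pdf", "file_path": "notes.txt"},
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
))

# Only a file path is available, so the type is inferred from ".txt".
print(resolve_mime_type(
    {"file_path": "notes.txt"},
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
))
```

A document with neither field yields `None` in this sketch, which corresponds to a document the router cannot classify.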
Documents that don't match any of the specified MIME types are routed to the unclassified output. Documents that match a specific MIME type are routed to an output named after that MIME type.
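The grouping behavior described above can be pictured with a minimal sketch that uses plain dictionaries in place of Document objects. The `route` function is a hypothetical stand-in, not the component's run() method, and it assumes full-string regex matching and that the first matching entry names the output.

```python
import re
from collections import defaultdict


def route(documents, mime_types, mime_type_meta_field="mime_type"):
    """Group documents into outputs keyed by the matching entry or 'unclassified'."""
    outputs = defaultdict(list)
    for doc in documents:
        mime = doc.get("meta", {}).get(mime_type_meta_field)
        target = "unclassified"
        if mime is not None:
            # Assumption: the first entry that fully matches decides the output.
            for entry in mime_types:
                if re.fullmatch(entry, mime):
                    target = entry
                    break
        outputs[target].append(doc)
    return dict(outputs)


docs = [
    {"content": "Plain text", "meta": {"mime_type": "text/plain"}},
    {"content": "A PDF", "meta": {"mime_type": "application/pdf"}},
    {"content": "Audio", "meta": {"mime_type": "audio/x-wav"}},
    {"content": "No metadata", "meta": {}},
]

result = route(docs, ["text/plain", "application/pdf"])
for name, group in result.items():
    print(name, [d["content"] for d in group])
```

Here the audio document and the document without a MIME type both end up under `unclassified`, while the other two land in outputs named `text/plain` and `application/pdf`.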
Usage Example
Initializing the Component
components:
  DocumentTypeRouter:
    type: haystack.components.routers.document_type_router.DocumentTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - image/jpeg
        - image/png
      mime_type_meta_field: mime_type
      file_path_meta_field: file_path
      additional_mimetypes:
        application/vnd.custom-type: .custom
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| mime_types | List[str] | | A list of MIME types or regex patterns to classify the input documents (for example: ["text/plain", "audio/x-wav", "image/jpeg"]). |
| mime_type_meta_field | Optional[str] | None | Optional name of the metadata field that holds the MIME type. |
| file_path_meta_field | Optional[str] | None | Optional name of the metadata field that holds the file path. Used to infer the MIME type if mime_type_meta_field is not provided or missing in a document. |
| additional_mimetypes | Optional[Dict[str, str]] | None | Optional dictionary mapping MIME types to file extensions to enhance or override the standard mimetypes module. Useful when working with uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}. |
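To see why additional_mimetypes matters for uncommon extensions, the sketch below registers the example mapping from the table with Python's standard `mimetypes` module. It assumes the component applies the mapping in a similar way; the file name `report.custom` is made up, and a separate `MimeTypes` instance is used here only to avoid changing the global registry.

```python
import mimetypes

# Same shape as the additional_mimetypes parameter: MIME type -> file extension.
additional_mimetypes = {"application/vnd.custom-type": ".custom"}

# A private registry that starts with the standard default mappings only.
registry = mimetypes.MimeTypes()

before, _ = registry.guess_type("report.custom")
print(before)  # the default mappings do not know the ".custom" extension

# Register each custom mapping (assumed to mirror what the component does).
for mime_type, extension in additional_mimetypes.items():
    registry.add_type(mime_type, extension)

after, _ = registry.guess_type("report.custom")
print(after)
```

Without the extra mapping, a document whose file path ends in `.custom` would have no inferred MIME type and would be routed to unclassified.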
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to be categorized based on their MIME type. |