DocumentTypeRouter
Route documents to different pipeline branches based on their MIME types.
Key Features
- Routes documents by MIME type to separate output connections.
- Supports exact MIME type matching and regex patterns.
- Infers MIME types from document metadata or file path metadata.
- Routes unclassified documents (no matching MIME type) to a dedicated unclassified output.
- Supports custom MIME type mappings for uncommon or proprietary file types.
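The MIME type lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the component's actual code: the helper `resolve_mime_type` and the metadata field names `content_type` and `file_path` are hypothetical, and the order of precedence (explicit MIME type field first, file path as a fallback) is an assumption based on the parameter descriptions on this page.

```python
import mimetypes
from typing import Any, Dict, Optional


def resolve_mime_type(
    meta: Dict[str, Any],
    mime_type_meta_field: Optional[str] = None,
    file_path_meta_field: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: return a MIME type for one document's metadata."""
    # Prefer an explicit MIME type stored in the metadata, if present.
    if mime_type_meta_field and meta.get(mime_type_meta_field):
        return meta[mime_type_meta_field]
    # Otherwise fall back to guessing from the file extension.
    if file_path_meta_field and meta.get(file_path_meta_field):
        guessed, _encoding = mimetypes.guess_type(str(meta[file_path_meta_field]))
        return guessed
    # Nothing to go on: such a document would end up as unclassified.
    return None


# "content_type" and "file_path" are illustrative field names only.
explicit = resolve_mime_type({"content_type": "application/pdf"}, "content_type", "file_path")
inferred = resolve_mime_type({"file_path": "notes.txt"}, "content_type", "file_path")
```

A document with neither field set yields `None` here, which corresponds to the case where no MIME type can be matched.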
Configuration
- Drag the DocumentTypeRouter component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Set MIME Types to the list of MIME types or regex patterns to classify documents (for example, ["text/plain", "application/pdf"]).
  - Set MIME Type Meta Field to the name of the document metadata field that contains the MIME type, if available.
  - Set File Path Meta Field to the name of the document metadata field that contains the file path, if you want the component to infer the MIME type from the file extension.
- Go to the Advanced tab to set Additional MIME Types, a dictionary mapping MIME types to file extensions for uncommon or custom file types (for example, {"application/vnd.custom-type": ".custom"}).
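To see what an Additional MIME Types mapping does in practice, the sketch below registers the example mapping with Python's standard `mimetypes` module, which is the module this setting is described as enhancing or overriding. How the component applies the mapping internally is an assumption here, and the path `export/data.custom` is made up for illustration.

```python
import mimetypes

# Same shape as the Additional MIME Types setting: MIME type -> file extension.
additional_mimetypes = {"application/vnd.custom-type": ".custom"}

# Register each mapping so that extension-based guessing recognizes it.
# Note: this modifies the process-wide mimetypes registry.
for mime_type, extension in additional_mimetypes.items():
    mimetypes.add_type(mime_type, extension)

# A hypothetical file path with the custom extension now resolves.
guessed, _encoding = mimetypes.guess_type("export/data.custom")
```

Without the registration step, `guess_type` would typically return `None` for the `.custom` extension, and a document carrying only that file path could not be matched to a MIME type.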
Connections
DocumentTypeRouter receives a List[Document] from any upstream component. For each MIME type you specify, it creates a named output connection (for example, text/plain or application/pdf). Documents that don't match any MIME type go to the unclassified output. Connect each named output to the appropriate downstream component, such as a specific converter for that file type.
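The routing behavior can be pictured as grouping documents under one key per configured MIME type, plus an unclassified key. The sketch below is a simplified stand-in, not the Haystack implementation: the function `route_by_mime_type` is hypothetical, documents are reduced to `(id, mime_type)` pairs, and two details are assumptions: patterns are matched against the whole MIME type string, and the first matching pattern wins.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def route_by_mime_type(
    docs: List[Tuple[str, Optional[str]]],
    mime_types: List[str],
) -> Dict[str, List[str]]:
    """Hypothetical sketch: group document IDs by the first pattern they match."""
    # Each configured entry is treated as a regex; exact types match themselves.
    compiled = [(pattern, re.compile(pattern)) for pattern in mime_types]
    outputs: Dict[str, List[str]] = defaultdict(list)
    for doc_id, mime_type in docs:
        target = "unclassified"  # default when nothing matches
        if mime_type is not None:
            for name, regex in compiled:
                if regex.fullmatch(mime_type):  # assumption: full-string match
                    target = name
                    break  # assumption: first matching pattern wins
        outputs[target].append(doc_id)
    return dict(outputs)


routed = route_by_mime_type(
    [("a", "text/plain"), ("b", "application/pdf"), ("c", "image/png"), ("d", None)],
    ["text/plain", r"application/.*"],
)
```

In this sketch, `routed` has the keys `text/plain`, `application/.*`, and `unclassified`, mirroring the named output connections you would wire to downstream components.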
Source Code
To check this component's source code, open document_type_router.py in the Haystack repository.
Usage Examples
Basic Configuration
DocumentTypeRouter:
  type: haystack.components.routers.document_type_router.DocumentTypeRouter
  init_parameters:
    mime_types:
      - text/plain
      - application/pdf
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | A list of documents to categorize based on their MIME type. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| unclassified | List[Document] | Documents that don't match any of the specified MIME types. |
| [mime_type] | List[Document] | Documents matching each specified MIME type (for example, text/plain or application/pdf). |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| mime_types | List[str] | | A list of MIME types or regex patterns to classify the input documents (for example, ["text/plain", "audio/x-wav", "image/jpeg"]). |
| mime_type_meta_field | Optional[str] | None | The name of the metadata field that holds the MIME type. |
| file_path_meta_field | Optional[str] | None | The name of the metadata field that holds the file path. Used to infer the MIME type if mime_type_meta_field is not provided or missing in a document. |
| additional_mimetypes | Optional[Dict[str, str]] | None | A dictionary mapping MIME types to file extensions to enhance or override the standard mimetypes module. Useful for uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to categorize based on their MIME type. |