DocumentTypeRouter
Use DocumentTypeRouter to route documents to different pipeline branches based on their MIME type. It creates one output per MIME type you specify, plus an unclassified output for documents that don't match any type.
Key Features
- Routes documents by MIME type using exact matches or regex patterns
- Extracts MIME types from document metadata or infers them from file paths
- Creates one named output per MIME type for easy downstream connections
- Routes non-matching documents to a dedicated unclassified output
- Supports custom MIME type mappings for uncommon file types
Configuration
- Drag the DocumentTypeRouter component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Set mime_types to the list of MIME type strings or regex patterns you want to route (for example, ["text/plain", "application/pdf", "image/jpeg"]).
- Go to the Advanced tab to configure mime_type_meta_field, file_path_meta_field, and additional_mimetypes.
Connections
DocumentTypeRouter accepts a list of documents as input. It creates one output per MIME type listed in mime_types, named after the MIME type (for example, text/plain), plus an unclassified output for documents that don't match. Connect each output to the appropriate converter or processing component.
The component requires at least one of these parameters to determine each document's MIME type: mime_type_meta_field (a metadata field holding the MIME type) or file_path_meta_field (a metadata field holding the file path, from which the MIME type is inferred).
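The lookup order described above can be sketched in plain Python. This is a minimal illustration, not the component's actual implementation: it assumes the explicit MIME type field is checked first and the file path is used only as a fallback, and the metadata field names content_type and file_path are hypothetical.

```python
import mimetypes
from typing import Optional


def resolve_mime_type(
    meta: dict,
    mime_type_meta_field: Optional[str] = None,
    file_path_meta_field: Optional[str] = None,
) -> Optional[str]:
    """Return the MIME type for one document's metadata, or None if unknown."""
    # Assumption: an explicit MIME type stored in the metadata takes priority.
    if mime_type_meta_field:
        mime_type = meta.get(mime_type_meta_field)
        if mime_type:
            return mime_type
    # Otherwise fall back to guessing from the file extension.
    if file_path_meta_field:
        file_path = meta.get(file_path_meta_field)
        if file_path:
            guessed, _encoding = mimetypes.guess_type(str(file_path))
            return guessed
    # Neither field is configured or present: the type stays unknown.
    return None


# Hypothetical metadata field names, used only for illustration.
print(resolve_mime_type({"content_type": "application/pdf"}, mime_type_meta_field="content_type"))
print(resolve_mime_type({"file_path": "notes.txt"}, file_path_meta_field="file_path"))
```

A document whose type cannot be resolved this way would end up in the unclassified output.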
Usage Example
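To make the routing behavior concrete, here is a plain-Python sketch that groups documents into one bucket per configured MIME type or pattern, plus an unclassified bucket. It is an illustration under stated assumptions rather than the component itself: documents are modeled as simple dictionaries, the metadata field name content_type is hypothetical, each pattern is assumed to have to match the whole MIME type string, and a bucket is assumed to be named after the pattern as written.

```python
import re
from typing import Dict, List


def route_by_mime_type(
    documents: List[dict], mime_types: List[str], meta_field: str
) -> Dict[str, List[dict]]:
    """Group documents into one bucket per pattern, plus "unclassified"."""
    outputs: Dict[str, List[dict]] = {pattern: [] for pattern in mime_types}
    outputs["unclassified"] = []
    compiled = [(pattern, re.compile(pattern)) for pattern in mime_types]
    for doc in documents:
        mime_type = doc.get("meta", {}).get(meta_field)
        for name, regex in compiled:
            # Assumption: a pattern must match the full MIME type string.
            if mime_type and regex.fullmatch(mime_type):
                outputs[name].append(doc)
                break
        else:
            # No pattern matched, or the document has no MIME type at all.
            outputs["unclassified"].append(doc)
    return outputs


# Hypothetical documents; "content_type" is an illustrative field name.
docs = [
    {"content": "Plain text", "meta": {"content_type": "text/plain"}},
    {"content": "PDF body", "meta": {"content_type": "application/pdf"}},
    {"content": "A photo", "meta": {"content_type": "image/jpeg"}},
    {"content": "No type", "meta": {}},
]
routed = route_by_mime_type(docs, ["text/plain", r"image/.*"], meta_field="content_type")
for name, bucket in routed.items():
    print(name, [d["content"] for d in bucket])
```

In a pipeline, each bucket corresponds to an output that you would connect to the matching converter, with the unclassified output handled separately.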
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to categorize based on their MIME type. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| unclassified | List[Document] | | Documents that don't match any of the specified MIME types. |
| [mime_type] | List[Document] | | Documents matching each specified MIME type (for example, text/plain, application/pdf). |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| mime_types | List[str] | | A list of MIME types or regex patterns to classify the input documents (for example: ["text/plain", "audio/x-wav", "image/jpeg"]). |
| mime_type_meta_field | Optional[str] | None | The name of the metadata field that holds the MIME type. |
| file_path_meta_field | Optional[str] | None | The name of the metadata field that holds the file path. Used to infer the MIME type when mime_type_meta_field is not provided or is missing in a document. |
| additional_mimetypes | Optional[Dict[str, str]] | None | A dictionary mapping MIME types to file extensions to enhance or override the standard mimetypes module. Useful for uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}. |
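The effect of an additional_mimetypes mapping can be sketched with Python's standard mimetypes module. This is an illustration of the idea only, assuming each MIME type is registered for its extension before file paths are inspected; the helper name build_guesser and the file path are hypothetical.

```python
import mimetypes
from typing import Dict, Optional


def build_guesser(additional_mimetypes: Optional[Dict[str, str]] = None) -> mimetypes.MimeTypes:
    """Create a MIME type guesser extended with custom type-to-extension mappings."""
    # A separate MimeTypes instance keeps the global registry untouched.
    guesser = mimetypes.MimeTypes()
    for mime_type, extension in (additional_mimetypes or {}).items():
        guesser.add_type(mime_type, extension)
    return guesser


# Mapping taken from the parameter description; the path is made up.
guesser = build_guesser({"application/vnd.custom-type": ".custom"})
custom_type, _encoding = guesser.guess_type("data/report.custom")
print(custom_type)
```

Without the extra mapping, a file with the .custom extension would have no known MIME type and its document would be routed to the unclassified output.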
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to categorize based on their MIME type. |