FileTypeRouter
Categorize files or byte streams by their MIME types and route them to different pipeline branches.
Key Features
- Routes file paths and ByteStream objects to outputs based on MIME type.
- Supports exact MIME type matching and regex patterns (for example, audio/.* or text/.*).
- Infers MIME types from file extensions for file paths and from metadata for byte streams.
- Supports custom MIME type mappings for unsupported or proprietary file types.
- Optionally raises an error for non-existent files.
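The MIME type inference and matching described above can be illustrated with Python's standard library alone. This is a minimal sketch, not Haystack's implementation: infer_mime_type, match_pattern and StreamStub are hypothetical names, a byte stream is modeled as any object carrying a mime_type attribute, and anchoring patterns with re.fullmatch is an assumption of the sketch.

```python
import mimetypes
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence, Union


@dataclass
class StreamStub:
    """Hypothetical stand-in for a ByteStream: raw bytes plus an optional MIME type."""
    data: bytes
    mime_type: Optional[str] = None


def infer_mime_type(source: Union[str, Path, StreamStub]) -> Optional[str]:
    """Read the stored MIME type of a stream; guess it from the extension of a path."""
    if isinstance(source, StreamStub):
        return source.mime_type
    mime_type, _encoding = mimetypes.guess_type(str(Path(source)))
    return mime_type


def match_pattern(mime_type: Optional[str], patterns: Sequence[str]) -> Optional[str]:
    """Return the first pattern that matches the whole MIME type, or None."""
    if mime_type is None:
        return None
    for pattern in patterns:
        # "text/plain" has no regex metacharacters, so it matches only itself;
        # "audio/.*" matches any audio subtype.
        if re.fullmatch(pattern, mime_type):
            return pattern
    return None


patterns = ["text/plain", r"audio/.*"]
print(match_pattern(infer_mime_type("notes.txt"), patterns))  # text/plain
print(match_pattern(infer_mime_type(StreamStub(b"RIFF", "audio/x-wav")), patterns))  # audio/.*
print(match_pattern(infer_mime_type("photo.jpg"), patterns))  # None
```

In this sketch, re.fullmatch must cover the whole string, so a shell-style pattern such as audio/* would only match "audio" followed by slashes, not audio/x-wav; audio/.* is the regex form that matches every audio subtype.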
Configuration
- Drag the FileTypeRouter component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Set MIME Types to the list of MIME types or regex patterns used to classify the input files or byte streams (for example, ["text/plain", "audio/x-wav", "image/jpeg"]).
- Go to the Advanced tab to configure optional settings:
  - Set Additional MIME Types to add custom MIME type-to-extension mappings for file types that the standard mimetypes module doesn't support (for example, {"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}).
  - Enable Raise on Failure to always raise a FileNotFoundError for non-existent files. When disabled (default), this error is raised only when the meta parameter is provided at runtime.
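Additional MIME Types extends the extension-to-type lookup of Python's mimetypes module. The sketch below shows that kind of registration with the standard mimetypes.add_type function; register_additional_mimetypes is a hypothetical helper, and .acme is a made-up extension standing in for a proprietary format.

```python
import mimetypes
from typing import Dict, Optional


def register_additional_mimetypes(additional_mimetypes: Optional[Dict[str, str]]) -> None:
    """Register custom MIME type -> extension mappings, as in the Additional MIME Types setting."""
    for mime_type, extension in (additional_mimetypes or {}).items():
        # add_type(type, ext) makes guess_type() return `type` for files ending in `ext`.
        mimetypes.add_type(mime_type, extension)


register_additional_mimetypes({
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/x-acme-report": ".acme",  # hypothetical proprietary format
})
print(mimetypes.guess_type("quarterly.acme")[0])  # application/x-acme-report
```

Once registered, any file ending in .acme resolves to application/x-acme-report, so that type can be listed under MIME Types like any standard one.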
Connections
FileTypeRouter receives a list of file paths or ByteStream objects. For each MIME type you specify, it creates a named output connection. Connect each output to the appropriate file converter (for example, connect the text/plain output to TextFileToDocument and the application/pdf output to a PDF converter).
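Because each configured MIME type becomes its own named output, wiring the router amounts to reading one list per key. A minimal sketch of that grouping with the standard library; route_by_mime_type is a hypothetical helper rather than Haystack's API, and collecting non-matching files under an "unclassified" key is an assumption of the sketch.

```python
import mimetypes
import re
from pathlib import Path
from typing import Dict, List, Sequence, Union


def route_by_mime_type(sources: Sequence[Union[str, Path]], mime_types: Sequence[str]) -> Dict[str, List[Path]]:
    """Group file paths into one list per configured MIME type or pattern, like named outputs."""
    outputs: Dict[str, List[Path]] = {pattern: [] for pattern in mime_types}
    # Hypothetical catch-all bucket for sources that match no configured pattern.
    outputs["unclassified"] = []
    for source in sources:
        path = Path(source)
        mime_type, _encoding = mimetypes.guess_type(path.name)
        target = "unclassified"
        if mime_type is not None:
            for pattern in mime_types:
                if re.fullmatch(pattern, mime_type):
                    target = pattern  # first matching pattern wins
                    break
        outputs[target].append(path)
    return outputs


routes = route_by_mime_type(["a.txt", "b.pdf", "c.jpg"], ["text/plain", "application/pdf"])
print(routes["text/plain"])       # [PosixPath('a.txt')] on Linux/macOS
print(routes["application/pdf"])  # [PosixPath('b.pdf')] on Linux/macOS
```

In a pipeline, the text/plain list would feed TextFileToDocument and the application/pdf list a PDF converter, mirroring the connections described above.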
Source Code
To check this component's source code, open file_type_router.py in the Haystack repository.
Usage Examples
Basic Configuration
FileTypeRouter:
  type: components.routers.file_type_router.FileTypeRouter
  init_parameters:
    mime_types:
      - text/plain
      - audio/x-wav
      - image/jpeg
    raise_on_failure: false
In a full pipeline YAML, the same configuration sits under the components key:
components:
  FileTypeRouter:
    type: components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - audio/x-wav
        - image/jpeg
      raise_on_failure: false
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | A list of file paths or byte streams to categorize. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | Optional metadata to attach to the sources. A single dictionary is applied to all sources; a list must match the number of sources. |
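The meta rules in the table above (one dictionary for all sources, or exactly one dictionary per source) can be sketched as a small normalization step. normalize_meta is a hypothetical helper, and raising ValueError on a length mismatch is an assumption of this sketch rather than documented behavior.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Dict[str, Any]


def normalize_meta(meta: Optional[Union[Meta, List[Meta]]], num_sources: int) -> List[Meta]:
    """Expand `meta` into one dictionary per source."""
    if meta is None:
        return [{} for _ in range(num_sources)]
    if isinstance(meta, dict):
        # A single dictionary applies to every source; copy it so sources don't share one object.
        return [dict(meta) for _ in range(num_sources)]
    if len(meta) != num_sources:
        # Assumed error type: a list must line up one-to-one with the sources.
        raise ValueError(f"Got {len(meta)} meta dictionaries for {num_sources} sources; the counts must match.")
    return list(meta)


print(normalize_meta({"lang": "en"}, 2))  # [{'lang': 'en'}, {'lang': 'en'}]
```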
Outputs
| Parameter | Type | Description |
|---|---|---|
| (per MIME type) | List[ByteStream] | Files or streams matching each specified MIME type, routed to the corresponding named output. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| mime_types | List[str] | | A list of MIME types or regex patterns to classify the input files or byte streams (for example, ["text/plain", "audio/x-wav", "image/jpeg"]). |
| additional_mimetypes | Optional[Dict[str, str]] | None | A dictionary of MIME type-to-extension mappings to add to the mimetypes module. Useful for unsupported or non-native file types (for example, {"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}). |
| raise_on_failure | bool | False | When True, FileNotFoundError is always raised for non-existent files. When False, this error is raised only when the meta parameter is provided to run(). |
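The raise_on_failure rule reduces to a small decision. A minimal sketch, assuming a hypothetical check_source_exists helper that only decides whether a missing path is an error; what the real component does with a tolerated missing path is not modeled here.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def check_source_exists(path: Path, raise_on_failure: bool, meta: Optional[Dict[str, Any]] = None) -> bool:
    """Return True if path exists; for a missing path, raise or return False per the documented rule."""
    if path.exists():
        return True
    # Documented rule: always raise when raise_on_failure is True,
    # otherwise raise only when meta was passed to run().
    if raise_on_failure or meta is not None:
        raise FileNotFoundError(f"File not found: {path}")
    # Missing but tolerated; the caller decides what to do with it.
    return False


missing = Path("no_such_file.txt")
check_source_exists(missing, raise_on_failure=False)  # returns False while no_such_file.txt is absent
try:
    check_source_exists(missing, raise_on_failure=False, meta={"source": "upload"})
except FileNotFoundError as err:
    print(err)
```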
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or byte streams to categorize. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the sources. A single dictionary is applied to all sources; a list must match the number of sources. |