FileTypeRouter
Categorizes files or byte streams by their MIME types, helping in context-based routing.
Basic Information
- Type: haystack.components.routers.file_type_router.FileTypeRouter
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or byte streams to categorize. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the sources. When provided, the sources are internally converted to ByteStream objects and the metadata is added. This value can be a list of dictionaries or a single dictionary. If it's a single dictionary, its content is added to the metadata of all ByteStream objects. If it's a list, its length must match the number of sources, as they are zipped together. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
Overview
FileTypeRouter supports both exact MIME type matching and regex patterns.
For file paths, MIME types come from extensions, while byte streams use metadata.
You can use regex patterns in the mime_types parameter to set broad categories
(such as 'audio/.*' or 'text/.*') or specific types.
MIME types without regex patterns are treated as exact matches.
When raise_on_failure is set to True, FileNotFoundError is always raised for non-existent files, regardless of whether the meta parameter is provided.
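The matching behavior described above can be illustrated with a minimal standard-library sketch. This is not the component's implementation. The helper name `route_by_mime_type` and the `unclassified` bucket are assumptions made for illustration, and the sketch covers file paths only (byte streams would take their MIME type from metadata instead).

```python
import mimetypes
import re
from collections import defaultdict
from pathlib import Path


def route_by_mime_type(paths, mime_types):
    """Group paths under the first entry of mime_types that fully matches.

    Hypothetical helper: every entry is compiled as a regex, so a plain
    type such as "text/plain" only matches itself, while "image/.*"
    matches any image subtype.
    """
    patterns = [(label, re.compile(label)) for label in mime_types]
    routed = defaultdict(list)
    for path in paths:
        # For file paths, the MIME type is guessed from the extension.
        guessed, _ = mimetypes.guess_type(str(path))
        for label, pattern in patterns:
            if guessed is not None and pattern.fullmatch(guessed):
                routed[label].append(path)
                break
        else:
            # No pattern matched, or the extension is unknown.
            routed["unclassified"].append(path)
    return dict(routed)


result = route_by_mime_type(
    [Path("notes.txt"), Path("photo.jpg"), Path("archive.unknownext")],
    ["text/plain", r"image/.*"],
)
print(result)
```

Here `notes.txt` lands under the exact type, `photo.jpg` under the regex category, and the file with an unrecognized extension falls into the fallback bucket.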
Usage Example
components:
  FileTypeRouter:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - audio/x-wav
        - image/jpeg
      raise_on_failure: false
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| mime_types | List[str] | | A list of MIME types or regex patterns used to classify the input files or byte streams. For example: ["text/plain", "audio/x-wav", "image/jpeg"]. |
| additional_mimetypes | Optional[Dict[str, str]] | None | A dictionary that maps MIME types to file extensions and is registered with the mimetypes package, so that file types the package does not recognize natively are not left unclassified. For example: {"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}. |
| raise_on_failure | bool | False | When set to True, FileNotFoundError is always raised for non-existent files. When False (default), this exception is raised only when processing a non-existent file and the meta parameter is provided to run(). |
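To show what registering an extra MIME type with the mimetypes package looks like, here is a small sketch that uses Python's standard `mimetypes.add_type`. The MIME type `application/x-example-notes` and the extension `.exnotes` are made up for the example, and the loop is an assumption about how a mapping like additional_mimetypes could be applied, not the component's own code.

```python
import mimetypes

# Hypothetical mapping in the same shape as additional_mimetypes:
# MIME type -> file extension.
additional_mimetypes = {"application/x-example-notes": ".exnotes"}

# Before registration, the extension is unknown to the mimetypes package.
before, _ = mimetypes.guess_type("report.exnotes")

# Register each extra type so extension-based lookup can resolve it.
for mime_type, extension in additional_mimetypes.items():
    mimetypes.add_type(mime_type, extension)

after, _ = mimetypes.guess_type("report.exnotes")
print(before, after)
```

Once the type is registered, a file with that extension resolves to a MIME type that can also be listed in mime_types.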
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or byte streams to categorize. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the sources. When provided, the sources are internally converted to ByteStream objects and the metadata is added. This value can be a list of dictionaries or a single dictionary. If it's a single dictionary, its content is added to the metadata of all ByteStream objects. If it's a list, its length must match the number of sources, as they are zipped together. |
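The rules for meta (one dictionary shared by all sources, or a list zipped with the sources) can be sketched as follows. The helper `expand_meta` is hypothetical, sources are shown as plain strings rather than ByteStream objects, and raising `ValueError` on a length mismatch is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def expand_meta(meta: Meta, sources_count: int) -> List[Dict[str, Any]]:
    """Return one metadata dictionary per source (hypothetical helper)."""
    if meta is None:
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dictionary is copied onto every source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        # Assumed error type: a list must match the number of sources.
        raise ValueError("The length of meta must match the number of sources.")
    return [dict(item) for item in meta]


sources = ["a.txt", "b.wav"]

shared = expand_meta({"project": "demo"}, len(sources))
per_source = expand_meta([{"lang": "en"}, {"lang": "de"}], len(sources))

# Sources and metadata are paired positionally.
paired = list(zip(sources, per_source))
print(shared)
print(paired)
```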