FileTypeRouter
Use FileTypeRouter to categorize files or byte streams by their MIME type and route them to the appropriate converter or processing component in your pipeline.
Key Features
- Routes files and byte streams to different branches based on MIME type
- Supports exact MIME type matching and regex patterns (for example, audio/*, text/*)
- Infers MIME types from file extensions for file paths, and from metadata for byte streams
- Creates one named output per MIME type for direct downstream connections
- Supports custom MIME type mappings for uncommon file formats
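The routing behavior listed above can be illustrated with a minimal standard-library sketch. This is not the component's implementation: the function name `route_by_mime_type` and the `unclassified` bucket are hypothetical, and the sketch assumes patterns are applied with Python `re` full-match semantics, under which a wildcard over subtypes is written `audio/.*`.

```python
import mimetypes
import re
from collections import defaultdict


def route_by_mime_type(sources, mime_types):
    """Group file paths by the first MIME type pattern they match (illustrative only)."""
    patterns = [re.compile(p) for p in mime_types]
    routed = defaultdict(list)
    for source in sources:
        # Infer the MIME type from the file extension.
        guessed, _ = mimetypes.guess_type(str(source))
        for pattern in patterns:
            # Assumption: the whole MIME type must match the pattern.
            if guessed and pattern.fullmatch(guessed):
                routed[pattern.pattern].append(source)
                break
        else:
            # Hypothetical bucket for sources that match no pattern.
            routed["unclassified"].append(source)
    return dict(routed)


routed = route_by_mime_type(
    ["notes.txt", "clip.wav", "blob.zzunknown"],
    ["text/plain", r"audio/.*"],
)
```

Each key in the returned dictionary corresponds to one named output, which is why every entry in `mime_types` can be wired to its own downstream converter.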
Configuration
- Drag the FileTypeRouter component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Set mime_types to the list of MIME type strings or regex patterns you want to route (for example, ["text/plain", "audio/x-wav", "image/jpeg"]).
- Go to the Advanced tab to configure additional_mimetypes and raise_on_failure.
Connections
FileTypeRouter accepts a list of file paths or ByteStream objects as input, along with optional metadata. It creates one output per MIME type listed in mime_types. Connect each output to the appropriate converter (for example, TextFileToDocument for text/plain, PDFMinerToDocument for application/pdf).
Usage Example
components:
  FileTypeRouter:
    type: components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - audio/x-wav
        - image/jpeg
      raise_on_failure: false
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or byte streams to categorize. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the sources. When provided, sources are converted to ByteStream objects and the metadata is added. A single dictionary applies to all sources; a list must match the number of sources. |
Outputs
The component creates one output per MIME type listed in mime_types. Each output is a list of the matching sources.
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| mime_types | List[str] | | A list of MIME types or regex patterns used to classify the input files or byte streams (for example: ["text/plain", "audio/x-wav", "image/jpeg"]). |
| additional_mimetypes | Optional[Dict[str, str]] | None | A dictionary that maps additional MIME types to file extensions. Use this so that file formats whose MIME types aren't recognized by default aren't left unclassified (for example: {"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}). |
| raise_on_failure | bool | False | When True, a FileNotFoundError is always raised for non-existent files. When False (default), this exception is raised only when processing a non-existent file and the meta parameter is provided to run(). |
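To show what an `additional_mimetypes` mapping accomplishes, here is a minimal sketch that registers a MIME type to extension mapping with Python's standard `mimetypes` module. That the component relies on this module is an assumption, and the helper name `register_additional_mimetypes`, the MIME type `application/x-example-format`, and the extension `.exfmt` are hypothetical placeholders.

```python
import mimetypes


def register_additional_mimetypes(additional_mimetypes):
    """Register each MIME type -> extension pair so it can be guessed later."""
    for mime_type, extension in additional_mimetypes.items():
        mimetypes.add_type(mime_type, extension)


# Hypothetical format that is not known by default.
register_additional_mimetypes({"application/x-example-format": ".exfmt"})

# After registration, the extension resolves to the custom MIME type.
guessed, _ = mimetypes.guess_type("report.exfmt")
```

Without the registration step, a file with an unrecognized extension has no inferred MIME type and therefore cannot match any entry in `mime_types`.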
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or byte streams to categorize. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the sources. When provided, sources are converted to ByteStream objects and the metadata is added. A single dictionary applies to all sources; a list must match the number of sources. |
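The rule for `meta` (a single dictionary applies to every source, while a list must have one entry per source) can be sketched as follows. The helper `normalize_meta` is hypothetical and only illustrates the rule described in the table; it is not the component's own code, and the error type raised on a length mismatch is an assumption.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_meta(
    meta: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]],
    sources_count: int,
) -> List[Dict[str, Any]]:
    """Expand meta into one dictionary per source (illustrative only)."""
    if meta is None:
        # No metadata supplied: every source gets an empty dictionary.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dictionary is copied for each source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        # Assumption: a mismatch is reported as a ValueError.
        raise ValueError("The length of meta must match the number of sources.")
    return [dict(item) for item in meta]


shared = normalize_meta({"project": "demo"}, 2)
per_source = normalize_meta([{"lang": "en"}, {"lang": "de"}], 2)
```

Copying the shared dictionary for each source keeps later per-source changes from leaking into the other entries.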