FileTypeRouter

Use FileTypeRouter to categorize files or byte streams by their MIME type and route them to the appropriate converter or processing component in your pipeline.

Key Features

  • Routes files and byte streams to different branches based on MIME type
  • Supports exact MIME type matching and regex patterns (for example, audio/.* or text/.*)
  • Infers MIME types from file extensions for file paths, and from metadata for byte streams
  • Creates one named output per MIME type for direct downstream connections
  • Supports custom MIME type mappings for uncommon file formats
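
The difference between exact matching and pattern matching can be sketched with Python's standard library. This is an illustration only, not the component's implementation: the helper name first_matching_pattern is hypothetical, and the sketch assumes every entry in mime_types is treated as a regular expression that must match the whole MIME type string.

```python
import re
from typing import List, Optional


def first_matching_pattern(mime_type: str, patterns: List[str]) -> Optional[str]:
    """Return the first pattern that matches the entire MIME type, or None.

    Hypothetical helper: assumes full-string regex matching.
    """
    for pattern in patterns:
        if re.fullmatch(pattern, mime_type):
            return pattern
    return None


patterns = ["text/plain", r"audio/.*"]

exact_hit = first_matching_pattern("text/plain", patterns)    # matched by the exact entry
regex_hit = first_matching_pattern("audio/x-wav", patterns)   # matched by the regex entry
no_hit = first_matching_pattern("image/jpeg", patterns)       # matched by nothing
```

Under this assumption, a glob-style entry such as audio/* would not match audio/x-wav, because in regex syntax the trailing * only repeats the preceding slash.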

Configuration

  1. Drag the FileTypeRouter component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Set mime_types to the list of MIME type strings or regex patterns you want to route (for example, ["text/plain", "audio/x-wav", "image/jpeg"]).
  4. Go to the Advanced tab to configure additional_mimetypes and raise_on_failure.

Connections

FileTypeRouter accepts a list of file paths or ByteStream objects as input, along with optional metadata. It creates one output per MIME type listed in mime_types. Connect each output to the appropriate converter (for example, TextFileToDocument for text/plain, PDFMinerToDocument for application/pdf).

Usage Example

components:
  FileTypeRouter:
    type: components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - audio/x-wav
        - image/jpeg
      raise_on_failure: false

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or byte streams to categorize. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the sources. When provided, sources are converted to ByteStream objects and the metadata is added. A single dictionary applies to all sources; a list must match the number of sources. |
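
The two accepted shapes of meta can be illustrated with a small sketch. The function name normalize_meta and the exact error raised are assumptions made for this example; the component may handle the mismatch differently.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Dict[str, Any]


def normalize_meta(meta: Optional[Union[Meta, List[Meta]]], n_sources: int) -> List[Meta]:
    """Expand meta into one dictionary per source (hypothetical helper)."""
    if meta is None:
        # No metadata given: every source gets an empty dictionary.
        return [{} for _ in range(n_sources)]
    if isinstance(meta, dict):
        # A single dictionary applies to all sources; copy it per source.
        return [dict(meta) for _ in range(n_sources)]
    if len(meta) != n_sources:
        # A list must contain exactly one entry per source.
        raise ValueError(
            f"Got {len(meta)} meta entries for {n_sources} sources; the counts must match."
        )
    return [dict(entry) for entry in meta]


shared = normalize_meta({"project": "demo"}, n_sources=2)
per_source = normalize_meta([{"lang": "en"}, {"lang": "de"}], n_sources=2)
```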

Outputs

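The component creates one output per MIME type listed in mime_types. Each output is a list of the sources that matched that type. Sources that match none of the listed types are returned in a separate output named unclassified.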
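
To make the shape of the outputs concrete, here is a minimal standard-library sketch that groups file paths by guessed MIME type. It is not the component's code: route_by_mime_type is a hypothetical function, it handles file paths only (no ByteStream objects), and the catch-all key for unmatched sources is named unclassified here as an assumption of the sketch.

```python
import mimetypes
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Union


def route_by_mime_type(
    sources: List[Union[str, Path]], mime_types: List[str]
) -> Dict[str, List[Path]]:
    """Group paths under the first configured MIME type they match."""
    outputs: Dict[str, List[Path]] = defaultdict(list)
    for source in sources:
        path = Path(source)
        # Infer the MIME type from the file extension; None if unknown.
        guessed, _encoding = mimetypes.guess_type(path.name)
        key = next(
            (m for m in mime_types if guessed and re.fullmatch(m, guessed)),
            "unclassified",  # assumed name for the catch-all bucket
        )
        outputs[key].append(path)
    return dict(outputs)


routed = route_by_mime_type(
    ["a.txt", "b.txt", "c.jpg", "d.qqqzzz"],
    mime_types=["text/plain", "image/jpeg"],
)
```

Each key in routed corresponds to one named output, which is what a downstream converter would be connected to.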

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mime_types | List[str] | | A list of MIME types or regex patterns used to classify the input files or byte streams (for example: ["text/plain", "audio/x-wav", "image/jpeg"]). |
| additional_mimetypes | Optional[Dict[str, str]] | None | A dictionary that maps additional MIME types to file extensions. Use it to register formats that are not recognized by default, so that files of those types are not left unclassified (for example: {"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}). |
| raise_on_failure | bool | False | When True, a FileNotFoundError is always raised for non-existent files. When False (default), this exception is raised only when processing a non-existent file and the meta parameter is provided to run(). |
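
The effect of an additional_mimetypes entry can be approximated with Python's mimetypes module. This sketch assumes the mapping is registered through mimetypes.add_type; whether the component does exactly this internally is not stated here, and the file name report.docx is made up for the example.

```python
import mimetypes

# Same shape as the additional_mimetypes parameter: MIME type -> file extension.
additional_mimetypes = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
}

# Register each mapping so extension-based lookups can resolve it.
for mime_type, extension in additional_mimetypes.items():
    mimetypes.add_type(mime_type, extension)

# After registration, the extension resolves to the registered MIME type.
guessed, _encoding = mimetypes.guess_type("report.docx")
```

With the mapping in place, the registered MIME type can be listed in mime_types like any other entry.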

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or byte streams to categorize. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the sources. When provided, sources are converted to ByteStream objects and the metadata is added. A single dictionary applies to all sources; a list must match the number of sources. |