FileToFileContent
Converts local files into FileContent objects that can be embedded into ChatMessage objects and passed to an LLM.
FileContent objects contain the base64-encoded file data, MIME type, and filename. The component automatically detects the MIME type of each file. Empty files are skipped with a warning.
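To make the three pieces of information above concrete, here is a minimal standard-library sketch of how a file's bytes, MIME type, and filename could be gathered, and how an empty file could be skipped with a warning. It is an illustration only, not the component's actual implementation: the helper `describe_file` and the dictionary keys are hypothetical names chosen for this example.

```python
import base64
import logging
import mimetypes
from pathlib import Path
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def describe_file(path: Path) -> Optional[Dict[str, Optional[str]]]:
    """Hypothetical helper: collect base64 data, MIME type, and filename for one file."""
    data = path.read_bytes()
    if not data:
        # Mirror the documented behavior: empty files are skipped with a warning.
        logger.warning("Skipping empty file: %s", path)
        return None
    # Guess the MIME type from the file extension (may be None if unknown).
    mime_type, _ = mimetypes.guess_type(path.name)
    return {
        # Key names are illustrative, not the real FileContent field names.
        "base64_data": base64.b64encode(data).decode("utf-8"),
        "mime_type": mime_type,
        "filename": path.name,
    }
```

Calling `describe_file(Path("report.pdf"))` on a non-empty file would return a dictionary with the encoded bytes, a MIME type guessed from the extension, and the name `report.pdf`. The real component may detect MIME types differently.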
Key Features
- Converts files of any supported format to FileContent objects for direct LLM input.
- Automatically detects MIME types.
- Supports optional extra metadata for provider-specific information.
- Skips empty files gracefully with a warning.
Configuration
- Drag the FileToFileContent component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
This component has no init parameters to configure in Pipeline Builder.
Connections
FileToFileContent accepts a list of file paths or ByteStream objects through its sources input. It outputs a list of FileContent objects.
It typically connects with:
- FileTypeRouter: receives files routed by MIME type.
- ChatPromptBuilder: sends FileContent objects for inclusion in chat messages to an LLM.
Source Code
To check this component's source code, open file_to_file_content.py in the Haystack repository.
Usage Examples
Basic Configuration
FileToFileContent:
  type: haystack.components.converters.file_to_file_content.FileToFileContent
  init_parameters: {}
Using the Component in a Pipeline
In this pipeline, FileToFileContent converts files and passes them to a chat generator through a ChatPromptBuilder:
# haystack-pipeline
components:
  FileToFileContent:
    type: haystack.components.converters.file_to_file_content.FileToFileContent
    init_parameters: {}
  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
      - _content:
        - text: "Analyze the following files and answer questions about them."
        _role: system
      - _content:
        - text: "{{ query }}"
        _role: user
      required_variables:
      variables:
  ChatGenerator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      model: gpt-4o
      api_key:
        type: env_var
        env_vars:
        - OPENAI_API_KEY
        strict: false
connections:
- sender: FileToFileContent.file_contents
  receiver: ChatPromptBuilder.file_contents
- sender: ChatPromptBuilder.prompt
  receiver: ChatGenerator.messages
inputs:
  files:
  - FileToFileContent.sources
  query:
  - ChatPromptBuilder.query
outputs:
  replies: ChatGenerator.replies
max_runs_per_component: 100
metadata: {}
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | List of file paths or ByteStream objects to convert. |
| extra | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional extra information to attach to the FileContent objects. Can be used to store provider-specific information. Values should be JSON serializable. This value can be a list of dictionaries or a single dictionary. If it's a single dictionary, its content is added to the extra of all produced FileContent objects. If it's a list, its length must match the number of sources as they're zipped together. |
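The single-dictionary versus list behavior of `extra` can be sketched as follows. This is a minimal illustration of the rule described in the table, assuming a hypothetical helper named `normalize_extra`. It is not the component's own code, and the real component may report a length mismatch differently.

```python
from typing import Any, Dict, List, Optional, Union

ExtraType = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def normalize_extra(extra: ExtraType, n_sources: int) -> List[Dict[str, Any]]:
    """Hypothetical helper: expand `extra` into one dictionary per source."""
    if extra is None:
        # No extra information: every source gets an empty dictionary.
        return [{} for _ in range(n_sources)]
    if isinstance(extra, dict):
        # A single dictionary is copied onto every source.
        return [dict(extra) for _ in range(n_sources)]
    if len(extra) != n_sources:
        # A list is paired with the sources one-to-one, so lengths must match.
        raise ValueError(
            f"Got {len(extra)} extra entries for {n_sources} sources; lengths must match."
        )
    return [dict(item) for item in extra]


# Usage: two sources sharing one dictionary, then two sources with one entry each.
shared = normalize_extra({"provider": "example"}, 2)
per_file = normalize_extra([{"page": 1}, {"page": 2}], 2)
```

Here `shared` holds two copies of the same dictionary, while `per_file` keeps one entry per source in the original order. The keys `provider` and `page` are placeholders for whatever provider-specific information you attach.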
Outputs
| Parameter | Type | Description |
|---|---|---|
| file_contents | List[FileContent] | A list of FileContent objects created from the input files. |
Init Parameters
This component has no init parameters.
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | List of file paths or ByteStream objects to convert. |
| extra | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional extra information to attach to the FileContent objects. Can be used to store provider-specific information. Values should be JSON serializable. This value can be a list of dictionaries or a single dictionary. If it's a single dictionary, its content is added to the extra of all produced FileContent objects. If it's a list, its length must match the number of sources as they're zipped together. |