DeepsetFileDownloader

Downloads files with the extensions you specify and stores them in the local file system.

Basic Information

  • Type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
  • Components it can connect with:
    • Rankers: It can receive documents from Rankers and download them.
    • DeepsetPDFDocumentToBase64Image: It can send the downloaded PDFs to DeepsetPDFDocumentToBase64Image so that it can turn them into images.

Inputs

Parameter  Type                                       Default  Description
documents  List[Document] | None                      None     The documents to download.
sources    List[Union[ByteStream, UUID, str]] | None  None     The sources to download.

Outputs

Parameter  Type                                Default  Description
documents  List[Document]                               The list of downloaded documents with the file path set in the meta field.
sources    List[Union[str, Path, ByteStream]]           The list of downloaded sources with the file path set in the meta field.

Overview

DeepsetFileDownloader is a helper component used in visual question answering pipelines. It downloads the PDF files that contain images and passes them to the DeepsetPDFDocumentToBase64Image component, which converts them into images the LLM can consume.

DeepsetFileDownloader is also needed if you want your pipeline to use the files you upload in Playground. For details, see Test your pipeline.

Usage Example

Initiating the Component

components:
  DeepsetFileDownloader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
    init_parameters:

Using the Component in a Pipeline

This is an example of a visual question answering pipeline:


components:
  ...
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "BAAI/bge-reranker-v2-m3"
      top_k: 5
      model_kwargs:
        torch_dtype: "torch.float16"
      tokenizer_kwargs:
        model_max_length: 1024

  image_downloader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
    init_parameters:
      file_extensions:
        - ".pdf"

  pdf_to_image:
    type: deepset_cloud_custom_nodes.converters.pdf_to_image.DeepsetPDFDocumentToBase64Image
    init_parameters:
      detail: "high"
  ...

connections:
  ...
  - sender: ranker.documents
    receiver: image_downloader.documents
  - sender: image_downloader.documents
    receiver: pdf_to_image.documents
  # pdf_to_image is usually connected to PromptBuilder and sends the converted images to it.
  ...

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter            Type                                                               Default  Description
file_extensions      List[str]                                                                   A list of file extensions to download (for example [".pdf", ".docx", ".txt"]).
sources_target_type  Literal['str', 'pathlib.Path', 'haystack.dataclasses.ByteStream']  str      The type of sources to download.
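
As a rough illustration of how the two init parameters above might be set together, here is a minimal sketch of a component definition. The component name file_downloader is a hypothetical placeholder, and the values are taken only from the options listed in the table; adjust them to your pipeline.

```yaml
components:
  # "file_downloader" is a hypothetical name chosen for this sketch
  file_downloader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
    init_parameters:
      # Extensions to download, using the example values from the table
      file_extensions:
        - ".pdf"
        - ".docx"
        - ".txt"
      # One of the three allowed literals; "str" is the listed default
      sources_target_type: "haystack.dataclasses.ByteStream"
```

Setting sources_target_type here is optional. If you omit it, the default listed in the table (str) applies.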

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter  Type                                       Default  Description
documents  List[Document] | None                      None     The documents to download.
sources    List[Union[ByteStream, UUID, str]] | None  None     The sources to download.