DeepsetFileDownloader

Use DeepsetFileDownloader to download files with the extensions you specify and store them in the local file system.

Basic Information

Type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
Components it can connect with:
- Rankers: It can receive documents from Rankers and download them.
- DeepsetPDFDocumentToBase64Image: It can send the downloaded PDFs to DeepsetPDFDocumentToBase64Image so that it can turn them into images.

Inputs

Name	Type	Description
`documents`	List of documents	The documents to download.

Outputs

Name	Type	Description
`documents`	List of documents	The list of downloaded documents with the file path set in the meta field.

Overview

DeepsetFileDownloader is a helper component used in visual question answering pipelines. It downloads PDF files that contain images and sends them to the DeepsetPDFDocumentToBase64Image component, which converts them into images the LLM can consume.

Usage Example

This is an excerpt from a visual question answering pipeline that uses DeepsetFileDownloader:


components:
  ...
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "BAAI/bge-reranker-v2-m3"
      top_k: 5
      model_kwargs:
        torch_dtype: "torch.float16"
      tokenizer_kwargs:
        model_max_length: 1024

  image_downloader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
    init_parameters:
      file_extensions:
        - ".pdf"

  pdf_to_image:
    type: deepset_cloud_custom_nodes.converters.pdf_to_image.DeepsetPDFDocumentToBase64Image
    init_parameters:
      detail: "high"
      ...

connections:
...
- sender: ranker.documents
  receiver: image_downloader.documents
- sender: image_downloader.documents
  receiver: pdf_to_image.documents
  # pdf_to_image is usually connected to PromptBuilder and sends it the converted images
  ...

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Possible Values	Description
`file_extensions`	List of strings	Default: `None`	A list of file extensions to download.
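
As a rough sketch of how the `file_extensions` list might be set to more than one value, the fragment below configures the component with two extensions. The component name `file_downloader` and the `.docx` entry are assumptions made for illustration. Only `.pdf` is confirmed by the usage example on this page, so check which file types your pipeline actually handles before copying this.

```yaml
# Hypothetical fragment: the component name and the ".docx" entry
# are assumptions for illustration, not taken from this page.
file_downloader:
  type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
  init_parameters:
    file_extensions:
      - ".pdf"   # extension used in the usage example
      - ".docx"  # assumed additional extension
```

Each entry is written as a quoted string with a leading dot, matching the form used in the usage example.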

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameters passed to the run() method take precedence over initialization parameters.

Parameter	Type	Description
`documents`	List of `Document` objects	Documents to download.
