DeepsetFileUploader

Takes documents from a pipeline and uploads them to deepset AI Platform as TXT files.

Basic Information

Type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
Components it can connect with:
- Any component that outputs a list of Document objects (upstream) or accepts one as input (downstream).

Inputs

Parameter	Type	Default	Description
documents	List[Document]		The documents to upload.
raise_on_failure	bool	False	Whether to raise an error if the documents fail to upload.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		A list of uploaded documents.

Overview

DeepsetFileUploader is used in indexes. You can pair it with a web crawling component to upload the crawled documents to a workspace, or use it to upload any other documents the index creates. Set the workspace parameter to choose the workspace where the files are saved.

Usage Example

Initiating the Component

components:
  DeepsetFileUploader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
    init_parameters: {}
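
If you want to name the target workspace explicitly, the configuration might look like the sketch below. The workspace name my-workspace is a hypothetical placeholder, and the other init parameters are left out on the assumption that their documented defaults apply.

```yaml
components:
  DeepsetFileUploader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
    init_parameters:
      workspace: my-workspace  # hypothetical workspace name, replace with your own
```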

Using the Component in a Pipeline

This is an example of an index where DeepsetFileUploader receives documents from DeepsetFirecrawlWebScraper, uploads them to deepset AI Platform, and sends them to DocumentWriter, which writes them into a document store:

Figure: DeepsetFileUploader used in an index in Pipeline Builder

components:
  FileClassifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
      - text/csv
      - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: |-
        {% set str_list = [] %}
        {% for document in documents %}
          {% set _ = str_list.append(document.content) %}
        {% endfor %}
        {{ str_list }}
      output_type: typing.List[str]
  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 768
          similarity: cosine
      policy: NONE
  DeepsetCSVRowsToDocumentsConverter:
    type: deepset_cloud_custom_nodes.converters.csv_rows_to_documents.DeepsetCSVRowsToDocumentsConverter
    init_parameters:
      content_column: urls
      encoding: utf-8
  DeepsetFirecrawlWebScraper:
    type: deepset_cloud_custom_nodes.crawler.firecrawl.DeepsetFirecrawlWebScraper
    init_parameters: {}
  XLSXToDocument:
    type: deepset_cloud_custom_nodes.converters.xlsx.XLSXToDocument
    init_parameters:
      document_per: sheet
      content_column: content
      sheet_name:
  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      weights:
      top_k:
      sort_by_score: true
  DeepsetFileUploader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
    init_parameters:
      workspace:
      api_key:
        type: env_var
        env_vars:
        - DEEPSET_CLOUD_API_KEY
        strict: false
      write_mode: OVERWRITE
      base_url: https://api.cloud.deepset.ai/api/v1

connections:
- sender: OutputAdapter.output
  receiver: DeepsetFirecrawlWebScraper.urls
- sender: FileClassifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  receiver: XLSXToDocument.sources
- sender: DocumentJoiner.documents
  receiver: OutputAdapter.documents
- sender: XLSXToDocument.documents
  receiver: DocumentJoiner.documents
- sender: DeepsetCSVRowsToDocumentsConverter.documents
  receiver: DocumentJoiner.documents
- sender: FileClassifier.text/csv
  receiver: DeepsetCSVRowsToDocumentsConverter.sources
- sender: DeepsetFirecrawlWebScraper.documents
  receiver: DeepsetFileUploader.documents
- sender: DeepsetFileUploader.documents
  receiver: DocumentWriter.documents

max_runs_per_component: 100

metadata: {}

inputs:
  files:
  - FileClassifier.sources

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
workspace	str		The name of the workspace to upload the documents to.
api_key	Secret	Secret.from_env_var('DEEPSET_CLOUD_API_KEY')	The API key used to authenticate the upload.
write_mode	Literal['KEEP', 'OVERWRITE', 'FAIL']	OVERWRITE	The write mode for the upload. For a description of each mode, see https://docs.cloud.deepset.ai/reference/upload_file_api_v1_workspaces__workspace_name__files_post
base_url	str	https://api.cloud.deepset.ai/api/v1	The base URL for the API.
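
As an illustration of these parameters, the following sketch sets write_mode to FAIL instead of the default OVERWRITE. It reuses the api_key and base_url values from the pipeline example earlier on this page. The workspace name is a hypothetical placeholder, and the exact behavior of each write mode is described in the API reference linked in the table, so treat this as a starting point and not a recommended setup.

```yaml
components:
  DeepsetFileUploader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
    init_parameters:
      workspace: my-workspace  # hypothetical workspace name, replace with your own
      api_key:
        type: env_var
        env_vars:
        - DEEPSET_CLOUD_API_KEY  # environment variable that holds the API key
        strict: false
      write_mode: FAIL  # one of KEEP, OVERWRITE, FAIL
      base_url: https://api.cloud.deepset.ai/api/v1
```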

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
documents	List[Document]		The documents to upload.
raise_on_failure	bool	False	Whether to raise an error if the documents fail to upload.
