DeepsetFileUploader

Upload the indexed documents to a deepset workspace as text files.

Basic Information

  • Type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
  • Components it can connect with:
    • Components accepting a list of Document objects as input and output.

Inputs

Required Inputs

Name | Type | Description
documents | List of Document objects | The list of documents to turn into text files and save in a workspace.

Optional Inputs

Name | Type | Default | Description
raise_on_failure | Boolean | False | Raises an error if the component fails to upload the documents.

Outputs

Name | Type | Description
documents | List of Document objects | The documents produced by the indexing pipeline.

Overview

DeepsetFileUploader is used in indexes. You can use it with a web crawling component to upload the crawled documents to a workspace, and it can also upload any other document the index creates. Use the workspace parameter to set the workspace where the created files are saved.
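As a minimal sketch of setting the target workspace, the fragment below defines only the uploader component. The workspace name my-workspace is a hypothetical placeholder, not a value from this page; the remaining values are the defaults listed under Init Parameters.

```yaml
components:
  DeepsetFileUploader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
    init_parameters:
      # Hypothetical workspace name; replace it with the name of your own workspace.
      workspace: my-workspace
      # Read the API key from the DEEPSET_CLOUD_API_KEY environment variable.
      api_key:
        type: env_var
        env_vars:
        - DEEPSET_CLOUD_API_KEY
        strict: false
      # Default: replace a file in the workspace that has the same name.
      write_mode: OVERWRITE
      base_url: https://api.cloud.deepset.ai/api/v1
```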

Usage Example

This is an example of an index where DeepsetFileUploader receives documents from DeepsetFirecrawlWebScraper, uploads them to deepset AI Platform, and sends them to DocumentWriter, which writes them into a document store. Here's the full YAML configuration you can paste into the YAML editor:

components:
  FileClassifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
      - text/csv
      - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: |-
        {% set str_list = [] %}
        {% for document in documents %}
          {% set _ = str_list.append(document.content) %}
        {% endfor %}
        {{ str_list }}
      output_type: typing.List[str]
  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 768
          similarity: cosine
      policy: NONE
  DeepsetCSVRowsToDocumentsConverter:
    type: deepset_cloud_custom_nodes.converters.csv_rows_to_documents.DeepsetCSVRowsToDocumentsConverter
    init_parameters:
      content_column: urls
      encoding: utf-8
  DeepsetFirecrawlWebScraper:
    type: deepset_cloud_custom_nodes.crawler.firecrawl.DeepsetFirecrawlWebScraper
    init_parameters: {}
  XLSXToDocument:
    type: deepset_cloud_custom_nodes.converters.xlsx.XLSXToDocument
    init_parameters:
      document_per: sheet
      content_column: content
      sheet_name:
  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      weights:
      top_k:
      sort_by_score: true
  DeepsetFileUploader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
    init_parameters:
      workspace:
      api_key:
        type: env_var
        env_vars:
        - DEEPSET_CLOUD_API_KEY
        strict: false
      write_mode: OVERWRITE
      base_url: https://api.cloud.deepset.ai/api/v1

connections:
- sender: OutputAdapter.output
  receiver: DeepsetFirecrawlWebScraper.urls
- sender: FileClassifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  receiver: XLSXToDocument.sources
- sender: DocumentJoiner.documents
  receiver: OutputAdapter.documents
- sender: XLSXToDocument.documents
  receiver: DocumentJoiner.documents
- sender: DeepsetCSVRowsToDocumentsConverter.documents
  receiver: DocumentJoiner.documents
- sender: FileClassifier.text/csv
  receiver: DeepsetCSVRowsToDocumentsConverter.sources
- sender: DeepsetFirecrawlWebScraper.documents
  receiver: DeepsetFileUploader.documents
- sender: DeepsetFileUploader.documents
  receiver: DocumentWriter.documents

max_runs_per_component: 100

metadata: {}

inputs:
  files:
  - FileClassifier.sources
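The overview notes that the uploader can handle any document the index creates, not only crawled pages. As a sketch of that idea, the connections below route the output of XLSXToDocument directly to DeepsetFileUploader and then on to DocumentWriter. The fragment reuses component names from the configuration above and assumes all three components are defined under components. It is an illustrative variation, not part of the example index.

```yaml
connections:
# Documents created by the converter go to the uploader first...
- sender: XLSXToDocument.documents
  receiver: DeepsetFileUploader.documents
# ...and the uploader passes them on to the writer.
- sender: DeepsetFileUploader.documents
  receiver: DocumentWriter.documents
```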

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:


Parameter | Type | Possible Values | Description
workspace | String | | The name of the deepset workspace where you want to save the resulting TXT files. Required.
api_key | Secret | Default: Secret.from_env_var("DEEPSET_CLOUD_API_KEY") | The deepset API key. By default, it's read from the DEEPSET_CLOUD_API_KEY environment variable. Required.
write_mode | Literal | KEEP, OVERWRITE (default), FAIL | Specifies what to do if a file with the same name already exists in the workspace. KEEP keeps both files, OVERWRITE overwrites the existing file with the uploaded one, and FAIL fails the upload. Required.
base_url | String | Default: https://api.cloud.deepset.ai/api/v1 | The URL of the deepset deployment. Required.

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameters passed to run() take precedence over initialization parameters.


Parameter | Type | Possible Values | Description
documents | List of Document objects | | The documents to upload. Required.
raise_on_failure | Boolean | True, False (default) | Raises an error if the component fails to upload the documents. Optional.