DeepsetFileUploader
Upload the documents from your indexing pipeline to a deepset Cloud workspace as text files.
Basic Information
- Pipeline type: Indexing
- Type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
- Components it can connect with:
  - Components accepting a list of Document objects as input and output.
Inputs
Required Inputs
Name | Type | Description |
---|---|---|
documents | List of Document objects | The list of documents to turn into text files and save in a workspace. |
Optional Inputs
Name | Type | Default | Description |
---|---|---|---|
raise_on_failure | Boolean | False | Whether to return an error message if the component fails to upload the documents. |
Outputs
Name | Type | Description |
---|---|---|
documents | List of Document objects | The documents produced by the indexing pipeline. |
Overview
DeepsetFileUploader runs in indexing pipelines. You can use it with a web crawling component to upload the crawled documents to a specified workspace. It can also upload any other documents the indexing pipeline creates. Use the workspace parameter to specify the workspace where you want to save the created files.
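As a minimal sketch, the component definition below sets only the workspace parameter. It is a fragment that belongs under the components section of a pipeline YAML. The workspace name crawled-pages is a hypothetical placeholder, and the fragment assumes the remaining parameters fall back to the defaults listed under Init Parameters:

```yaml
DeepsetFileUploader:
  type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
  init_parameters:
    # Hypothetical workspace name; replace it with the name of your own workspace.
    workspace: crawled-pages
```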
Usage Example
This is an example of an indexing pipeline where DeepsetFileUploader receives documents from DeepsetFirecrawlWebScraper, uploads them to deepset Cloud, and sends them to DocumentWriter to write into a document store.
Here's the full YAML configuration you can paste into the YAML editor:
components:
  FileClassifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/csv
        - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: |-
        {% set str_list = [] %}
        {% for document in documents %}
        {% set _ = str_list.append(document.content) %}
        {% endfor %}
        {{ str_list }}
      output_type: typing.List[str]
  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 768
          similarity: cosine
      policy: NONE
  DeepsetCSVRowsToDocumentsConverter:
    type: deepset_cloud_custom_nodes.converters.csv_rows_to_documents.DeepsetCSVRowsToDocumentsConverter
    init_parameters:
      content_column: urls
      encoding: utf-8
  DeepsetFirecrawlWebScraper:
    type: deepset_cloud_custom_nodes.crawler.firecrawl.DeepsetFirecrawlWebScraper
    init_parameters: {}
  XLSXToDocument:
    type: deepset_cloud_custom_nodes.converters.xlsx.XLSXToDocument
    init_parameters:
      document_per: sheet
      content_column: content
      sheet_name:
  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      weights:
      top_k:
      sort_by_score: true
  DeepsetFileUploader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
    init_parameters:
      workspace:
      api_key:
        type: env_var
        env_vars:
          - DEEPSET_CLOUD_API_KEY
        strict: false
      write_mode: OVERWRITE
      base_url: https://api.cloud.deepset.ai/api/v1
connections:
  - sender: OutputAdapter.output
    receiver: DeepsetFirecrawlWebScraper.urls
  - sender: FileClassifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    receiver: XLSXToDocument.sources
  - sender: DocumentJoiner.documents
    receiver: OutputAdapter.documents
  - sender: XLSXToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: DeepsetCSVRowsToDocumentsConverter.documents
    receiver: DocumentJoiner.documents
  - sender: FileClassifier.text/csv
    receiver: DeepsetCSVRowsToDocumentsConverter.sources
  - sender: DeepsetFirecrawlWebScraper.documents
    receiver: DeepsetFileUploader.documents
  - sender: DeepsetFileUploader.documents
    receiver: DocumentWriter.documents
max_runs_per_component: 100
metadata: {}
inputs:
  files:
    - FileClassifier.sources
Init Parameters
Parameter | Type | Possible Values | Description |
---|---|---|---|
workspace | String | | The name of the deepset Cloud workspace where you want to save the resulting TXT files. Required. |
api_key | Secret | Default: Secret.from_env_var("DEEPSET_CLOUD_API_KEY") | The deepset Cloud API key. By default, it's read from the DEEPSET_CLOUD_API_KEY environment variable. Required. |
write_mode | Literal | KEEP, OVERWRITE (default), FAIL | Specifies what to do if a file with the same name already exists in the workspace. KEEP: Keeps both files. OVERWRITE: Overwrites the existing file with the uploaded one. FAIL: Fails to upload. Required. |
base_url | String | Default: https://api.cloud.deepset.ai/api/v1 | The URL of the deepset Cloud deployment. Required. |
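For illustration, the fragment below sets every init parameter explicitly and switches write_mode from the default OVERWRITE to KEEP, so that an uploaded file with the same name as an existing one does not replace it. This is a sketch rather than a recommended configuration: the workspace name crawled-pages is a hypothetical placeholder, and the api_key and base_url values repeat the defaults shown in the table.

```yaml
DeepsetFileUploader:
  type: deepset_cloud_custom_nodes.augmenters.deepset_file_uploader.DeepsetFileUploader
  init_parameters:
    # Hypothetical workspace name; replace it with the name of your own workspace.
    workspace: crawled-pages
    # Read the API key from the DEEPSET_CLOUD_API_KEY environment variable (the default).
    api_key:
      type: env_var
      env_vars:
        - DEEPSET_CLOUD_API_KEY
      strict: false
    # Keep both files on a name clash instead of overwriting the existing one.
    write_mode: KEEP
    base_url: https://api.cloud.deepset.ai/api/v1
```

To stop the upload on a name clash instead, set write_mode to FAIL.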