DeepsetCloudDocumentStore Parameters

Check the init parameters you can configure for DeepsetCloudDocumentStore in the pipeline YAML.

YAML Init Parameters


When you create DeepsetCloudDocumentStore in the deepset Cloud Pipeline Designer, these parameters are ignored:

  • api_key
  • workspace
  • index
  • api_endpoint
  • label_index

In the Python SDK, all parameters are used.
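Here is a minimal sketch of which values you write in a component's YAML params actually take effect in each environment. It is not deepset Cloud code. The helper effective_params and the example values (such as "my-workspace") are hypothetical. The sketch only models which of your values are honored, not what deepset Cloud uses in place of the ignored ones.

```python
from typing import Any, Dict

# Parameters the Pipeline Designer ignores, as listed above.
IGNORED_IN_PIPELINE_DESIGNER = frozenset(
    {"api_key", "workspace", "index", "api_endpoint", "label_index"}
)


def effective_params(params: Dict[str, Any], in_pipeline_designer: bool) -> Dict[str, Any]:
    """Return the subset of YAML params that takes effect (hypothetical helper)."""
    if not in_pipeline_designer:
        # Python SDK: every parameter is used.
        return dict(params)
    # Pipeline Designer: drop the parameters it ignores.
    return {k: v for k, v in params.items() if k not in IGNORED_IN_PIPELINE_DESIGNER}


yaml_params = {"workspace": "my-workspace", "similarity": "cosine", "embedding_dim": 768}
print(effective_params(yaml_params, in_pipeline_designer=True))
# {'similarity': 'cosine', 'embedding_dim': 768}
```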

These are the parameters you can specify for DeepsetCloudDocumentStore in the pipeline YAML:

| Parameter | Type | Possible values | Description |
|---|---|---|---|
| api_key | String | | The secret value of the API key. This is the value you copy in step 4 of Generate an API Key. If you don't specify it, it's read from the DEEPSET_CLOUD_API_KEY environment variable. |
| workspace | String | Default: default | Specifies the deepset Cloud workspace you want to use. |
| index | String | Default: None | The name of the pipeline to access within the deepset Cloud workspace. In deepset Cloud, indexes share names with their respective pipelines. |
| duplicate_documents | String | skip (ignores duplicate documents); overwrite (updates any existing document with the same ID when adding documents); fail (raises an error if the ID of a document being added already exists). Default: overwrite | Specifies how to handle duplicate documents. This setting only has an effect if you specify the fields used to identify duplicate documents in the PreProcessor's id_hash_keys parameter. For example, to identify duplicate documents by their content, set id_hash_keys: content. Note that we add contextual metadata, such as file_id, to your documents during indexing. This is why setting id_hash_keys: meta doesn't work. |
| api_endpoint | String | Default: None | Specifies the URL of the deepset Cloud API. The API endpoint is: <>. If you don't specify it, it's read from the DEEPSET_CLOUD_API_ENDPOINT environment variable. |
| similarity | String | dot_product (the default; use it if the embedding model was optimized for dot product similarity); cosine (recommended if the embedding model was optimized for cosine similarity). Default: dot_product | Specifies the similarity function used to compare document vectors. |
| label_index | String | Default: default | Specifies the name of the evaluation set uploaded to deepset Cloud. In deepset Cloud, label indexes share names with their corresponding evaluation sets. |
| | | Default: False | Specifies whether to return document embeddings. |
| embedding_dim | int | Default: 768 | Specifies the dimensionality of the embedding vector. You only need this parameter if you're using a vector-based retriever, such as DensePassageRetriever or EmbeddingRetriever. |
| | | Default: False | Specifies when to apply filters to the search. This is only relevant if you use an EmbeddingRetriever. With EmbeddingRetriever, DeepsetCloudDocumentStore defaults to post-filtering when querying with filters, which means the filters are applied after the documents are retrieved. You can change this to pre-filtering, where the filters are applied before the documents are retrieved. Pre-filtering comes at the cost of higher latency. With BM25Retriever, filters are always applied before the search. |
| search_fields | Union[str, list] | Default: content | The names of the fields BM25Retriever uses to find matches to the incoming query in the documents. For example: ["content", "title"]. |
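The duplicate_documents options can be pictured with a small in-memory sketch. The ID scheme (a SHA-256 hash of the fields named in id_hash_keys), the write_documents helper, and the dict-based store are assumptions for illustration, not DeepsetCloudDocumentStore's implementation. The last call shows one plausible reading of why id_hash_keys: meta doesn't work: two documents with the same content but different file_id values get different IDs, so neither is treated as a duplicate.

```python
import hashlib
from typing import Any, Dict, List

Document = Dict[str, Any]


def document_id(doc: Document, id_hash_keys: List[str]) -> str:
    """Hypothetical ID scheme: SHA-256 over the fields named in id_hash_keys."""
    key = "|".join(str(doc.get(field)) for field in id_hash_keys)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def write_documents(
    store: Dict[str, Document],
    docs: List[Document],
    id_hash_keys: List[str],
    duplicate_documents: str = "overwrite",
) -> None:
    """Write docs into store, handling ID collisions per duplicate_documents."""
    if duplicate_documents not in {"skip", "overwrite", "fail"}:
        raise ValueError(f"Unknown duplicate_documents value: {duplicate_documents!r}")
    for doc in docs:
        doc_id = document_id(doc, id_hash_keys)
        if doc_id in store:
            if duplicate_documents == "skip":
                continue  # keep the existing document, ignore the new one
            if duplicate_documents == "fail":
                # This sketch stops at the first duplicate; earlier writes stay in place.
                raise ValueError(f"Document with ID {doc_id} already exists")
        # New ID, or "overwrite": store (or replace) the document under its ID.
        store[doc_id] = doc


a = {"content": "Berlin is the capital of Germany.", "meta": {"file_id": "f1"}}
b = {"content": "Berlin is the capital of Germany.", "meta": {"file_id": "f2"}}

by_content: Dict[str, Document] = {}
write_documents(by_content, [a, b], id_hash_keys=["content"], duplicate_documents="skip")
print(len(by_content))  # 1: same content, same ID, so b is skipped

by_meta: Dict[str, Document] = {}
write_documents(by_meta, [a, b], id_hash_keys=["meta"], duplicate_documents="skip")
print(len(by_meta))  # 2: different file_id values give different IDs
```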
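The difference between the two similarity options shows up with vectors of different lengths. The plain-Python functions below are a generic illustration of the two metrics, not the document store's code. Dot product grows with vector magnitude, while cosine similarity compares only direction, so the two can rank the same documents differently. This is why the setting should match the metric the embedding model was optimized for.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by both norms; depends only on the angle between vectors."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
short_doc = [1.0, 0.0]  # points the same way as the query
long_doc = [3.0, 3.0]   # 45 degrees off, but much longer

print(dot_product(query, short_doc), dot_product(query, long_doc))  # 1.0 3.0
print(round(cosine(query, short_doc), 3), round(cosine(query, long_doc), 3))  # 1.0 0.707
```

With dot product, long_doc ranks first; with cosine, short_doc does.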
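The trade-off between post-filtering and pre-filtering with EmbeddingRetriever can be sketched generically. The score values stand in for query-document similarities, and post_filter_search and pre_filter_search are hypothetical names. This is not how DeepsetCloudDocumentStore implements filtering. In this sketch, post-filtering ranks first and filters the top_k hits afterwards, so it can return fewer than top_k matches. Pre-filtering narrows the candidates first and fills top_k from documents that match the filter.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]  # {"id": ..., "score": ..., "meta": {...}}; score stands in for similarity
Filter = Callable[[Doc], bool]


def post_filter_search(docs: List[Doc], matches: Filter, top_k: int) -> List[Doc]:
    # Retrieve the top_k most similar documents first, then drop those that fail the filter.
    ranked = sorted(docs, key=lambda d: d["score"], reverse=True)[:top_k]
    return [d for d in ranked if matches(d)]


def pre_filter_search(docs: List[Doc], matches: Filter, top_k: int) -> List[Doc]:
    # Apply the filter first, then retrieve the top_k most similar remaining documents.
    candidates = [d for d in docs if matches(d)]
    return sorted(candidates, key=lambda d: d["score"], reverse=True)[:top_k]


def in_2022(d: Doc) -> bool:
    return d["meta"]["year"] == 2022


docs = [
    {"id": "a", "score": 0.9, "meta": {"year": 2021}},
    {"id": "b", "score": 0.8, "meta": {"year": 2022}},
    {"id": "c", "score": 0.5, "meta": {"year": 2022}},
    {"id": "d", "score": 0.4, "meta": {"year": 2022}},
]

print([d["id"] for d in post_filter_search(docs, in_2022, top_k=2)])  # ['b']
print([d["id"] for d in pre_filter_search(docs, in_2022, top_k=2)])   # ['b', 'c']
```

The latency cost of pre-filtering described in the table isn't modeled here; in this toy version, both functions scan every document.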

REST API Runtime Parameters

There are no runtime parameters you can pass to this node when making a request to the Search REST API endpoint.