DeepsetCloudDocumentStore Parameters

Check the init parameters you can configure for DeepsetCloudDocumentStore in the pipeline YAML.

YAML Init Parameters

When you create DeepsetCloudDocumentStore in the deepset Cloud Pipeline Designer, these parameters are ignored:

  • api_key
  • workspace
  • index
  • api_endpoint
  • label_index

In the Python SDK, all parameters are used.
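
As a rough illustration of SDK usage, the sketch below collects the defaults listed in the parameter table on this page and passes them to the document store. It assumes Haystack 1.x, where the class is importable from `haystack.document_stores`. The helper names `build_store_params` and `create_store` and the workspace and index names are hypothetical. The import is done lazily so the helper can be tried without Haystack installed.

```python
from typing import Any, Dict

# Defaults as listed in the parameter table on this page.
# api_key and api_endpoint have no literal default: None means
# "read from the corresponding environment variable".
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "api_key": None,        # falls back to DEEPSET_CLOUD_API_KEY
    "workspace": "default",
    "index": None,
    "duplicate_documents": "overwrite",
    "api_endpoint": None,   # falls back to DEEPSET_CLOUD_API_ENDPOINT
    "similarity": "dot_product",
    "label_index": "default",
    "return_embedding": False,
    "embedding_dim": 768,
    "use_prefiltering": False,
    "search_fields": "content",
}


def build_store_params(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


def create_store(**overrides: Any):
    """Create the document store. Assumes Haystack 1.x is installed."""
    # Lazy import so build_store_params() works without Haystack.
    from haystack.document_stores import DeepsetCloudDocumentStore

    return DeepsetCloudDocumentStore(**build_store_params(**overrides))


# Hypothetical workspace and index names, for illustration only.
params = build_store_params(
    workspace="my_workspace",
    index="my_pipeline",
    similarity="cosine",
)
```

Calling `create_store(...)` with the same keyword arguments would then attempt to connect to deepset Cloud, so it needs a valid API key in `api_key` or in the `DEEPSET_CLOUD_API_KEY` environment variable.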

These are the parameters you can specify for DeepsetCloudDocumentStore in the pipeline YAML:

| Parameter | Type | Possible Values | Description |
| --- | --- | --- | --- |
| api_key | String | | The secret value of the API key. This is the value that you copy in step 4 of Generate an API Key. If you don't specify it, it is read from the DEEPSET_CLOUD_API_KEY environment variable. Optional. |
| workspace | String | Default: default | Specifies the deepset Cloud workspace you want to use. Required. |
| index | String | Default: None | The name of the pipeline to access within the deepset Cloud workspace. In deepset Cloud, indexes share their names with their respective pipelines. Optional. |
| duplicate_documents | String | skip - Ignores duplicate documents. overwrite - Updates any existing documents with the same ID when adding documents. fail - Raises an error if the ID of the document being added already exists. Default: overwrite | Specifies how to handle duplicate documents. This setting only has an effect if you specify the fields you want to use to identify duplicate documents in the PreProcessor's id_hash_keys parameter. For example, to identify duplicate documents by their content, set id_hash_keys: content. Note that we add contextual metadata, like file_id, to your documents during indexing. This is why setting id_hash_keys: meta doesn't work. Required. |
| api_endpoint | String | Default: None | Specifies the URL of the deepset Cloud API. The API endpoint is: <https://api.cloud.deepset.ai/api/v1>. If you don't specify it, it's read from the DEEPSET_CLOUD_API_ENDPOINT environment variable. Optional. |
| similarity | String | dot_product - Default, use it if the embedding model was optimized for dot product similarity. cosine - Recommended if the embedding model was optimized for cosine similarity. Default: dot_product | Specifies the similarity function used to compare document vectors. Required. |
| label_index | String | Default: default | Specifies the name of the evaluation set uploaded to deepset Cloud. In deepset Cloud, label indexes share their names with their corresponding evaluation sets. Required. |
| return_embedding | Boolean | True/False. Default: False | Specifies whether to return document embeddings. Required. |
| embedding_dim | int | Default: 768 | Specifies the dimensionality of the embedding vector. You only need this parameter if you're using a vector-based retriever, such as DensePassageRetriever or EmbeddingRetriever. Required. |
| use_prefiltering | Boolean | True/False. Default: False | Specifies when to apply filters to search. This is only relevant if you use EmbeddingRetriever. With EmbeddingRetriever, DeepsetCloudDocumentStore defaults to post-filtering when querying with filters. This means the filters are applied after the documents are retrieved. You can change it to pre-filtering, where the filters are applied before the documents are retrieved. This comes at the cost of higher latency, though. For BM25Retriever, filtering is always applied before the search. Required. |
| search_fields | Union[str, list] | Default: content | The names of the fields BM25Retriever uses to find matches to the incoming query in the documents. For example: ["content", "title"]. Required. |
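
To make the difference between post-filtering and pre-filtering (the use_prefiltering parameter) more concrete, here is a minimal, self-contained sketch. It is a toy model and not how deepset Cloud implements retrieval: the documents, scores, and function names are made up for illustration. It shows one practical consequence of the ordering: with post-filtering, the filter can discard some or all of the top results, so you may get fewer documents than requested.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

# Toy corpus: "score" stands in for the similarity to some query.
DOCS: List[Doc] = [
    {"id": "a", "score": 0.95, "meta": {"lang": "de"}},
    {"id": "b", "score": 0.90, "meta": {"lang": "de"}},
    {"id": "c", "score": 0.80, "meta": {"lang": "en"}},
    {"id": "d", "score": 0.70, "meta": {"lang": "en"}},
]


def matches(doc: Doc, filters: Dict[str, Any]) -> bool:
    """Return True if every filter key equals the document's metadata value."""
    return all(doc["meta"].get(key) == value for key, value in filters.items())


def top_k(docs: List[Doc], k: int) -> List[Doc]:
    """Return the k highest-scoring documents."""
    return sorted(docs, key=lambda d: d["score"], reverse=True)[:k]


def post_filter_search(docs: List[Doc], filters: Dict[str, Any], k: int) -> List[Doc]:
    """Retrieve the top k first, then drop documents that fail the filter."""
    return [d for d in top_k(docs, k) if matches(d, filters)]


def pre_filter_search(docs: List[Doc], filters: Dict[str, Any], k: int) -> List[Doc]:
    """Apply the filter first, then retrieve the top k of what remains."""
    return top_k([d for d in docs if matches(d, filters)], k)


filters = {"lang": "en"}
post_ids = [d["id"] for d in post_filter_search(DOCS, filters, k=2)]
pre_ids = [d["id"] for d in pre_filter_search(DOCS, filters, k=2)]

print(post_ids)  # [] : both top-2 documents are German and get filtered out
print(pre_ids)   # ['c', 'd']
```

In this toy setup, pre-filtering has to evaluate the filter against the whole corpus before ranking, which loosely mirrors why it tends to cost more latency.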

REST API Runtime Parameters

There are no runtime parameters you can pass to this node when making a request to the Search REST API endpoint.