Check the init parameters you can configure for DeepsetCloudDocumentStore in the pipeline YAML.
YAML Init Parameters
When you create DeepsetCloudDocumentStore in the deepset Cloud Pipeline Designer, these parameters are ignored:
- api_key
- workspace
- index
- api_endpoint
- label_index
In the Python SDK, all parameters are used.
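The following minimal Python sketch makes the difference concrete by showing which parameters still take effect when a component definition is used in Pipeline Designer. The helpers designer_effective_params and sdk_effective_params and the DESIGNER_IGNORED set are hypothetical illustrations, not part of deepset Cloud or Haystack. Only the parameter names come from the list above.

```python
# Parameters that Pipeline Designer ignores for DeepsetCloudDocumentStore,
# taken from the list above. Hypothetical helpers, for illustration only.
DESIGNER_IGNORED = frozenset(
    {"api_key", "workspace", "index", "api_endpoint", "label_index"}
)


def designer_effective_params(params: dict) -> dict:
    """Return the init parameters that still take effect in Pipeline Designer."""
    return {name: value for name, value in params.items() if name not in DESIGNER_IGNORED}


def sdk_effective_params(params: dict) -> dict:
    """In the Python SDK, every parameter is used, so nothing is dropped."""
    return dict(params)


if __name__ == "__main__":
    params = {
        "api_key": "my-secret",  # ignored in Pipeline Designer
        "index": "my-pipeline",  # ignored in Pipeline Designer
        "similarity": "cosine",
        "embedding_dim": 768,
    }
    print(designer_effective_params(params))  # {'similarity': 'cosine', 'embedding_dim': 768}
    print(sdk_effective_params(params) == params)  # True
```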
These are the parameters you can specify for DeepsetCloudDocumentStore in the pipeline YAML:
Parameter | Type | Possible Values | Description |
---|---|---|---|
api_key | String | The secret value of the API key. This is the value that you copy in step 4 of Generate an API Key. | The API key used to access deepset Cloud. If you don't specify it, it's read from the DEEPSET_CLOUD_API_KEY environment variable. Optional. |
workspace | String | Default: default | Specifies the deepset Cloud workspace you want to use. Required. |
index | String | Default: None | The name of the index to access within the deepset Cloud workspace. In deepset Cloud, indexes share their names with their respective pipelines, so use the name of the pipeline whose index you want to access. Optional. |
duplicate_documents | String | skip - Ignores duplicate documents. overwrite - Updates any existing documents with the same ID when adding documents. fail - Raises an error if the ID of the document being added already exists. Default: overwrite | Specifies how to handle duplicate documents. This setting only takes effect if you specify the fields used to identify duplicate documents in the PreProcessor's id_hash_keys parameter. For example, to identify duplicate documents by their content, set id_hash_keys to content. We add contextual metadata, such as file_id, to your documents during indexing, so setting id_hash_keys to meta doesn't work. Required. |
api_endpoint | String | Default: None | Specifies the URL of the deepset Cloud API, which is <https://api.cloud.deepset.ai/api/v1>. If you don't specify it, it's read from the DEEPSET_CLOUD_API_ENDPOINT environment variable. Optional. |
similarity | String | dot_product - Use it if your embedding model was optimized for dot product similarity. cosine - Recommended if your embedding model was optimized for cosine similarity. Default: dot_product | Specifies the similarity function used to compare document vectors. Required. |
label_index | String | Default: default | Specifies the name of the evaluation set uploaded to deepset Cloud. In deepset Cloud, label indexes share their names with their corresponding evaluation sets. Required. |
return_embedding | Boolean | True/False. Default: False | Specifies whether to return document embeddings together with the documents. Required. |
embedding_dim | Integer | Default: 768 | Specifies the dimensionality of the embedding vector. You only need this parameter if you're using a vector-based retriever, such as DensePassageRetriever or EmbeddingRetriever. Required. |
use_prefiltering | Boolean | True/False. Default: False | Specifies whether filters are applied before or after retrieval. This is only relevant if you use an EmbeddingRetriever. With EmbeddingRetriever, DeepsetCloudDocumentStore defaults to post-filtering when querying with filters, which means the filters are applied after the documents are retrieved. Setting this parameter to True switches to pre-filtering, where the filters are applied before the documents are retrieved. This comes at the cost of higher latency. With BM25Retriever, filters are always applied before the search. Required. |
search_fields | Union[str, list] | Default: content | The names of the fields that BM25Retriever searches for matches to the incoming query, for example: ["content", "title"]. Required. |
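The table notes that api_key and api_endpoint fall back to environment variables when you don't set them. The following minimal Python sketch shows that lookup order. resolve_setting is a hypothetical helper, not the document store's actual code. Raising an error when neither source provides a value is a choice made for this sketch only; the table doesn't say what happens in that case.

```python
import os
from typing import Optional

# Environment variable names documented in the parameter table.
API_KEY_ENV = "DEEPSET_CLOUD_API_KEY"
API_ENDPOINT_ENV = "DEEPSET_CLOUD_API_ENDPOINT"


def resolve_setting(explicit: Optional[str], env_var: str) -> str:
    """Return the explicit value if given, otherwise read it from env_var.

    Hypothetical helper mirroring the documented lookup order. Raising an
    error when neither source is set is this sketch's own choice.
    """
    if explicit is not None:
        return explicit
    value = os.environ.get(env_var)
    if value is None:
        raise ValueError(f"No value passed and {env_var} is not set")
    return value


if __name__ == "__main__":
    os.environ[API_ENDPOINT_ENV] = "https://api.cloud.deepset.ai/api/v1"
    # No explicit endpoint is passed, so the environment variable is used.
    print(resolve_setting(None, API_ENDPOINT_ENV))  # https://api.cloud.deepset.ai/api/v1
    # An explicitly passed value takes precedence over the environment.
    print(resolve_setting("explicit-key", API_KEY_ENV))  # explicit-key
```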
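The following minimal Python sketch illustrates how the duplicate_documents options in the table interact with id_hash_keys. A document's ID is derived from the fields listed in id_hash_keys, and the policy decides what happens when that ID already exists. document_id and write_documents are hypothetical helpers that write to an in-memory dict, and SHA-256 is a stand-in hash. The ID scheme Haystack actually uses may differ. The example also shows why hashing on meta fails to catch duplicates once each document carries its own file_id.

```python
import hashlib
import json
from typing import Dict, List

ALLOWED_POLICIES = {"skip", "overwrite", "fail"}


def document_id(doc: dict, id_hash_keys: List[str]) -> str:
    """Derive a document ID from the fields named in id_hash_keys.

    SHA-256 is a stand-in; the real ID scheme may use a different hash.
    """
    values = [doc.get(key) for key in id_hash_keys]
    payload = json.dumps(values, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def write_documents(store: Dict[str, dict], docs: List[dict],
                    id_hash_keys: List[str], duplicate_documents: str = "overwrite") -> None:
    """Add docs to an in-memory store, applying a duplicate_documents policy."""
    if duplicate_documents not in ALLOWED_POLICIES:
        raise ValueError(f"Unknown duplicate_documents value: {duplicate_documents}")
    for doc in docs:
        doc_id = document_id(doc, id_hash_keys)
        if doc_id in store:
            if duplicate_documents == "skip":
                continue  # keep the document that is already stored
            if duplicate_documents == "fail":
                raise ValueError(f"A document with ID {doc_id} already exists")
        store[doc_id] = doc  # new ID, or "overwrite" replaces the stored document


if __name__ == "__main__":
    a = {"content": "Berlin is the capital of Germany.", "meta": {"file_id": "file-1"}}
    b = {"content": "Berlin is the capital of Germany.", "meta": {"file_id": "file-2"}}

    by_content: Dict[str, dict] = {}
    write_documents(by_content, [a, b], id_hash_keys=["content"], duplicate_documents="skip")
    print(len(by_content))  # 1: b has the same content, so it's skipped

    by_meta: Dict[str, dict] = {}
    write_documents(by_meta, [a, b], id_hash_keys=["meta"], duplicate_documents="skip")
    print(len(by_meta))  # 2: different file_id values produce different IDs
```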
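The similarity option matters because the two functions can rank the same documents differently. With dot product, a document vector with a larger magnitude can outrank one that points more closely in the query's direction. Cosine normalizes both vectors first, so only direction counts. The following plain-Python sketch uses made-up two-dimensional vectors to show this. It makes no assumption about how the document store computes scores internally.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; grows with vector length."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimensionality")
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the length-normalized vectors; ignores vector length."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


query = [1.0, 0.0]
short_doc = [0.9, 0.1]  # points almost the same way as the query
long_doc = [3.0, 3.0]   # larger magnitude, but a different direction

# Dot product ranks long_doc higher because of its magnitude.
print(dot_product(query, short_doc), dot_product(query, long_doc))  # 0.9 3.0
# Cosine ranks short_doc higher because of its direction.
print(cosine(query, short_doc) > cosine(query, long_doc))  # True
```

This is why the table ties the choice to the similarity function your embedding model was optimized for.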
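The following simplified Python model illustrates the post-filtering and pre-filtering behavior described for use_prefiltering. It models only the order of operations over an in-memory list with made-up scores. It doesn't reproduce how the document store runs its vector search, and it doesn't model the latency cost the table mentions. The helper names are hypothetical. In this model, post-filtering can return fewer documents than requested when the top-ranked ones don't match the filter.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def top_k(docs: List[Doc], scores: Dict[str, float], k: int) -> List[Doc]:
    """Return the k highest-scoring documents."""
    return sorted(docs, key=lambda d: scores[d["id"]], reverse=True)[:k]


def post_filter_search(docs: List[Doc], scores: Dict[str, float], k: int,
                       keep: Callable[[Doc], bool]) -> List[Doc]:
    """Retrieve the top k first, then drop documents that fail the filter."""
    return [d for d in top_k(docs, scores, k) if keep(d)]


def pre_filter_search(docs: List[Doc], scores: Dict[str, float], k: int,
                      keep: Callable[[Doc], bool]) -> List[Doc]:
    """Drop documents that fail the filter first, then retrieve the top k."""
    return top_k([d for d in docs if keep(d)], scores, k)


def is_recent(doc: Doc) -> bool:
    """Example filter: keep only documents from 2023."""
    return doc["year"] == 2023


docs: List[Doc] = [
    {"id": "a", "year": 2021},
    {"id": "b", "year": 2021},
    {"id": "c", "year": 2023},
    {"id": "d", "year": 2023},
]
# Made-up similarity scores between the query and each document.
scores = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4}

# Post-filtering: the top 2 are a and b, and the filter removes both.
print([d["id"] for d in post_filter_search(docs, scores, 2, is_recent)])  # []
# Pre-filtering: only c and d are candidates, so both come back.
print([d["id"] for d in pre_filter_search(docs, scores, 2, is_recent)])  # ['c', 'd']
```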
REST API Runtime Parameters
There are no runtime parameters you can pass to this node when making a request to the Search REST API endpoint.