YAML Init Parameters

These are the parameters you can specify in pipeline YAML:

BM25Retriever Parameters

Parameter	Type	Possible Values	Description
`document_store`	String	`DeepsetCloudDocumentStore`	Specifies the instance of a document store from which the retriever retrieves the documents. deepset Cloud supports `DeepsetCloudDocumentStore` only. Optional.
`top_k`	Integer	Default: `10`	Specifies the number of documents to return for a query. Mandatory.
`all_terms_must_match`	Boolean	`True` `False` (default)	Specifies if all terms in the query must match the document. `True` - Retrieves the document only if all terms from the query are also present in the document. Uses the `AND` operator implicitly, for example, "good vegetarian restaurant" looks for "good AND vegetarian AND restaurant". `False` - Retrieves the document if at least one query term exists in the document. Uses the `OR` operator implicitly, for example, "good vegetarian restaurant" looks for "good OR vegetarian OR restaurant". Mandatory.
`custom_query`	String	The query	Specifies the optional OpenSearch query. For more information, see Boosting Retrieval with OpenSearch Queries. Optional.
`scale_score`	Boolean	`True` (default) `False`	Scales the similarity score calculated to compare the similarity between the query and the documents to a unit interval in the range of 0 to 1, where 1 means extremely relevant. `True` - Scales similarity scores that naturally have a different value range, such as cosine or dot_product. `False` - Uses raw similarity scores. Mandatory.
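
As an illustration, the BM25Retriever parameters above could appear in a pipeline YAML roughly as follows. This is a minimal sketch, assuming the usual `components` and `pipelines` layout of a pipeline file; the component names `DocumentStore` and `Retriever` are placeholders chosen for this example, and the values shown are the defaults from the table.

```yaml
components:
  - name: DocumentStore            # placeholder name, referenced by the retriever below
    type: DeepsetCloudDocumentStore
  - name: Retriever                # placeholder name
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10                    # number of documents returned per query
      all_terms_must_match: False  # False: implicit OR, True: implicit AND
      scale_score: True            # scale scores to the range 0 to 1

pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
```

Setting `all_terms_must_match: True` narrows the results to documents containing every query term, which tends to trade recall for precision.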

CNStaticFilterEmbeddingRetriever Parameters

Parameter	Type	Possible Values	Description
`embedding_model`	String	Example: `sentence-transformers/all-MiniLM-L6-v2`	Specifies the path to the embedding model for handling documents and queries. This can be the path to a locally saved model or the model's name on the Hugging Face model hub. Mandatory.
`document_store`	String	`DeepsetCloudDocumentStore`	Specifies the instance of a document store from which the retriever retrieves the documents. deepset Cloud supports `DeepsetCloudDocumentStore` only. Optional.
`model_version`	String	Tag name, branch name, or commit hash	Specifies the version of the model to be used from the Hugging Face model hub. Optional.
`use_gpu`	Boolean	`True` (default) `False`	Specifies whether to use all available GPUs or the CPU. If no GPU is available, it falls back on the CPU. Mandatory.
`batch_size`	Integer	Default: `32`	Specifies the number of documents to encode at once. Mandatory.
`max_seq_len`	Integer	Default: `512`	Specifies the maximum number of tokens the document text can have. Longer documents are truncated. Mandatory.
`model_format`	String	`farm` `transformers` `sentence_transformers` `retribert` `openai` `cohere`	Specifies the name of the framework used for saving the model or the model type. If you don't provide it, it's inferred from the model configuration files. Optional.
`pooling_strategy`	String	`cls_token` (sentence vector) `reduce_mean` (default, sentence vector) `reduce_max` (sentence vector) `per_token` (individual token vectors)	Specifies the strategy for combining the embeddings from the model. Used for FARM and transformer models only. Mandatory.
`emb_extraction_layer`	Integer	Default: `-1` (the last layer)	Specifies the layer from which to extract the embeddings. Used for FARM and transformer models only. Mandatory.
`top_k`	Integer	Default: `10`	Specifies the number of documents to retrieve. Mandatory.
`progress_bar`	Boolean	`True` (default) `False`	Shows a tqdm progress bar. Disabling it in production deployments helps to keep the logs clean. Mandatory.
`devices`	String	Example: `[torch.device('cuda:0'), "mps", "cuda:1"]`	Contains a list of GPU devices to limit inference to certain GPUs and not use all available ones. If you set `use_gpu=False`, this parameter is not used and a single CPU device is used for inference. As multi-GPU training is currently not implemented for EmbeddingRetriever, training only uses the first device provided in this list. Optional.
`use_auth_token`	Union[str, bool]		The API token for downloading private models from Hugging Face. `True` - Uses the token generated when running `transformers-cli login` (stored in `~/.huggingface`). For more information, see Hugging Face. Optional.
`scale_score`	Boolean	`True` (default) `False`	Scales the similarity score calculated to measure the similarity between the query and documents to a unit interval in the range of 0 to 1, where 1 means extremely relevant. `True` - Scales similarity scores that naturally have a different value range, such as cosine or dot_product. `False` - Uses raw similarity scores. Mandatory.
`embed_meta_fields`	List of strings		Concatenates the meta fields you specify and the text passage or table to a text pair that is then used to create the embedding. This approach is likely to improve performance if your metadata contain meaningful information for retrieval (for example, topic, entities, and the like). Optional.
`api_key`	String		The OpenAI API key or the Cohere API key. Required if you want to use OpenAI or Cohere embeddings. For more details, see OpenAI and Cohere documentation. Optional.
`azure_api_version`	String	Default: `2022-12-01`	The version of the Azure OpenAI API to use. Mandatory.
`azure_base_url`	String		The base URL for the Azure OpenAI API. If not supplied, Azure OpenAI API is not used. This parameter is an OpenAI Azure endpoint, usually in the form `https://.openai.azure.com`. Optional.
`azure_deployment_name`	String		The name of the Azure OpenAI API deployment. If not supplied, Azure OpenAI API is not used. Optional.
`api_base`	String	Default: `https://api.openai.com/v1`	The OpenAI API base URL. Mandatory.
`openai_organization`	String	Default: `None`	The OpenAI organization ID. For more details, see OpenAI documentation. Optional.
`filters`	Dictionary	Default: `None`	A list of static filters (metadata fields) that can be overwritten at runtime. Optional.

DensePassageRetriever Parameters

Parameter	Type	Possible Values	Description
`document_store`	String	`DeepsetCloudDocumentStore`	Specifies the instance of a document store from which the retriever retrieves the documents. deepset Cloud supports `DeepsetCloudDocumentStore` only. Optional.
`query_embedding_model`	String	Default: `facebook/dpr-question_encoder-single-nq-base`	Specifies the path to the embedding model for handling the query. This can be a path to a locally saved model or the name of the model in the Hugging Face model hub. Must be trained on the same data as the passage embedding model. Mandatory.
`passage_embedding_model`	String	Default: `facebook/dpr-ctx_encoder-single-nq-base`	Specifies the path to the embedding model for handling the documents. This can be a path to a locally saved model or the name of the model in the Hugging Face model hub. Must be trained on the same data as the query embedding model. Mandatory.
`model_version`	String	Tag name, branch name, or commit hash	Specifies the version of the model to be used from the Hugging Face model hub. Optional.
`max_seq_len_query`	Integer	Default: `64`	Specifies the maximum number of tokens the query can have. Longer queries are truncated. Mandatory.
`max_seq_len_passage`	Integer	Default: `256`	Specifies the maximum number of tokens the document text can have. Longer documents are truncated. Mandatory.
`top_k`	Integer	Default: `10`	Specifies the number of documents to return per query. Mandatory.
`use_gpu`	Boolean	`True` (default) `False`	Uses all available GPUs or the CPU. Falls back on the CPU if no GPU is available. Mandatory.
`batch_size`	Integer	Default: `16`	Specifies the number of questions or passages to encode at once. If there are multiple GPUs, this value is the total batch size. Mandatory.
`embed_title`	Boolean	`True` (default) `False`	Concatenates the title and the document to a text pair that is then used to create the embedding. This is the approach used in the original paper and is likely to improve performance if your titles contain meaningful information for retrieval. The title is expected to be in `doc.meta["name"]` and you can provide it in the documents before writing them to the DocumentStore like this: `{"text": "my text", "meta": {"name": "my title"}}`. Mandatory.
`use_fast_tokenizers`	Boolean	`True` (default) `False`	Uses fast Rust tokenizers. Mandatory.
`similarity_function`	String	`dot_product` (default) `cosine`	Specifies the function to apply for calculating the similarity of query and passage embeddings during training. Mandatory.
`global_loss_buffer_size`	Integer	Default: `150000`	Specifies the buffer size for `all_gather()` in DDP. Increase this value if you encounter errors similar to "encoded data exceeds max_size...". Mandatory.
`progress_bar`	Boolean	`True` (default) `False`	Shows a tqdm progress bar. Disabling it in production deployments helps to keep the logs clean. Mandatory.
`devices`	String	A list of GPU devices Example: `[torch.device('cuda:0'), "mps", "cuda:1"]`	Contains a list of GPU devices to limit inference to certain GPUs and not use all available GPUs. As multi-GPU training is currently not implemented for DPR, training only uses the first device provided in this list. Optional.
`use_auth_token`	Union[str, bool]		Contains the API token used to download private models from Hugging Face. If set to `True`, the local token is used. You must first create this token by running `transformers-cli login`. For more information, see Transformers > Models. Optional.
`scale_score`	Boolean	`True` (default) `False`	Scales the similarity score calculated to compare the similarity of the query and the documents to a unit interval in the range of 0 to 1, where 1 means extremely relevant. `True` - Scales similarity scores that naturally have a different value range, such as cosine or dot_product. `False` - Uses raw similarity scores. Mandatory.
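
For orientation, a DensePassageRetriever configured with the parameters above might look like this in a pipeline YAML. It is a sketch only: the component names are placeholders for this example, the values are the defaults listed in the table, and the `components` layout is assumed to match the rest of your pipeline file.

```yaml
components:
  - name: DocumentStore            # placeholder name
    type: DeepsetCloudDocumentStore
  - name: Retriever                # placeholder name
    type: DensePassageRetriever
    params:
      document_store: DocumentStore
      # The two encoders must be trained on the same data.
      query_embedding_model: facebook/dpr-question_encoder-single-nq-base
      passage_embedding_model: facebook/dpr-ctx_encoder-single-nq-base
      max_seq_len_query: 64        # longer queries are truncated
      max_seq_len_passage: 256     # longer documents are truncated
      embed_title: True            # expects the title in doc.meta["name"]
      top_k: 10
```

If you swap one of the two models, swap the other to its matching counterpart as well, because the query and passage embeddings are only comparable when both encoders come from the same training run.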

EmbeddingRetriever Parameters

Parameter	Type	Possible Values	Description
`embedding_model`	String	Example: `sentence-transformers/all-MiniLM-L6-v2`	Specifies the path to the embedding model for handling documents and query. This can be the path to a locally saved model or the model's name. Mandatory.
`document_store`	String	`DeepsetCloudDocumentStore`	Specifies the instance of a document store from which the retriever retrieves the documents. deepset Cloud supports `DeepsetCloudDocumentStore` only. Optional.
`model_version`	String	Tag name, branch name, or commit hash	Specifies the version of the model to be used from the Hugging Face model hub. Optional.
`use_gpu`	Boolean	`True` (default) `False`	Specifies whether to use all available GPUs or the CPU. If no GPU is available, it falls back on the CPU. Mandatory.
`batch_size`	Integer	Default: `32`	Specifies the number of documents to encode at once. Mandatory.
`max_seq_len`	Integer	Default: `512`	Specifies the maximum number of tokens the document text can have. Longer documents are truncated. Mandatory.
`model_format`	String	`farm` `transformers` `sentence_transformers` `retribert` `openai` `cohere`	Specifies the name of the framework used for saving the model or the model type. If you don't provide it, it's inferred from the model configuration files. Optional.
`query_prompt`	String	Default: `None`	Instructions for the model to embed the text of the query. Optional.
`passage_prompt`	String	Default: `None`	Instructions for the model to embed the text of the documents to be retrieved. Optional.
`pooling_strategy`	String	`cls_token` (sentence vector) `reduce_mean` (default, sentence vector) `reduce_max` (sentence vector) `per_token` (individual token vectors)	Specifies the strategy for combining the embeddings from the model. Used for FARM and transformer models only. Mandatory.
`emb_extraction_layer`	Integer	Default: `-1` (the last layer)	Specifies the layer from which to extract the embeddings. Used for FARM and transformer models only. Mandatory.
`top_k`	Integer	Default: `10`	Specifies the number of documents to retrieve. Mandatory.
`progress_bar`	Boolean	`True` (default) `False`	Shows a tqdm progress bar. Disabling it in production deployments helps to keep the logs clean. Mandatory.
`devices`	String	Example: `[torch.device('cuda:0'), "mps", "cuda:1"]`	Contains a list of GPU devices to limit inference to certain GPUs and not use all available ones. If you set `use_gpu=False`, this parameter is not used and a single CPU device is used for inference. As multi-GPU training is currently not implemented for EmbeddingRetriever, training only uses the first device provided in this list. Optional.
`use_auth_token`	Union[str, bool]	Default: `None`	The API token for downloading private models from Hugging Face. `True` - Uses the token generated when running `transformers-cli login` (stored in `~/.huggingface`). For more information, see Hugging Face. Optional.
`scale_score`	Boolean	`True` (default) `False`	Scales the similarity score calculated to measure the similarity between the query and documents to a unit interval in the range of 0 to 1, where 1 means extremely relevant. `True` - Scales similarity scores that naturally have a different value range, such as cosine or dot_product. `False` - Uses raw similarity scores. Mandatory.
`embed_meta_fields`	List of strings	Default: `None`	Concatenates the meta fields you specify and the text passage or table to a text pair that is then used to create the embedding. This approach is likely to improve performance if your metadata contain meaningful information for retrieval (for example, topic, entities, and the like). Optional.
`api_key`	String	Default: `None`	The OpenAI API key or the Cohere API key. Required if you want to use OpenAI or Cohere embeddings. For more details, see OpenAI and Cohere documentation. Optional.
`azure_api_version`	String	Default: `2022-12-01`	The version of the Azure OpenAI API to use. Mandatory.
`azure_base_url`	String	Default: `None`	The base URL for the Azure OpenAI API. If not supplied, Azure OpenAI API is not used. This parameter is an OpenAI Azure endpoint, usually in the form `https://.openai.azure.com`. Optional.
`azure_deployment_name`	String	Default: `None`	The name of the Azure OpenAI API deployment. If not supplied, Azure OpenAI API is not used. Optional.
`api_base`	String	Default: `https://api.openai.com/v1`	The OpenAI API base URL. Mandatory.
`openai_organization`	String	Default: `None`	The OpenAI organization ID. For more details, see OpenAI documentation. Optional.
`aws_config`	Dictionary[string, any]	Default: `None`	The aws_config contains {aws_access_key, aws_secret_key, aws_region, profile_name} to use with the boto3 session for an AWS Bedrock retriever. Optional.
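
The following fragment sketches how a few of the EmbeddingRetriever parameters above might be combined in a pipeline YAML. The component names are placeholders, and `topic` is a hypothetical metadata field used only to show the shape of `embed_meta_fields`; replace it with a field that exists in your documents.

```yaml
components:
  - name: DocumentStore            # placeholder name
    type: DeepsetCloudDocumentStore
  - name: Retriever                # placeholder name
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/all-MiniLM-L6-v2
      model_format: sentence_transformers  # optional, otherwise inferred from the model config
      embed_meta_fields: [topic]           # hypothetical metadata field
      max_seq_len: 512                     # longer documents are truncated
      top_k: 10
      scale_score: True
```

Parameters left out of `params` fall back to the defaults listed in the table.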

FileSimilarityRetriever Parameters

Parameter	Type	Possible Values	Description
`document_store`	String	Default: `KeywordDocumentStore`	The instance of DeepsetCloudDocumentStore to retrieve from. Mandatory.
`file_aggregation_key`	String	Default: `file_id`	The metadata key from the file metadata that you want to use to aggregate documents to the file level. This is what you pass as query. For example, if you have a metadata key called "file_name" which contains the name of the file, you can set it as the `file_aggregation_key`. Then, you pass the `file_name` value as query and the retriever finds documents similar to this file. Mandatory.
`primary_retriever`	String	Default: `None`	The name of the primary retriever to use. Optional.
`secondary_retriever`	String	Default: `None`	The name of the secondary retriever to use. Optional.
`keep_original_score`	String	Default: `None`	Stores the original score of the returned document in the document's metadata. Replaces the document's score property with the reciprocal rank fusion score. Optional.
`top_k`	Integer	Default: `10`	The number of documents to return. Mandatory.
`max_query_len`	Integer	Default: `6000`	The number of characters the query document can have. If a document is longer than the specified length, it's cut off. Mandatory.
`max_num_queries`	Integer	Default: `None`	The maximum number of queries that can be run for a single file. If the number of query documents exceeds this limit, the query documents are split into n parts so that n < `max_num_queries` and every nth document is kept. Optional.
`use_existing_embedding`	Boolean	`True` `False` Default: `True`	Reuses existing embeddings from the index. To optimize the speed, set this to `True`. This way, the FileSimilarityRetriever can run on the CPU. Mandatory.
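
Because `primary_retriever` and `secondary_retriever` take the names of other retrievers, a FileSimilarityRetriever is typically declared alongside them. The fragment below is a sketch under that assumption: all component names are placeholders, and `file_name` is the example metadata key from the table, not a required value.

```yaml
components:
  - name: DocumentStore            # placeholder name
    type: DeepsetCloudDocumentStore
  - name: EmbeddingRetriever       # placeholder name
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/all-MiniLM-L6-v2
  - name: BM25Retriever            # placeholder name
    type: BM25Retriever
    params:
      document_store: DocumentStore
  - name: FileSimilarityRetriever  # placeholder name
    type: FileSimilarityRetriever
    params:
      document_store: DocumentStore
      primary_retriever: EmbeddingRetriever
      secondary_retriever: BM25Retriever
      file_aggregation_key: file_name  # the query is then a file_name value
      top_k: 10
      max_query_len: 6000              # query documents are cut off after this many characters
      use_existing_embedding: True     # reuse embeddings from the index
```

With this setup, the query you send is a value of the aggregation key (here, a file name), and the retriever returns documents similar to that file.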

FilterRetriever Parameters

Parameter	Type	Possible Values	Description
`document_store`	String	`DeepsetCloudDocumentStore`	Specifies the document store from where the retriever fetches the documents. deepset Cloud supports `DeepsetCloudDocumentStore` only. Optional.
`top_k`	Integer	Default: `10`	The number of documents to fetch. Mandatory.
`all_terms_must_match`	Boolean	`True` `False` (default)	Specifies if all terms of the query must match the document. `True` retrieves the document only if all terms from the query are also present in the document. It uses the `AND` operator implicitly. For example, "good vegetarian restaurant" looks for "good AND vegetarian AND restaurant". `False` retrieves the document if at least one query term exists in the document. It uses the `OR` operator implicitly. For example, "good vegetarian restaurant" looks for "good OR vegetarian OR restaurant". Mandatory.
`custom_query`	String		Specifies the custom OpenSearch query. For more information, see Boosting Retrieval with OpenSearch Queries. Optional.
`scale_score`	Boolean	`True` (default) `False`	Scales the similarity score calculated for the query and the documents to a unit interval in the range of 0 to 1, where 1 means extremely relevant. `True` - Scales similarity scores that naturally have a different value range, such as cosine or dot_product. `False` - Uses raw similarity scores. Mandatory.

TfidfRetriever Parameters

Parameter	Type	Possible Values	Description
`document_store`	String	`DeepsetCloudDocumentStore`	Specifies the document store from which the retriever retrieves the documents. deepset Cloud supports `DeepsetCloudDocumentStore` only. Optional.
`top_k`	Integer	Default: `10`	Specifies the number of documents to return for a query. Mandatory.
`auto_fit`	Boolean	`True` (default) `False`	Specifies whether to automatically update the TF-IDF matrix by calling the `fit()` method after new documents are added. Mandatory.

REST API Runtime Parameters

These are the runtime parameters you can pass in the body of the request to the Search endpoint:

BM25Retriever Parameters

Parameter	Type	Possible Values	Description
`top_k`	Integer	Default: `10`	Specifies the number of documents to return for a query. Mandatory.
`scale_score`	Boolean	`True` `False` Default: `True`	Scales the similarity score calculated to compare the similarity between the query and the documents to a unit interval in the range of 0 to 1, where 1 means extremely relevant. `True` - Scales similarity scores that naturally have a different value range, such as cosine or dot_product. `False` - Uses raw similarity scores. Mandatory.

CNStaticFilterEmbeddingRetriever Parameters

Parameter	Type	Possible Values	Description
`top_k`	Integer	Default: `10`	Specifies the number of documents to retrieve. Mandatory.

EmbeddingRetriever Parameters

Parameter	Type	Possible Values	Description
`top_k`	Integer	Default: `10`	Specifies the number of documents to return for a query. Mandatory.
`scale_score`	Boolean	`True` `False` Default: `True`	Scales the similarity score calculated to compare the similarity between the query and the documents to a unit interval in the range of 0 to 1, where 1 means extremely relevant. `True` - Scales similarity scores that naturally have a different value range, such as cosine or dot_product. `False` - Uses raw similarity scores. Mandatory.

FileSimilarityRetriever Parameters

Parameter	Type	Possible Values	Description
`top_k`	Integer	Default: `10`	Specifies the number of documents to retrieve. Mandatory.

TfidfRetriever Parameters

Parameter	Type	Possible Values	Description
`top_k`	Integer	Default: `10`	Specifies the number of documents to retrieve. Mandatory.