OpenSearchEmbeddingRetriever

Retrieves documents from the OpenSearchDocumentStore using a vector similarity metric.

Basic Information

  • Type: haystack_integrations.opensearch.src.haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. Filters are applied during the approximate kNN search to ensure the Retriever returns top_k matching documents. How runtime filters are applied depends on the filter_policy selected when initializing the Retriever. |
| top_k | Optional[int] | None | Maximum number of documents to return. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query containing a mandatory $query_embedding placeholder and an optional $filters placeholder. An example follows this table. |
| efficient_filtering | Optional[bool] | None | If True, the filter is applied during the approximate kNN search. This is only supported for the kNN engines "faiss" and "lucene" and does not work with the default "nmslib". |

An example custom_query:

```python
{
    "query": {
        "bool": {
            "must": [
                {
                    "knn": {
                        "embedding": {
                            "vector": "$query_embedding",  # mandatory query placeholder
                            "k": 10000,
                        }
                    }
                }
            ],
            "filter": "$filters"  # optional filter placeholder
        }
    }
}
```

For this custom_query, an example run() could be:

```python
retriever.run(
    query_embedding=embedding,
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.years", "operator": "==", "value": "2019"},
            {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
        ],
    },
)
```
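
The placeholder mechanism described for custom_query can be pictured as a recursive substitution over the query dictionary. The following is a minimal sketch under stated assumptions, not the integration's actual implementation: fill_placeholders is a hypothetical helper, the embedding values are made up, and the filter value is assumed to be written in OpenSearch query DSL already (converting from the filter syntax used in run() is out of scope here).

```python
from typing import Any, Dict


def fill_placeholders(node: Any, values: Dict[str, Any]) -> Any:
    """Return a copy of `node` with placeholder strings swapped for their values.

    Hypothetical helper for illustration only.
    """
    if isinstance(node, dict):
        return {key: fill_placeholders(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(item, values) for item in node]
    # Only whole-string matches count as placeholders.
    if isinstance(node, str) and node in values:
        return values[node]
    return node


custom_query = {
    "query": {
        "bool": {
            "must": [{"knn": {"embedding": {"vector": "$query_embedding", "k": 10000}}}],
            "filter": "$filters",
        }
    }
}

body = fill_placeholders(
    custom_query,
    {
        "$query_embedding": [0.1, 0.2, 0.3],
        # Assumed to be OpenSearch query DSL already; illustrative value only.
        "$filters": {"term": {"years": "2019"}},
    },
)

knn_clause = body["query"]["bool"]["must"][0]["knn"]["embedding"]
print(knn_clause["vector"])
print(body["query"]["bool"]["filter"])
```

The sketch builds a new dictionary instead of mutating the template, so the same custom_query can be reused across calls with different embeddings and filters.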

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | List of Documents most similar to query_embedding. The component returns a dictionary with the key "documents" that holds this list. |

Overview

Retrieves documents from the OpenSearchDocumentStore using a vector similarity metric.

Must be connected to the OpenSearchDocumentStore to run.

Usage Example

components:
  OpenSearchEmbeddingRetriever:
    type: opensearch.src.haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_store | OpenSearchDocumentStore | | An instance of OpenSearchDocumentStore to use with the Retriever. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. Filters are applied during the approximate kNN search to ensure the Retriever returns top_k matching documents. |
| top_k | int | 10 | Maximum number of documents to return. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy that determines how filters are applied. Possible options: merge (runtime filters are merged with initialization filters) and replace (runtime filters replace initialization filters; use this policy to change the filtering scope). |
| custom_query | Optional[Dict[str, Any]] | None | The custom OpenSearch query containing a mandatory $query_embedding placeholder and an optional $filters placeholder. An example follows this table. |
| raise_on_failure | bool | True | If True, raises an exception if the API call fails. If False, logs a warning and returns an empty list. |
| efficient_filtering | bool | False | If True, the filter is applied during the approximate kNN search. This is only supported for the kNN engines "faiss" and "lucene" and does not work with the default "nmslib". |

An example custom_query:

```python
{
    "query": {
        "bool": {
            "must": [
                {
                    "knn": {
                        "embedding": {
                            "vector": "$query_embedding",  # mandatory query placeholder
                            "k": 10000,
                        }
                    }
                }
            ],
            "filter": "$filters"  # optional filter placeholder
        }
    }
}
```

For this custom_query, an example run() could be:

```python
retriever.run(
    query_embedding=embedding,
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.years", "operator": "==", "value": "2019"},
            {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
        ],
    },
)
```
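
The difference between the two filter_policy options can be illustrated with a small sketch. This is a simplified model, not the library's code: PolicySketch and effective_filters are hypothetical names, merging is assumed to mean wrapping both filters in an AND condition, and falling back to the initialization filters when no runtime filters are given is also an assumption.

```python
from enum import Enum
from typing import Any, Dict, Optional

Filters = Dict[str, Any]


class PolicySketch(Enum):
    """Hypothetical stand-in for the two documented policy options."""

    REPLACE = "replace"
    MERGE = "merge"


def effective_filters(
    policy: PolicySketch,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Pick the filters a query would use under the given policy (simplified)."""
    if runtime_filters is None:
        # Assumption: with no runtime filters, the init filters stay in effect.
        return init_filters
    if policy is PolicySketch.MERGE and init_filters is not None:
        # Assumption: merging is modelled as a logical AND of both filters.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    return runtime_filters


init = {"field": "meta.years", "operator": "==", "value": "2019"}
runtime = {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]}

replaced = effective_filters(PolicySketch.REPLACE, init, runtime)
merged = effective_filters(PolicySketch.MERGE, init, runtime)
print(replaced)
print(merged)
```

Under replace, only the quarter filter reaches the query, which widens the scope beyond 2019. Under merge, in this model, both conditions must hold.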

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. Filters are applied during the approximate kNN search to ensure the Retriever returns top_k matching documents. How runtime filters are applied depends on the filter_policy selected when initializing the Retriever. |
| top_k | Optional[int] | None | Maximum number of documents to return. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query containing a mandatory $query_embedding placeholder and an optional $filters placeholder. The example custom_query and run() call given for custom_query under Init Parameters apply here as well. |
| efficient_filtering | Optional[bool] | None | If True, the filter is applied during the approximate kNN search. This is only supported for the kNN engines "faiss" and "lucene" and does not work with the default "nmslib". |
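
Several run() parameters default to None while their init counterparts have concrete defaults (top_k is 10, efficient_filtering is False). One plausible reading, sketched here as an assumption and not as confirmed behavior, is that None at query time means "use the value set at initialization". InitSettingsSketch and resolve_run_settings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InitSettingsSketch:
    """Hypothetical holder for the init-time defaults listed in the table."""

    top_k: int = 10
    efficient_filtering: bool = False


def resolve_run_settings(
    init: InitSettingsSketch,
    top_k: Optional[int] = None,
    efficient_filtering: Optional[bool] = None,
) -> InitSettingsSketch:
    """Assumed rule: a run-time value of None falls back to the init value."""
    return InitSettingsSketch(
        top_k=init.top_k if top_k is None else top_k,
        efficient_filtering=(
            init.efficient_filtering
            if efficient_filtering is None
            else efficient_filtering
        ),
    )


init_settings = InitSettingsSketch()
defaults = resolve_run_settings(init_settings)
overridden = resolve_run_settings(init_settings, top_k=3, efficient_filtering=True)
print(defaults)
print(overridden)
```

The check uses `is None` instead of truthiness so that an explicit False for efficient_filtering at query time is not mistaken for "not provided".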