OpenSearchEmbeddingRetriever
Retrieves documents from the OpenSearchDocumentStore using a vector similarity metric.
Basic Information
- Type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. If efficient_filtering is enabled, the filters are applied during the approximate kNN search so that the Retriever can return top_k matching documents. How runtime filters are applied depends on the filter_policy selected when initializing the Retriever. |
| top_k | Optional[int] | None | Maximum number of documents to return. If not set, the top_k value from initialization is used. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query containing a mandatory `$query_embedding` placeholder and an optional `$filters` placeholder. Example custom_query: `{"query": {"bool": {"must": [{"knn": {"embedding": {"vector": "$query_embedding", "k": 10000}}}], "filter": "$filters"}}}`. With this custom_query, an example run() call is `retriever.run(query_embedding=embedding, filters={"operator": "AND", "conditions": [{"field": "meta.years", "operator": "==", "value": "2019"}, {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]}]})`, where the filters fill the `$filters` placeholder. |
| efficient_filtering | Optional[bool] | None | If True, the filters are applied during the approximate kNN search instead of after it. This is only supported for the "faiss" and "lucene" k-NN engines and doesn't work with the default "nmslib" engine. If not set, the value from initialization is used. |
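To make the placeholder mechanism of custom_query concrete, here is a minimal sketch of how a template's `$query_embedding` and `$filters` strings could be substituted before the query is sent. The function render_custom_query is hypothetical, not the Retriever's actual code. The Retriever presumably translates Haystack-style filters into OpenSearch query DSL first; this sketch skips that step and takes an OpenSearch filter clause directly.

```python
from typing import Any, Dict, List, Optional


def render_custom_query(
    custom_query: Any,
    query_embedding: List[float],
    filters: Optional[Dict[str, Any]] = None,
) -> Any:
    """Return a copy of custom_query with its placeholders substituted.

    The template itself is never mutated: dicts and lists are rebuilt recursively.
    """
    if isinstance(custom_query, dict):
        return {
            key: render_custom_query(value, query_embedding, filters)
            for key, value in custom_query.items()
        }
    if isinstance(custom_query, list):
        return [render_custom_query(item, query_embedding, filters) for item in custom_query]
    if custom_query == "$query_embedding":
        return list(query_embedding)
    if custom_query == "$filters":
        # Assumption: with no filters, use an empty list, which a bool "filter"
        # clause treats as "no filtering".
        return filters if filters is not None else []
    return custom_query


# Template from the custom_query example, without the // comments.
template = {
    "query": {
        "bool": {
            "must": [{"knn": {"embedding": {"vector": "$query_embedding", "k": 10000}}}],
            "filter": "$filters",
        }
    }
}

# The filter is already OpenSearch query DSL (the translation step is skipped here).
body = render_custom_query(template, [0.1, 0.2, 0.3], filters={"term": {"meta.years": "2019"}})
print(body["query"]["bool"]["must"][0]["knn"]["embedding"]["vector"])  # [0.1, 0.2, 0.3]
print(body["query"]["bool"]["filter"])  # {'term': {'meta.years': '2019'}}
```

Rebuilding the structure instead of editing it in place keeps one template reusable across many queries.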
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Documents most similar to query_embedding, returned under the "documents" key of the output dictionary. |
Overview
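The Retriever compares a query embedding with the document embeddings stored in an OpenSearchDocumentStore and returns the most similar documents. It requires an OpenSearchDocumentStore instance, which you pass through the document_store init parameter.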
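The Retriever delegates the search to OpenSearch, which uses an approximate k-NN index. As a rough mental model only, the following pure-Python sketch performs exact (brute-force) top-k retrieval with cosine similarity. The function names and toy embeddings are hypothetical, and the similarity metric OpenSearch actually uses depends on how the OpenSearchDocumentStore index is configured.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_similarity(
    query_embedding: List[float],
    doc_embeddings: Dict[str, List[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Exact nearest-neighbor search: score every document, keep the best top_k."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
result = top_k_by_similarity([1.0, 0.1, 0.0], docs, top_k=2)
print([doc_id for doc_id, _ in result])  # ['doc_a', 'doc_b']
```

Scoring every document is exact but scales linearly with the collection size, which is why OpenSearch relies on an approximate index instead.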
Usage Example
```yaml
components:
  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
```
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | OpenSearchDocumentStore | | An instance of OpenSearchDocumentStore to use with the Retriever. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. If efficient_filtering is enabled, the filters are applied during the approximate kNN search so that the Retriever can return top_k matching documents. |
| top_k | int | 10 | Maximum number of documents to return. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy that determines how runtime filters are combined with the filters set at initialization. Possible options: merge (runtime filters are merged with initialization filters) and replace (runtime filters replace initialization filters; use this policy to change the filtering scope). |
| custom_query | Optional[Dict[str, Any]] | None | The custom OpenSearch query containing a mandatory `$query_embedding` placeholder and an optional `$filters` placeholder. Example custom_query: `{"query": {"bool": {"must": [{"knn": {"embedding": {"vector": "$query_embedding", "k": 10000}}}], "filter": "$filters"}}}`. With this custom_query, an example run() call is `retriever.run(query_embedding=embedding, filters={"operator": "AND", "conditions": [{"field": "meta.years", "operator": "==", "value": "2019"}, {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]}]})`, where the filters fill the `$filters` placeholder. |
| raise_on_failure | bool | True | If True, raises an exception if the API call fails. If False, logs a warning and returns an empty list. |
| efficient_filtering | bool | False | If True, the filters are applied during the approximate kNN search instead of after it. This is only supported for the "faiss" and "lucene" k-NN engines and doesn't work with the default "nmslib" engine. |
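As a rough illustration of the two filter_policy options, the sketch below resolves initialization and runtime filters in plain Python. The function resolve_filters is hypothetical, and its merge branch is a simplification: it just joins both filters with a logical AND, whereas Haystack's own merge logic may handle more cases (for example, conditions that target the same field).

```python
from typing import Any, Dict, Optional


def resolve_filters(
    policy: str,
    init_filters: Optional[Dict[str, Any]],
    runtime_filters: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Simplified illustration of the "replace" and "merge" filter policies."""
    if policy == "replace":
        # Runtime filters win whenever they are given; otherwise fall back to init filters.
        return runtime_filters if runtime_filters is not None else init_filters
    if policy == "merge":
        present = [f for f in (init_filters, runtime_filters) if f]
        if not present:
            return None
        if len(present) == 1:
            return present[0]
        # Assumption: both filters must hold, so combine them under a logical AND.
        return {"operator": "AND", "conditions": present}
    raise ValueError(f"Unknown filter policy: {policy!r}")


init = {"field": "meta.years", "operator": "==", "value": "2019"}
runtime = {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]}
print(resolve_filters("replace", init, runtime))  # only the runtime filter
print(resolve_filters("merge", init, runtime))    # both filters, joined with AND
```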
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. If efficient_filtering is enabled, the filters are applied during the approximate kNN search so that the Retriever can return top_k matching documents. How runtime filters are applied depends on the filter_policy selected when initializing the Retriever. |
| top_k | Optional[int] | None | Maximum number of documents to return. If not set, the top_k value from initialization is used. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query containing a mandatory `$query_embedding` placeholder and an optional `$filters` placeholder. Example custom_query: `{"query": {"bool": {"must": [{"knn": {"embedding": {"vector": "$query_embedding", "k": 10000}}}], "filter": "$filters"}}}`. With this custom_query, an example run() call is `retriever.run(query_embedding=embedding, filters={"operator": "AND", "conditions": [{"field": "meta.years", "operator": "==", "value": "2019"}, {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]}]})`, where the filters fill the `$filters` placeholder. |
| efficient_filtering | Optional[bool] | None | If True, the filters are applied during the approximate kNN search instead of after it. This is only supported for the "faiss" and "lucene" k-NN engines and doesn't work with the default "nmslib" engine. If not set, the value from initialization is used. |
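To show where efficient_filtering changes the request, here is a simplified sketch of an OpenSearch k-NN search body. The function build_knn_body, the vector field name embedding (taken from the custom_query example), and the pre-translated filter_clause are assumptions; the Retriever builds its own request internally, and the details may differ.

```python
from typing import Any, Dict, List, Optional


def build_knn_body(
    query_embedding: List[float],
    top_k: int = 10,
    filter_clause: Optional[Dict[str, Any]] = None,
    efficient_filtering: bool = False,
) -> Dict[str, Any]:
    """Build a k-NN search body on a vector field named "embedding" (assumed name)."""
    knn_params: Dict[str, Any] = {"vector": list(query_embedding), "k": top_k}
    body: Dict[str, Any] = {
        "size": top_k,
        "query": {"bool": {"must": [{"knn": {"embedding": knn_params}}]}},
    }
    if filter_clause:
        if efficient_filtering:
            # Filter inside the knn clause: applied during the approximate search
            # (supported for the faiss and lucene engines).
            knn_params["filter"] = filter_clause
        else:
            # Filter in the bool query: applied to the candidates the kNN search
            # already found, so fewer than top_k hits can come back.
            body["query"]["bool"]["filter"] = filter_clause
    return body


clause = {"term": {"meta.years": "2019"}}
post = build_knn_body([0.1, 0.2, 0.3], top_k=5, filter_clause=clause)
pre = build_knn_body([0.1, 0.2, 0.3], top_k=5, filter_clause=clause, efficient_filtering=True)
print(post["query"]["bool"]["filter"])  # filter sits next to the knn clause
print(pre["query"]["bool"]["must"][0]["knn"]["embedding"]["filter"])  # filter sits inside it
```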