OpenSearchEmbeddingRetriever
Retrieve documents from the OpenSearchDocumentStore using vector similarity. It compares query and document embeddings to find the most semantically relevant documents.
Key Features
- Embedding-based semantic retrieval using the OpenSearch k-nearest neighbor (kNN) search.
- Requires document embeddings in the document store and a query embedding at runtime.
- Supports runtime filter overrides with configurable filter policies.
- Supports custom OpenSearch queries for advanced use cases.
- Supports efficient filtering during the approximate kNN search for compatible kNN engines.
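
Conceptually, the retriever ranks stored documents by their vector similarity to the query embedding and returns the closest matches. A toy sketch in plain Python (cosine similarity over a handful of hand-made vectors; the real component delegates this to OpenSearch's approximate kNN index):

```python
# Toy illustration of embedding-based retrieval: rank documents by
# cosine similarity to a query vector and keep the top_k matches.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_embedding, docs, top_k=2):
    # docs: list of (content, embedding) pairs
    scored = [(cosine(query_embedding, emb), content) for content, emb in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [content for _, content in scored[:top_k]]

docs = [
    ("about cats", [0.9, 0.1, 0.0]),
    ("about dogs", [0.1, 0.9, 0.0]),
    ("about cars", [0.0, 0.1, 0.9]),
]
print(retrieve([0.85, 0.15, 0.0], docs, top_k=1))  # ['about cats']
```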
Configuration
- Drag the OpenSearchEmbeddingRetriever component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab, select the document store. The document store determines where documents are retrieved from.
- On the Advanced tab, configure `top_k`, `filters`, `filter_policy`, `custom_query`, `efficient_filtering`, and `raise_on_failure`.
Connections
OpenSearchEmbeddingRetriever accepts a `query_embedding` (a list of floats) and optional `filters`, `top_k`, `custom_query`, and `efficient_filtering` inputs. It outputs `documents`: a list of documents similar to the query embedding.
Typically, you connect a text embedder (such as SentenceTransformersTextEmbedder) to the `query_embedding` input, then send the retrieved documents to a PromptBuilder, Ranker, or DocumentJoiner. If you need keyword matching instead of semantic similarity, use OpenSearchBM25Retriever.
Usage Example
Using the Component in a Pipeline
This is an example of a semantic search pipeline where OpenSearchEmbeddingRetriever receives the query embedding from a text embedder and retrieves matching documents.
```yaml
components:
  text_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: sentence-transformers/all-MiniLM-L6-v2
      device:
      token:
      prefix: ''
      suffix: ''
      batch_size: 32
      progress_bar: true
      normalize_embeddings: false
      trust_remote_code: false
  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 384
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      filters:
      top_k: 10
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true

connections:
  - sender: text_embedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding

max_runs_per_component: 100

metadata: {}

inputs:
  query:
    - text_embedder.text
  filters:
    - OpenSearchEmbeddingRetriever.filters

outputs:
  documents: OpenSearchEmbeddingRetriever.documents
```
Using in a RAG Pipeline
This example shows a RAG pipeline that uses OpenSearchEmbeddingRetriever to find relevant documents, then passes them to a generator to answer a question.
```yaml
components:
  text_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: sentence-transformers/all-MiniLM-L6-v2
      device:
      token:
      prefix: ''
      suffix: ''
      batch_size: 32
      progress_bar: true
      normalize_embeddings: false
      trust_remote_code: false
  retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 384
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      filters:
      top_k: 10
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      required_variables: "*"
      template: |-
        Given the following documents, answer the question.
        Documents:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}
        Answer:
  generator:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: true
      model: gpt-4o-mini
      generation_kwargs:
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

connections:
  - sender: text_embedder.embedding
    receiver: retriever.query_embedding
  - sender: retriever.documents
    receiver: prompt_builder.documents
  - sender: prompt_builder.prompt
    receiver: generator.prompt
  - sender: generator.replies
    receiver: answer_builder.replies
  - sender: retriever.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt

max_runs_per_component: 100

metadata: {}

inputs:
  query:
    - text_embedder.text
    - prompt_builder.question
    - answer_builder.query
  filters:
    - retriever.filters

outputs:
  documents: retriever.documents
  answers: answer_builder.answers
```
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. Filters are applied during the approximate kNN search to ensure the Retriever returns top_k matching documents. The way runtime filters are applied depends on the filter_policy selected when initializing the Retriever. |
| top_k | Optional[int] | None | Maximum number of documents to return. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query containing a mandatory `$query_embedding` placeholder and an optional `$filters` placeholder. |
| efficient_filtering | Optional[bool] | None | If True, the filter is applied during the approximate kNN search. Only supported for the "faiss" and "lucene" kNN engines; it does not work with the default "nmslib". |

An example `custom_query`:

```python
{
    "query": {
        "bool": {
            "must": [
                {
                    "knn": {
                        "embedding": {
                            "vector": "$query_embedding",  # mandatory query placeholder
                            "k": 10000,
                        }
                    }
                }
            ],
            "filter": "$filters",  # optional filter placeholder
        }
    }
}
```

For this `custom_query`, an example `run()` could be:

```python
retriever.run(
    query_embedding=embedding,
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.years", "operator": "==", "value": "2019"},
            {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
        ],
    },
)
```
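
The retriever resolves the `$query_embedding` and `$filters` placeholders at query time. A rough, hypothetical sketch of that substitution over a nested query dict (not the actual implementation):

```python
# Walk a nested query structure and replace the placeholder strings with
# the runtime values. Illustrative only; the retriever does this internally.
def fill_placeholders(node, query_embedding, filters):
    if node == "$query_embedding":
        return query_embedding
    if node == "$filters":
        return filters
    if isinstance(node, dict):
        return {k: fill_placeholders(v, query_embedding, filters) for k, v in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(v, query_embedding, filters) for v in node]
    return node

custom_query = {
    "query": {
        "bool": {
            "must": [{"knn": {"embedding": {"vector": "$query_embedding", "k": 10000}}}],
            "filter": "$filters",
        }
    }
}
resolved = fill_placeholders(custom_query, [0.1, 0.2], {"term": {"meta.year": "2019"}})
```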
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | List of documents similar to the query embedding. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | OpenSearchDocumentStore | | An instance of OpenSearchDocumentStore to use with the Retriever. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. Filters are applied during the approximate kNN search to ensure the Retriever returns top_k matching documents. |
| top_k | int | 10 | Maximum number of documents to return. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy to determine how filters are applied. Possible options: - merge: Runtime filters are merged with initialization filters. - replace: Runtime filters replace initialization filters. Use this policy to change the filtering scope. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query containing a mandatory `$query_embedding` placeholder and an optional `$filters` placeholder. See the `custom_query` example under Inputs above. |
| raise_on_failure | bool | True | If True, raises an exception if the API call fails. If False, logs a warning and returns an empty list. |
| efficient_filtering | bool | False | If True, the filter is applied during the approximate kNN search. Only supported for the "faiss" and "lucene" kNN engines; it does not work with the default "nmslib". |
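
To illustrate the two `filter_policy` options, here is a minimal sketch of the documented semantics using plain metadata dicts (the real component works with Haystack's structured filter format; this is a simplification):

```python
# "replace": runtime filters discard init-time filters.
# "merge": runtime filters are combined with init-time filters,
# with runtime values winning on key conflicts.
def apply_filter_policy(policy, init_filters, runtime_filters):
    if runtime_filters is None:
        return init_filters
    if policy == "replace":
        return runtime_filters
    if policy == "merge":
        return {**(init_filters or {}), **runtime_filters}
    raise ValueError(f"Unknown filter_policy: {policy}")

init_f = {"meta.lang": "en"}
run_f = {"meta.year": 2019}
print(apply_filter_policy("merge", init_f, run_f))
# {'meta.lang': 'en', 'meta.year': 2019}
```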
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied when fetching documents from the Document Store. Filters are applied during the approximate kNN search to ensure the Retriever returns top_k matching documents. The way runtime filters are applied depends on the filter_policy selected when initializing the Retriever. |
| top_k | Optional[int] | None | Maximum number of documents to return. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query containing a mandatory `$query_embedding` placeholder and an optional `$filters` placeholder. See the `custom_query` example under Inputs above. |
| efficient_filtering | Optional[bool] | None | If True, the filter is applied during the approximate kNN search. Only supported for the "faiss" and "lucene" kNN engines; it does not work with the default "nmslib". |
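
To make the filter format concrete, here is a small, hypothetical evaluator for the `operator`/`conditions` shape used in the examples above, applied to document metadata in plain Python. In practice OpenSearch evaluates filters server-side; this only shows which documents such a filter would match:

```python
# Evaluate a {"operator": ..., "conditions": [...]} filter against a
# document's metadata dict. Supports only AND/OR, "==", and "in".
def matches(filters, meta):
    op = filters.get("operator")
    if op in ("AND", "OR"):
        results = [matches(c, meta) for c in filters["conditions"]]
        return all(results) if op == "AND" else any(results)
    field = filters["field"].removeprefix("meta.")
    value = meta.get(field)
    if op == "==":
        return value == filters["value"]
    if op == "in":
        return value in filters["value"]
    raise ValueError(f"Unsupported operator: {op}")

doc_meta = {"years": "2019", "quarters": "Q1"}
flt = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.years", "operator": "==", "value": "2019"},
        {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
    ],
}
print(matches(flt, doc_meta))  # True
```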