ElasticsearchEmbeddingRetriever

Retrieve documents from the ElasticsearchDocumentStore based on their semantic similarity to the query.

Basic Information

Type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
Components it can connect with:
- TextEmbedder: The Retriever receives the query embedding from a TextEmbedder.
- Rankers: The Retriever can send the retrieved documents to a Ranker.
- DocumentJoiner: The Retriever can send the retrieved documents to a DocumentJoiner. This is useful if you're using hybrid retrieval that combines keyword and semantic searches.

Inputs

Parameter	Type	Default	Description
query_embedding	List[float]		Embedding of the query.
filters	Optional[Dict[str, Any]]	None	Filters applied when fetching documents from the Document Store. Filters are applied during the approximate kNN search to ensure the Retriever returns `top_k` matching documents. The way runtime filters are applied depends on the `filter_policy` selected when configuring the Retriever.
top_k	Optional[int]	None	Maximum number of documents to return.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		List of documents most similar to the given `query_embedding`.

Overview

ElasticsearchEmbeddingRetriever is only compatible with ElasticsearchDocumentStore. It's a semantic retriever: it compares the query embedding to the document embeddings and fetches the most similar documents from the document store.

When using ElasticsearchEmbeddingRetriever in your pipeline, add a TextEmbedder before it to embed the query. Also, make sure your indexing pipeline uses a DocumentEmbedder to embed the documents. The indexing and query pipelines must use the same embedding model.
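
As a rough sketch of the matching-model requirement, the fragment below shows how a document embedder in an indexing pipeline might be configured with the same model as the `query_embedder` in the usage example on this page. The component name `document_embedder` and the type path `DeepsetNvidiaDocumentEmbedder` are assumptions and do not come from this page; substitute whichever DocumentEmbedder your workspace provides.

```yaml
components:
  document_embedder:  # hypothetical component name in the indexing pipeline
    # Assumed type path, mirroring the text embedder used at query time
    type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2  # must match the model of the query-time TextEmbedder
```

If the two models differ, the query and document vectors are not comparable, and similarity scores lose their meaning.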

You can use a hybrid retrieval approach by combining ElasticsearchEmbeddingRetriever with ElasticsearchBM25Retriever and then joining the results with a DocumentJoiner. For details, see the Usage Example section.

Usage Example

Initializing the Component

components:
  ElasticsearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
    init_parameters:

Using the Component in a Pipeline

This is an example of a document search pipeline that uses ElasticsearchEmbeddingRetriever combined with ElasticsearchBM25Retriever and then joins the results with a DocumentJoiner.

components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: "intfloat/simlm-msmarco-reranker"
      top_k: 20

  ElasticsearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      num_candidates:
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
        init_parameters:
          hosts:
          custom_mapping:
          index: 'my_index'
          embedding_similarity_function: cosine
  ElasticsearchBM25Retriever:
    type: haystack_integrations.components.retrievers.elasticsearch.bm25_retriever.ElasticsearchBM25Retriever
    init_parameters:
      filters:
      fuzziness: AUTO
      top_k: 10
      scale_score: false
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
        init_parameters:
          hosts:
          custom_mapping:
          index: 'my_index'
          embedding_similarity_function: cosine

connections:  # Defines how the components are connected
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: query_embedder.embedding
  receiver: ElasticsearchEmbeddingRetriever.query_embedding
- sender: ElasticsearchEmbeddingRetriever.documents
  receiver: document_joiner.documents
- sender: ElasticsearchBM25Retriever.documents
  receiver: document_joiner.documents

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "query_embedder.text"
  - "ranker.query"
  - "ElasticsearchBM25Retriever.query"

  filters:  # These components will receive a potential query filter as input
  - "ElasticsearchEmbeddingRetriever.filters"
  - "ElasticsearchBM25Retriever.filters"

outputs:  # Defines the output of your pipeline
  documents: "ranker.documents"  # The output of the pipeline is the retrieved documents

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
document_store	ElasticsearchDocumentStore		The Elasticsearch document store to retrieve documents from.
filters	Optional[Dict[str, Any]]	None	Filters applied to the retrieved documents. Filters are applied during the approximate kNN search to ensure that `top_k` matching documents are returned.
top_k	int	10	Maximum number of Documents to return.
num_candidates	Optional[int]	None	Number of approximate nearest neighbor candidates on each shard. Defaults to `top_k * 10`. Increasing this value improves search accuracy at the cost of slower search speeds. You can read more about it in the Elasticsearch documentation.
filter_policy	Union[str, FilterPolicy]	FilterPolicy.REPLACE	Policy to determine how filters are applied. Possible options: `REPLACE` (default) overrides the initialization filters with the filters specified at runtime; use this policy to dynamically change filtering for specific queries. `MERGE` combines runtime filters with initialization filters to narrow down the search.
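
To make the interaction between `filters`, `filter_policy`, and `num_candidates` more concrete, here is a minimal sketch of a Retriever configured with an initialization filter. The metadata field `meta.language` and its value are hypothetical, and the filter uses the field/operator/value form of Haystack metadata filters; the document store settings are copied from the usage example on this page.

```yaml
components:
  ElasticsearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
    init_parameters:
      filters:                 # initialization filter (hypothetical field and value)
        field: meta.language
        operator: "=="
        value: en
      filter_policy: merge     # runtime filters are combined with the filter above
      top_k: 10
      num_candidates: 100      # set explicitly; equals the documented default of top_k * 10
      document_store:
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
        init_parameters:
          hosts:
          index: 'my_index'
          embedding_similarity_function: cosine
```

With `filter_policy: replace` instead, a filter passed at query time would override the `meta.language` condition rather than being combined with it.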

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
query_embedding	List[float]		Embedding of the query.
filters	Optional[Dict[str, Any]]	None	Filters applied when fetching documents from the Document Store. Filters are applied during the approximate kNN search to ensure the Retriever returns `top_k` matching documents. The way runtime filters are applied depends on the `filter_policy` selected when initializing the Retriever.
top_k	Optional[int]	None	Maximum number of documents to return.
