MongoDBAtlasEmbeddingRetriever
Use this vector-based retriever with MongoDBAtlasDocumentStore to retrieve documents based on their similarity to the query.
Basic Information
- Used with MongoDBAtlasDocumentStore
- Type: haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever.MongoDBAtlasEmbeddingRetriever
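- Components it can connect with:
  - Text embedders: The Retriever receives the query_embedding from a text embedder.
  - Components that accept documents, such as rankers or PromptBuilder: The Retriever sends them the retrieved documents.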
Inputs
Required Inputs
Name | Type | Description |
---|---|---|
query_embedding | List of floats | Vector representation of the user query. |
document_store | Document Store | The MongoDBAtlasDocumentStore instance this retriever uses to fetch the documents. |
Optional Inputs
Name | Type | Default | Description |
---|---|---|---|
filters | Dictionary | None | The filters applied to the retrieved documents to narrow down the search. |
top_k | Integer | None | The maximum number of documents to fetch from the document store. |
Outputs
Name | Type | Description |
---|---|---|
documents | List of Document objects | The retrieved documents. |
Overview
MongoDBAtlasEmbeddingRetriever is compatible with MongoDBAtlasDocumentStore. It compares the query embedding with the document embeddings stored in MongoDB Atlas and returns the most similar documents. It uses the similarity function you chose when you created the vector search index in your MongoDB Atlas database. For details, see the MongoDB documentation on similarity.
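MongoDB Atlas computes the similarity inside the database, using the vector search index. The following minimal in-memory Python sketch illustrates what "most similar" means. It makes several assumptions. It uses cosine similarity, which is one of the similarity options Atlas Vector Search offers alongside euclidean and dotProduct. It represents documents as plain dictionaries with hypothetical id and embedding keys. The function names are also hypothetical and are not part of Haystack or the MongoDB driver.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_similarity(query_embedding: list[float], documents: list[dict], top_k: int) -> list[dict]:
    """Score every document against the query (hypothetical helper) and keep the top_k best."""
    scored = [
        {**doc, "score": cosine_similarity(query_embedding, doc["embedding"])}
        for doc in documents
    ]
    # Highest similarity first, then cut the list at top_k.
    scored.sort(key=lambda doc: doc["score"], reverse=True)
    return scored[:top_k]


documents = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.6, 0.8]},
    {"id": "c", "embedding": [0.0, 1.0]},
]
results = top_k_by_similarity([1.0, 0.0], documents, top_k=2)
print([doc["id"] for doc in results])  # ['a', 'b']
```

In the real Retriever, the scoring happens in MongoDB Atlas and only the top_k documents are returned to the pipeline. The scores Atlas reports may be on a different scale than raw cosine values.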
Filters
Optionally, you can pass filters to the Retriever to narrow down the set of documents it fetches. Make sure every field you filter on is indexed as a filter field in your vector search index in MongoDB Atlas. You can set filters in two ways:
- On the component card, using the filters parameter.
- At search time, by adding the Filters input and connecting it to the Retriever. Users then pass filters through the API or in Playground.
By default, filters passed at search time replace the filters configured on the component card. If no filters are passed at search time, the Retriever uses the filters from the component card. To combine both sets of filters instead, set filter_policy to MERGE.
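The difference between the two policies can be sketched in Python as follows. This is a simplified illustration with hypothetical names (Policy, resolve_filters). Haystack's actual merge rules handle several combinations of comparison and logical filters and are not reproduced here. This sketch simply requires documents to satisfy both sets of filters.

```python
from enum import Enum
from typing import Any, Optional

Filters = Optional[dict[str, Any]]


class Policy(Enum):
    """Hypothetical stand-in for the two filter_policy options."""
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(policy: Policy, init_filters: Filters, runtime_filters: Filters) -> Filters:
    """Return the filters a query would run with under the given policy (simplified)."""
    if policy is Policy.MERGE and init_filters and runtime_filters:
        # Simplified merge: a document must satisfy both sets of filters.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE, or nothing to merge: runtime filters win, else fall back to the configured ones.
    return runtime_filters or init_filters


card_filters = {"field": "meta.type", "operator": "==", "value": "article"}
search_filters = {"field": "meta.rating", "operator": ">=", "value": 3}

print(resolve_filters(Policy.REPLACE, card_filters, search_filters))  # only the rating filter
print(resolve_filters(Policy.MERGE, card_filters, search_filters))    # both, joined with AND
print(resolve_filters(Policy.REPLACE, card_filters, None))            # falls back to card filters
```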
Filter Syntax
For a detailed explanation of filters, their syntax, and examples, see Filter Syntax.
For each comparison filter, specify the field, the operator, and the value. Logical filters (AND, OR, NOT) also list the conditions that must be met. For example, this filter matches documents whose type is "article" and whose rating is 3 or more:
```yaml
operator: "AND"
conditions:
  - field: "meta.type"
    operator: "=="
    value: "article"
  - field: "meta.rating"
    operator: ">="
    value: 3
```
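MongoDB Atlas evaluates these filters as part of the vector search. The following hypothetical in-memory Python evaluator only shows how the example above is interpreted. A comparison filter checks one field. A logical filter combines its nested conditions, and in this sketch NOT means "not all conditions match". The sketch writes the fields as meta.type and meta.rating, following Haystack's convention of prefixing metadata fields with meta. The function names and the document shape are assumptions.

```python
from typing import Any

# Comparison operators from the Haystack-style filter syntax, applied to plain Python values.
COMPARISONS = {
    "==": lambda actual, expected: actual == expected,
    "!=": lambda actual, expected: actual != expected,
    ">": lambda actual, expected: actual is not None and actual > expected,
    ">=": lambda actual, expected: actual is not None and actual >= expected,
    "<": lambda actual, expected: actual is not None and actual < expected,
    "<=": lambda actual, expected: actual is not None and actual <= expected,
    "in": lambda actual, expected: actual in expected,
    "not in": lambda actual, expected: actual not in expected,
}


def resolve_field(document: dict[str, Any], field: str) -> Any:
    """Follow a dotted path such as 'meta.rating'; return None if any key is missing."""
    value: Any = document
    for key in field.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(document: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Return True if the document satisfies a comparison or logical filter."""
    if "conditions" in filters:
        # Logical filter: evaluate every nested condition, then combine the results.
        results = [matches(document, condition) for condition in filters["conditions"]]
        if filters["operator"] == "AND":
            return all(results)
        if filters["operator"] == "OR":
            return any(results)
        if filters["operator"] == "NOT":
            return not all(results)
        raise ValueError(f"Unknown logical operator: {filters['operator']}")
    # Comparison filter: compare the document's field value with the expected value.
    compare = COMPARISONS[filters["operator"]]
    return compare(resolve_field(document, filters["field"]), filters["value"])


example_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.rating", "operator": ">=", "value": 3},
    ],
}
docs = [
    {"id": "1", "meta": {"type": "article", "rating": 4}},
    {"id": "2", "meta": {"type": "article", "rating": 2}},
    {"id": "3", "meta": {"type": "blog", "rating": 5}},
]
print([doc["id"] for doc in docs if matches(doc, example_filter)])  # ['1']
```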
Usage Example
Passing Runtime Filters
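To let users choose filters at search time, connect the Filters input to the Retriever. In pipeline YAML, this means listing MongoDBAtlasEmbeddingRetriever.filters under the filters entry in inputs, as in the pipeline example in the next section.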

In a Pipeline
This is an example of a RAG pipeline in which MongoDBAtlasEmbeddingRetriever fetches documents from MongoDBAtlasDocumentStore and an LLM generates answers based on them. The Retriever's filters input is exposed as a pipeline input, so users can set filters at query time.
```yaml
components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2
  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8
  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .
        Never name the documents, only enter a number in square brackets as a reference.
        The reference must only refer to the number that comes in square brackets after the document.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.
        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}] :
        Name of Source File: {{ document.meta.file_name }}
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
        {% endif %}
        Question: {{ question }}
        Answer:
      required_variables: "*"
  llm:
    type: deepset_cloud_custom_nodes.generators.deepset_amazon_bedrock_generator.DeepsetAmazonBedrockGenerator
    init_parameters:
      model: us.anthropic.claude-3-7-sonnet-20250219-v1:0
      aws_region_name: us-west-2
      # Enable extended thinking mode:
      # Note that temperature is not supported for extended thinking mode.
      thinking:
        type: enabled
        budget_tokens: 1024 # min budget for Claude 3.7 Sonnet, increase to allow more thinking
      max_length: 1674 # includes thinking.budget_tokens
      # include_thinking: False # control whether to include thinking output in the reply, defaults to True if unset
      # thinking_tag: claudeThinking # set tag to identify thinking output, defaults to "thinking" if unset. If set to null, no tags will be added.
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
      # extract_xml_tags: # uncomment to move thinking part into answer's meta
      #   - thinking
  MongoDBAtlasEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever.MongoDBAtlasEmbeddingRetriever
    init_parameters:
      top_k: 10
      filter_policy: merge
      document_store:
        type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
        init_parameters:
          mongo_connection_string:
            type: env_var
            env_vars:
              - MONGO_CONNECTION_STRING
            strict: false
          database_name: mflix
          collection_name: embedded_movies
          vector_search_index: vector_search
          full_text_search_index: default

connections: # Defines how the components are connected
  - sender: ranker.documents
    receiver: meta_field_grouping_ranker.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: prompt_builder.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: llm.prompt
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: llm.replies
    receiver: answer_builder.replies
  - sender: query_embedder.embedding
    receiver: MongoDBAtlasEmbeddingRetriever.query_embedding
  - sender: MongoDBAtlasEmbeddingRetriever.documents
    receiver: ranker.documents

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "query_embedder.text"
    - "ranker.query"
    - "prompt_builder.question"
    - "answer_builder.query"
  filters: # These components will receive a potential query filter as input
    - "MongoDBAtlasEmbeddingRetriever.filters"

outputs: # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents" # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers" # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}
```
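The following Python sketch makes the data flow of this pipeline easier to follow. It copies the connections section of the pipeline above as (sender, receiver) pairs and orders the components with a topological sort from the standard library's graphlib. It only visualizes dependencies between components. It is an assumption-free illustration of the wiring, not a description of how Haystack or deepset actually schedules components. The helper names are hypothetical.

```python
from graphlib import TopologicalSorter

# (sender, receiver) pairs copied from the connections section of the example pipeline.
CONNECTIONS = [
    ("ranker.documents", "meta_field_grouping_ranker.documents"),
    ("meta_field_grouping_ranker.documents", "prompt_builder.documents"),
    ("meta_field_grouping_ranker.documents", "answer_builder.documents"),
    ("prompt_builder.prompt", "llm.prompt"),
    ("prompt_builder.prompt", "answer_builder.prompt"),
    ("llm.replies", "answer_builder.replies"),
    ("query_embedder.embedding", "MongoDBAtlasEmbeddingRetriever.query_embedding"),
    ("MongoDBAtlasEmbeddingRetriever.documents", "ranker.documents"),
]


def component(socket: str) -> str:
    """Return the component name from a 'component.socket' reference."""
    return socket.split(".", 1)[0]


def dependency_order(connections: list[tuple[str, str]]) -> list[str]:
    """Order components so that every sender comes before the components it feeds."""
    sorter = TopologicalSorter()
    for sender, receiver in connections:
        # The receiving component depends on the sending component.
        sorter.add(component(receiver), component(sender))
    return list(sorter.static_order())


print(" -> ".join(dependency_order(CONNECTIONS)))
```

Because every component in this pipeline depends on the previous one, the order is unique. The Retriever sits between the query embedder, which supplies query_embedding, and the ranker, which receives the retrieved documents.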
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
Parameter | Type | Possible Values | Description |
---|---|---|---|
filters | Dictionary | Default: None | Filters applied to the retrieved documents. Make sure the fields used in the filters are included in the configuration of the vector search index (vector_search_index) in your MongoDB Atlas database. Optional. |
top_k | Integer | Default: 10 | The maximum number of documents to return. Required. |
filter_policy | FilterPolicy | REPLACE, MERGE. Default: REPLACE | The policy that determines how filters are applied. REPLACE: Filters provided at search time replace the filters in the component configuration. MERGE: Filters provided at search time are merged with the filters in the component configuration. Required. |
Run() Method Parameters
These are the parameters you can pass to the component's run() method. You can set them at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
Run() method parameters take precedence over initialization parameters. For filters, the filter_policy setting determines whether runtime filters replace the configured filters or are merged with them.
Parameter | Type | Possible Values | Description |
---|---|---|---|
query_embedding | List of floats | | Vector representation of the query. Required. |
filters | Dictionary | Default: None | Filters applied to the retrieved documents. How runtime filters are applied depends on the filter_policy set in the Retriever's configuration. By default, runtime filters replace the filters from the component configuration. Optional. |
top_k | Integer | Default: None | The maximum number of documents to return. If not set, the top_k from the init parameters is used. Optional. |
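The top_k fallback can be sketched in a few lines of Python. The function name is hypothetical. It mirrors the behavior described in these tables rather than the Retriever's source code.

```python
from typing import Optional


def effective_top_k(init_top_k: int, runtime_top_k: Optional[int] = None) -> int:
    """Return how many documents a query fetches: the run() value if given, else the init value."""
    return runtime_top_k if runtime_top_k is not None else init_top_k


print(effective_top_k(10))     # 10, from the init parameters
print(effective_top_k(10, 3))  # 3, passed at query time
```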