MongoDBAtlasEmbeddingRetriever

Retrieves documents from the MongoDBAtlasDocumentStore by embedding similarity. This retriever is only compatible with the MongoDBAtlasDocumentStore.

Key Features

  • Retrieves documents by comparing the query embedding against document embeddings stored in MongoDB Atlas.
  • Only compatible with MongoDBAtlasDocumentStore.
  • Configurable top_k to control the number of retrieved documents.
  • Supports runtime filter overrides with configurable filter_policy (MERGE or REPLACE).
  • Requires a text embedder (such as SentenceTransformersTextEmbedder or MistralTextEmbedder) to produce the query embedding.
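
To make "embedding similarity" concrete, here is a minimal pure-Python sketch of ranking documents by cosine similarity to a query embedding and keeping the top_k best matches. It is only an illustration: in practice the search runs inside MongoDB Atlas against the configured vector search index, and the similarity function is whatever that index defines. The helper names (cosine_similarity, retrieve_top_k) and the toy two-dimensional embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, documents, top_k=10):
    """Brute-force ranking; a real vector index avoids scanning every document."""
    scored = [
        (cosine_similarity(query_embedding, doc["embedding"]), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Toy data: hypothetical documents with 2-dimensional embeddings.
docs = [
    {"content": "cats", "embedding": [1.0, 0.0]},
    {"content": "dogs", "embedding": [0.8, 0.6]},
    {"content": "cars", "embedding": [0.0, 1.0]},
]

results = retrieve_top_k([1.0, 0.1], docs, top_k=2)
print([doc["content"] for doc in results])  # ['cats', 'dogs']
```

The query embedding and the document embeddings must come from the same embedding model, otherwise the similarity scores are not meaningful.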

Configuration

  1. Drag the MongoDBAtlasEmbeddingRetriever component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    1. Configure the MongoDBAtlasDocumentStore connection, including mongo_connection_string, database_name, collection_name, and vector_search_index. Create a secret with your MongoDB connection string using MONGO_CONNECTION_STRING as the secret key. For instructions, see Create Secrets.
    2. Set top_k to control the maximum number of documents returned.
  4. Go to the Advanced tab to configure filters and filter_policy.
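
The vector_search_index named in step 3 must already exist in MongoDB Atlas. As a rough sketch, an Atlas Vector Search index definition is a JSON document with one vector field plus one filter field per property you want to filter on. The Python snippet below only builds and prints such a definition so you can paste it into the Atlas UI. The values are assumptions: 1024 dimensions matches mistral-embed, "embedding" matches the embedding_field used in the examples on this page, and meta.category is a hypothetical filter field whose actual path depends on how your documents are stored.

```python
import json

# Assumption: must equal the output size of your embedding model
# (1024 for mistral-embed).
EMBEDDING_DIM = 1024

vector_search_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",  # same value as embedding_field
            "numDimensions": EMBEDDING_DIM,
            "similarity": "cosine",
        },
        # One "filter" entry per field used in retriever filters.
        # "meta.category" is a hypothetical example path.
        {"type": "filter", "path": "meta.category"},
    ]
}

print(json.dumps(vector_search_index_definition, indent=2))
```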

Connections

MongoDBAtlasEmbeddingRetriever accepts a query embedding (list of floats) through its query_embedding input, and optional filters and top_k overrides at runtime. It outputs retrieved documents through its documents output.

Connect a text embedder's embedding output to MongoDBAtlasEmbeddingRetriever's query_embedding input. Connect its documents output to a Ranker or directly to the pipeline output.

Source Code

To check this component's source code, open embedding_retriever.py in the Haystack Core Integrations repository.

Usage Examples

Basic Configuration

MongoDBAtlasEmbeddingRetriever:
  type: haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever.MongoDBAtlasEmbeddingRetriever
  init_parameters:
    top_k: 10
    filter_policy: replace
    document_store:
      type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
      init_parameters:
        mongo_connection_string:
          type: env_var
          env_vars:
            - MONGO_CONNECTION_STRING
          strict: false
        database_name: my-db
        collection_name: my-collection
        vector_search_index: vector-search
        full_text_search_index: full-text-search
        embedding_field: embedding
        content_field: content

This is a document search pipeline that uses MongoDBAtlasEmbeddingRetriever to retrieve documents by embedding similarity, with MistralTextEmbedder to embed the query and TransformersSimilarityRanker to rank the documents.

components:
  MistralTextEmbedder:
    type: haystack_integrations.components.embedders.mistral.text_embedder.MistralTextEmbedder
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - MISTRAL_API_KEY
        strict: false
      model: mistral-embed

  MongoDBAtlasEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever.MongoDBAtlasEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
        init_parameters:
          mongo_connection_string:
            type: env_var
            env_vars:
              - MONGO_CONNECTION_STRING
            strict: false
          database_name: my-db
          collection_name: my-collection
          vector_search_index: vector-search
          full_text_search_index: full-text-search
          embedding_field: embedding
          content_field: content

  TransformersSimilarityRanker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: cross-encoder/ms-marco-MiniLM-L-6-v2
      device:
      token:
        type: env_var
        env_vars:
          - HF_API_TOKEN
          - HF_TOKEN
        strict: false
      top_k: 10
      query_prefix: ''
      document_prefix: ''
      meta_fields_to_embed:
      embedding_separator: "\n"
      scale_score: true
      calibration_factor: 1
      score_threshold:
      model_kwargs:
      tokenizer_kwargs:
      batch_size: 16

connections:
  - sender: MistralTextEmbedder.embedding
    receiver: MongoDBAtlasEmbeddingRetriever.query_embedding

  - sender: MongoDBAtlasEmbeddingRetriever.documents
    receiver: TransformersSimilarityRanker.documents

max_runs_per_component: 100

metadata: {}

inputs:
  query:
    - MistralTextEmbedder.text
    - TransformersSimilarityRanker.query

outputs:
  documents: TransformersSimilarityRanker.documents

Parameters

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| query_embedding | List[float] | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy configured for the retriever. |
| top_k | Optional[int] | Maximum number of Documents to return. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | List of Documents most similar to the given query_embedding. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_store | MongoDBAtlasDocumentStore | | An instance of MongoDBAtlasDocumentStore. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. Make sure the fields used in the filters are included in the configuration of the vector_search_index. You must configure them manually in the MongoDB Atlas web UI. |
| top_k | int | 10 | Maximum number of Documents to return. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy that determines how filters are applied when they are configured for the component and also passed at runtime. Possible values: MERGE and REPLACE. REPLACE: Runtime filters override the filters configured for the component. MERGE: Runtime filters are combined with the configured filters. If both target the same field, the runtime filter takes precedence. Logical filters are combined only if they have the same operator (AND, OR). Comparison filters are combined using the default logical operator (defaults to AND). |
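
The filter_policy rules described in the table can be sketched in plain Python. This is a simplified illustration of those rules, not the library's implementation. The function name apply_filter_policy, the field names, and the fallback for mixed filter shapes (the runtime filter wins) are assumptions made for the sketch. Filters use the Haystack filter syntax: comparison filters have field, operator, and value keys, and logical filters have operator and conditions keys.

```python
from typing import Any, Dict, Optional

Filter = Dict[str, Any]


def is_comparison(f: Filter) -> bool:
    return "field" in f


def is_logical(f: Filter) -> bool:
    return "conditions" in f


def apply_filter_policy(
    policy: str,
    init_filters: Optional[Filter],
    runtime_filters: Optional[Filter],
) -> Optional[Filter]:
    # No runtime filters: the configured filters apply unchanged.
    if not runtime_filters:
        return init_filters
    # REPLACE (or nothing to merge with): runtime filters win outright.
    if policy == "replace" or not init_filters:
        return runtime_filters

    # MERGE, two comparison filters.
    if is_comparison(init_filters) and is_comparison(runtime_filters):
        if init_filters["field"] == runtime_filters["field"]:
            return runtime_filters  # same field: runtime takes precedence
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}

    # MERGE, two logical filters with the same operator: join the conditions.
    if (
        is_logical(init_filters)
        and is_logical(runtime_filters)
        and init_filters["operator"] == runtime_filters["operator"]
    ):
        return {
            "operator": init_filters["operator"],
            "conditions": init_filters["conditions"] + runtime_filters["conditions"],
        }

    # Assumption for this sketch: any other combination uses the runtime filter.
    return runtime_filters


lang_en = {"field": "meta.lang", "operator": "==", "value": "en"}
lang_de = {"field": "meta.lang", "operator": "==", "value": "de"}
recent = {"field": "meta.year", "operator": ">=", "value": 2020}

print(apply_filter_policy("replace", lang_en, recent))  # only the runtime filter
print(apply_filter_policy("merge", lang_en, lang_de))   # same field: runtime filter
print(apply_filter_policy("merge", lang_en, recent))    # both, joined with AND
```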

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy configured for the retriever. |
| top_k | Optional[int] | None | Maximum number of Documents to return. Overrides the value specified at initialization. |
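
As a sketch of what passing these parameters at query time can look like, the snippet below builds a params object keyed by component name. The build_runtime_params helper is hypothetical, and the exact request shape (a params key wrapping per-component overrides) is an assumption, so check Modify Pipeline Parameters at Query Time for the authoritative format. query_embedding is left out because the connected text embedder supplies it.

```python
import json
from typing import Any, Dict, Optional


def build_runtime_params(
    top_k: Optional[int] = None,
    filters: Optional[Dict[str, Any]] = None,
    component_name: str = "MongoDBAtlasEmbeddingRetriever",
) -> Dict[str, Dict[str, Any]]:
    """Collect only the overrides that were actually set."""
    overrides: Dict[str, Any] = {}
    if top_k is not None:
        overrides["top_k"] = top_k
    if filters is not None:
        overrides["filters"] = filters
    return {component_name: overrides}


# "meta.category" is a hypothetical field; it must also be declared as a
# filter field in the Atlas vector search index.
params = build_runtime_params(
    top_k=3,
    filters={"field": "meta.category", "operator": "==", "value": "news"},
)

print(json.dumps({"params": params}, indent=2))
```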