MongoDBAtlasEmbeddingRetriever
Retrieves documents from the MongoDBAtlasDocumentStore by embedding similarity. This retriever is only compatible with the MongoDBAtlasDocumentStore.
Key Features
- Retrieves documents from MongoDBAtlasDocumentStore by comparing query and document embeddings.
- Only compatible with MongoDBAtlasDocumentStore.
- Configurable top_k to control the number of results returned.
- Supports runtime filters for dynamic result filtering.
- Flexible filter policy: REPLACE or MERGE runtime filters with pre-configured ones.
Configuration
- Drag the MongoDBAtlasEmbeddingRetriever component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Configure the MongoDBAtlasDocumentStore connection, including the MongoDB connection string, database name, collection name, and vector search index name.
- Go to the Advanced tab to configure top_k and filters.
Connections
MongoDBAtlasEmbeddingRetriever accepts a query_embedding (list of floats) and optional filters and top_k as inputs. It outputs a documents list sorted by similarity to the query embedding.
Connect a text embedder (such as MistralTextEmbedder or SentenceTransformersTextEmbedder) to its query_embedding input. Connect its documents output to a Ranker or PromptBuilder.
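To make "sorted by similarity" concrete, here is a minimal sketch in plain Python. It does not reproduce how MongoDB Atlas runs vector search (that happens server-side, against the vector search index). The helpers cosine_similarity and retrieve, the document IDs, and the two-dimensional embeddings are all hypothetical and only illustrate scoring stored embeddings against a query embedding and keeping the top_k best matches.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_embedding: List[float],
    doc_embeddings: Dict[str, List[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every document, sort best-first, keep top_k."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data, assumed purely for illustration.
docs = {"doc-a": [1.0, 0.0], "doc-b": [0.7, 0.7], "doc-c": [0.0, 1.0]}
results = retrieve([1.0, 0.1], docs, top_k=2)
print([doc_id for doc_id, _ in results])
```

With this toy data, doc-a points in nearly the same direction as the query and ranks first, doc-b ranks second, and doc-c is cut off by top_k=2.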
Usage Example
This is a document search pipeline that uses MongoDBAtlasEmbeddingRetriever to retrieve documents by embedding similarity, with MistralTextEmbedder to embed the query and TransformersSimilarityRanker to rank the documents.
components:
  MistralTextEmbedder:
    type: haystack_integrations.components.embedders.mistral.text_embedder.MistralTextEmbedder
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - MISTRAL_API_KEY
        strict: false
      model: mistral-embed
  MongoDBAtlasEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever.MongoDBAtlasEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
        init_parameters:
          mongo_connection_string:
            type: env_var
            env_vars:
              - MONGO_CONNECTION_STRING
            strict: false
          database_name: my-db
          collection_name: my-collection
          vector_search_index: vector-search
          full_text_search_index: full-text-search
          embedding_field: embedding
          content_field: content
  TransformersSimilarityRanker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: cross-encoder/ms-marco-MiniLM-L-6-v2
      device:
      token:
        type: env_var
        env_vars:
          - HF_API_TOKEN
          - HF_TOKEN
        strict: false
      top_k: 10
      query_prefix: ''
      document_prefix: ''
      meta_fields_to_embed:
      embedding_separator: \n
      scale_score: true
      calibration_factor: 1
      score_threshold:
      model_kwargs:
      tokenizer_kwargs:
      batch_size: 16
connections:
  - sender: MistralTextEmbedder.embedding
    receiver: MongoDBAtlasEmbeddingRetriever.query_embedding
  - sender: MongoDBAtlasEmbeddingRetriever.documents
    receiver: TransformersSimilarityRanker.documents
max_runs_per_component: 100
metadata: {}
inputs:
  query:
    - MistralTextEmbedder.text
    - TransformersSimilarityRanker.query
outputs:
  documents: TransformersSimilarityRanker.documents
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy configured for the retriever. For more details, see the Init Parameters section below. |
| top_k | Optional[int] | None | Maximum number of Documents to return. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Documents most similar to the given query_embedding. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | MongoDBAtlasDocumentStore | | An instance of MongoDBAtlasDocumentStore. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. Make sure that the fields used in the filters are included in the configuration of the vector_search_index. You must configure them manually in the Web UI of MongoDB Atlas. |
| top_k | int | 10 | Maximum number of Documents to return. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy that determines how filters are applied when they are configured for the component and also passed at runtime. Possible values: MERGE and REPLACE. REPLACE: Runtime filters replace the filters configured for the component. MERGE: If both filter types target the same field, the runtime filter takes precedence. Logical filters are combined only if they have the same operator (AND, OR). Comparison filters are combined using the default logical operator (defaults to AND). |
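The difference between the two policies can be sketched in plain Python. This is a simplified illustration, not the library's implementation: apply_policy is a hypothetical helper that handles only single comparison filters (no nested logical filters), and the meta.language and meta.year fields are assumed example fields. The filter dictionaries use the field/operator/value shape of Haystack filters.

```python
from typing import Any, Dict, Optional

Filter = Dict[str, Any]


def apply_policy(
    policy: str,
    init_filters: Optional[Filter],
    runtime_filters: Optional[Filter],
) -> Optional[Filter]:
    """Hypothetical helper: resolve init vs. runtime comparison filters."""
    if policy not in ("replace", "merge"):
        raise ValueError(f"Unknown filter policy: {policy}")
    if runtime_filters is None:
        # Nothing passed at runtime: fall back to the configured filters.
        return init_filters
    if policy == "replace" or init_filters is None:
        # REPLACE: runtime filters win outright.
        return runtime_filters
    # MERGE: the runtime filter takes precedence on the same field...
    if init_filters["field"] == runtime_filters["field"]:
        return runtime_filters
    # ...otherwise both comparison filters are combined with AND.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


init = {"field": "meta.language", "operator": "==", "value": "en"}
runtime_same_field = {"field": "meta.language", "operator": "==", "value": "de"}
runtime_other_field = {"field": "meta.year", "operator": ">=", "value": 2023}

print(apply_policy("replace", init, runtime_other_field))
print(apply_policy("merge", init, runtime_same_field))
print(apply_policy("merge", init, runtime_other_field))
```

Under REPLACE, the language filter configured at init time is dropped as soon as any runtime filter arrives. Under MERGE, it is kept unless the runtime filter targets the same field.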
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Embedding of the query. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy configured for the retriever. |
| top_k | Optional[int] | None | Maximum number of Documents to return. Overrides the value specified at initialization. |
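The override behavior of top_k can be illustrated with a short sketch. The function effective_top_k and the sample run arguments are hypothetical and not part of the component's API. They only show the rule stated in the table: a top_k passed at query time replaces the value set at initialization, and None falls back to it.

```python
from typing import Any, Dict, Optional

INIT_TOP_K = 10  # value configured at initialization (the documented default)


def effective_top_k(runtime_top_k: Optional[int], init_top_k: int = INIT_TOP_K) -> int:
    """Hypothetical helper: use the runtime value when given, else the init value."""
    return init_top_k if runtime_top_k is None else runtime_top_k


# Assumed example of arguments supplied at query time.
run_args: Dict[str, Any] = {
    "query_embedding": [0.1, 0.2, 0.3],
    "filters": {"field": "meta.language", "operator": "==", "value": "en"},
    "top_k": 3,
}

print(effective_top_k(run_args.get("top_k")))  # runtime value is used
print(effective_top_k(None))                   # falls back to the init value
```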