MongoDBAtlasFullTextRetriever
Use this retriever with MongoDBAtlasDocumentStore
to retrieve documents using full text search.
Basic Information
- Used with MongoDBAtlasDocumentStore
- Type:
haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever.MongoDBAtlasFullTextRetriever
- Components it can connect with:
Inputs
Required Inputs
Name | Type | Description |
---|---|---|
query | String or list of strings | The user query. If the query contains multiple terms, Atlas Search evaluates each term separately for matches. |
Optional Inputs
Name | Type | Default | Description |
---|---|---|---|
fuzzy | Dictionary of string and integer | None | Enables fuzzy search, which finds strings similar to the search terms. You can't use fuzzy with synonyms. For details and examples, see MongoDB Atlas Documentation. |
match_criteria | Literal | None | Defines how terms in the query are matched. Possible values are: - any - all For details, see MongoDB Atlas Documentation. |
score | Dictionary of string and dictionary | None | Defines the scoring method for matching results. Possible values are: - boost - constant - function For details, see MongoDB Atlas Documentation. |
synonyms | String | None | The name of the synonym mapping definition in the index. This value can't be an empty string. Note that you can't use synonyms with fuzzy. |
filters | Dictionary of string and any | None | The filters applied to the retrieved documents. The way runtime filters are applied depends on the filter_policy specified in the Retriever configuration. For examples and an explanation of how to construct filters, see Filter Syntax. |
top_k | Integer | 10 | The maximum number of documents to return. |
Outputs
Name | Type | Description |
---|---|---|
documents | List of Document objects | The retrieved documents. |
Overview
MongoDBAtlasFullTextRetriever
is compatible with MongoDBAtlasDocumentStore
. It finds documents in the document store that contain specific words or phrases by performing searches on the entire content of documents. The search depends on the full_text_search_index
you created in the MongoDB Atlas database.
You can use MongoDBAtlasFullTextRetriever
together with MongoDBAtlasEmbeddingRetriever
to perform hybrid (keyword-based and vector-based) searches for documents in the MongoDBAtlasDocumentStore
.
To learn more about full text search in MongoDB Atlas, check Atlas Search Overview in the MongoDB documentation.
Filters
Optionally, you can pass filters to the Retriever to narrow down the scope of fetched documents. Make sure the fields included in the filters exist in your search index in the MongoDB Atlas database. You can set the filters in two ways:
- On the component card, using the filters parameter.
- By adding the Filters input and connecting it to the Retriever. This means users provide the filters at search time through the API or in Playground.
By default, the filters passed at search time take precedence over the filters configured on the component card, but you can change this using the filter_policy
setting which makes it possible to also merge the two filtering options.
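
The difference between the two policies can be illustrated with a small sketch. This is not the library's implementation: resolve_filters is a hypothetical helper, and it assumes that merging means both filter sets must hold (a logical AND). The actual merge logic may treat overlapping fields differently.

```python
from typing import Any, Dict, Optional

Filters = Dict[str, Any]


def resolve_filters(
    policy: str,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Hypothetical helper: decide which filters end up being applied."""
    if runtime_filters is None:
        # Nothing was passed at search time, so the configured filters apply.
        return init_filters
    if init_filters is None or policy == "replace":
        # Search-time filters win over the component configuration.
        return runtime_filters
    if policy == "merge":
        # Assumption: merged filters must both match (logical AND).
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    raise ValueError(f"Unknown filter policy: {policy}")


card_filters = {"field": "type", "operator": "==", "value": "article"}
query_filters = {"field": "rating", "operator": ">=", "value": 3}

replaced = resolve_filters("replace", card_filters, query_filters)
merged = resolve_filters("merge", card_filters, query_filters)
```

With replace, only the rating condition is applied. With merge, a document has to satisfy both the type and the rating condition.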
Filter Syntax
For a detailed explanation of filters, their syntax, and examples, see Filter Syntax.
For each filter, you must specify the operator, the field key and value. Logical filters also list the conditions that must be met. For example, this filter searches for documents of "type": "article"
whose "rating"
has value of 3
or more:
operator: "AND"
conditions:
  - field: "type"
    operator: "=="
    value: "article"
  - field: "rating"
    operator: ">="
    value: 3
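
For reference, the same filter can be written as a Python dictionary. The matches function below is a hypothetical, simplified evaluator added only to show how the logical and comparison operators combine. In practice the filter is evaluated by MongoDB Atlas, not in Python.

```python
import operator
from typing import Any, Dict

# The filter from the example above, as a Python dictionary.
article_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "type", "operator": "==", "value": "article"},
        {"field": "rating", "operator": ">=", "value": 3},
    ],
}

_COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(meta: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Hypothetical evaluator: check document metadata against a filter."""
    if "conditions" in flt:
        # Logical filter: combine the results of the nested conditions.
        results = [matches(meta, cond) for cond in flt["conditions"]]
        if flt["operator"] == "AND":
            return all(results)
        if flt["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {flt['operator']}")
    if flt["field"] not in meta:
        # A document without the field can't satisfy the comparison.
        return False
    return _COMPARISONS[flt["operator"]](meta[flt["field"]], flt["value"])


rated_article = {"type": "article", "rating": 4}
low_rated_article = {"type": "article", "rating": 2}
```

Here, matches(rated_article, article_filter) is True, while low_rated_article fails the rating condition.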
Usage Example
Passing Runtime Filters
To allow users to choose filters at query time, connect the Filters
input to the retriever:

Performing a Hybrid Search
This is an example of a RAG pipeline that uses hybrid retrieval: keyword search (full text with MongoDBAtlasFullTextRetriever
) and vector search (using the MongoDBAtlasEmbeddingRetriever
). The documents returned by each retriever are joined using DocumentJoiner
and then sent to the Ranker.
components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8
  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .
        Never name the documents, only enter a number in square brackets as a reference.
        The reference must only refer to the number that comes in square brackets after the document.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.
        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}] :
        Name of Source File: {{ document.meta.file_name }}
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
        {% endif %}
        Question: {{ question }}
        Answer:
      required_variables: "*"
  llm:
    type: deepset_cloud_custom_nodes.generators.deepset_amazon_bedrock_generator.DeepsetAmazonBedrockGenerator
    init_parameters:
      model: us.anthropic.claude-3-7-sonnet-20250219-v1:0
      aws_region_name: us-west-2
      # Enable extended thinking mode:
      # Note that temperature is not supported for extended thinking mode.
      thinking:
        type: enabled
        budget_tokens: 1024 # min budget for Claude 3.7 Sonnet, increase to allow more thinking
      max_length: 1674 # includes thinking.budget_tokens
      # include_thinking: False # control whether to include thinking output in the reply, defaults to True if unset
      # thinking_tag: claudeThinking # set tag to identify thinking output, defaults to "thinking" if unset. If set to null, no tags will be added.
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
      # extract_xml_tags: # uncomment to move thinking part into answer's meta
      #   - thinking
  MongoDBAtlasEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever.MongoDBAtlasEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
        init_parameters:
          mongo_connection_string:
            type: env_var
            env_vars:
              - MONGO_CONNECTION_STRING
            strict: false
          database_name: mflix
          collection_name: embedded_movies
          vector_search_index: vector_search
          full_text_search_index: default
  MongoDBAtlasFullTextRetriever:
    type: haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever.MongoDBAtlasFullTextRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
        init_parameters:
          mongo_connection_string:
            type: env_var
            env_vars:
              - MONGO_CONNECTION_STRING
            strict: false
          database_name: mflix
          collection_name: embedded_movies
          vector_search_index: vector_search
          full_text_search_index: default
connections: # Defines how the components are connected
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: meta_field_grouping_ranker.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: prompt_builder.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: llm.prompt
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: llm.replies
    receiver: answer_builder.replies
  - sender: query_embedder.embedding
    receiver: MongoDBAtlasEmbeddingRetriever.query_embedding
  - sender: MongoDBAtlasFullTextRetriever.documents
    receiver: document_joiner.documents
  - sender: MongoDBAtlasEmbeddingRetriever.documents
    receiver: document_joiner.documents
inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "query_embedder.text"
    - "ranker.query"
    - "prompt_builder.question"
    - "answer_builder.query"
    - "MongoDBAtlasFullTextRetriever.query"
  filters: # These components will receive a potential query filter as input
    - "MongoDBAtlasEmbeddingRetriever.filters"
    - "MongoDBAtlasFullTextRetriever.filters"
outputs: # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents" # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers" # The output of the pipeline is the generated answers
max_runs_per_component: 100
metadata: {}
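
To make the joining step more concrete, here is a rough sketch of what joining two result lists by concatenation could look like. It is not the DocumentJoiner implementation: documents are modeled as plain dictionaries with hypothetical id and score keys, and the sketch assumes that when both retrievers return the same document, only the higher-scoring copy is kept.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def concatenate(*result_lists: List[Doc]) -> List[Doc]:
    """Hypothetical sketch: join result lists, keeping one copy per document id."""
    best: Dict[str, Doc] = {}
    for docs in result_lists:
        for doc in docs:
            current = best.get(doc["id"])
            # Assumption: for duplicates, keep the copy with the higher score.
            if current is None or doc["score"] > current["score"]:
                best[doc["id"]] = doc
    return list(best.values())


# Made-up results standing in for the two retrievers' outputs.
full_text_results = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
vector_results = [{"id": "b", "score": 0.7}, {"id": "c", "score": 0.6}]

joined = concatenate(full_text_results, vector_results)
```

Because the scores of the two retrievers are not directly comparable, the pipeline above sends the joined documents to a Ranker, which reorders them against the query.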
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
Parameter | Type | Possible Values | Description |
---|---|---|---|
document_store | MongoDBAtlasDocumentStore | | An instance of the MongoDBAtlasDocumentStore to use. Required. |
filters | Dictionary | Default: None | Filters applied to the retrieved documents. Make sure the fields used in the filters are included in the configuration of the full_text_search_index of the MongoDB Atlas database. For detailed information and examples, see Filter Syntax. Optional. |
top_k | Integer | Default: 10 | The maximum number of documents to return. Required. |
filter_policy | FilterPolicy | REPLACE, MERGE. Default: REPLACE | The policy that determines how filters are applied. Possible values: - REPLACE: The filters provided at search time replace the filters in the component configuration. - MERGE: The filters provided at search time are merged with the filters in the component configuration. Required. |
Run() Method Parameters
These are the parameters you can configure for the component's run()
method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
Run() method parameters take precedence over initialization parameters.
Name | Type | Default | Description |
---|---|---|---|
query | String or list of strings | | The query or a list of queries to search for. If the query contains multiple terms, Atlas Search evaluates each term separately for matches. Required. |
fuzzy | Dictionary of string and integer | None | Enables fuzzy search, which finds strings similar to the search terms. You can't use fuzzy with synonyms. For details and examples, see MongoDB Atlas Documentation. |
match_criteria | Literal | None | Defines how terms in the query are matched. Possible values are: - any - all For details, see MongoDB Atlas Documentation. |
score | Dictionary of string and dictionary | None | Defines the scoring method for matching results. Possible values are: - boost - constant - function For details, see MongoDB Atlas Documentation. |
synonyms | String | None | The name of the synonym mapping definition in the index. This value can't be an empty string. Note that you can't use synonyms with fuzzy. |
filters | Dictionary of string and any | None | The filters applied to the retrieved documents. The way runtime filters are applied depends on the filter_policy specified in the Retriever configuration. For examples and an explanation of how to construct filters, see Filter Syntax. |
top_k | Integer | 10 | The maximum number of documents to return. |
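
The constraints listed in the table (fuzzy and synonyms are mutually exclusive, synonyms can't be an empty string, match_criteria is any or all) can be checked before a query is sent. The sketch below is only an illustration: build_run_params is a hypothetical helper, not part of the component, and maxEdits is assumed here as a fuzzy option name from the Atlas Search documentation.

```python
from typing import Any, Dict, List, Optional, Union


def build_run_params(
    query: Union[str, List[str]],
    fuzzy: Optional[Dict[str, int]] = None,
    match_criteria: Optional[str] = None,
    synonyms: Optional[str] = None,
    top_k: int = 10,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and collect query-time parameters."""
    if synonyms is not None and synonyms == "":
        raise ValueError("synonyms can't be an empty string")
    if fuzzy is not None and synonyms is not None:
        raise ValueError("fuzzy and synonyms can't be used together")
    if match_criteria not in (None, "any", "all"):
        raise ValueError("match_criteria must be 'any' or 'all'")

    params: Dict[str, Any] = {"query": query, "top_k": top_k}
    optional = {"fuzzy": fuzzy, "match_criteria": match_criteria, "synonyms": synonyms}
    # Only include the optional parameters that were actually set.
    params.update({key: value for key, value in optional.items() if value is not None})
    return params


# Assumed fuzzy option: allow one character edit per term.
params = build_run_params("gladiator rome", fuzzy={"maxEdits": 1}, match_criteria="any")
```

The resulting dictionary contains only the parameters that were set, so unset options fall back to the component's defaults.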