MongoDBAtlasFullTextRetriever

Use this retriever with MongoDBAtlasDocumentStore to retrieve documents using full text search.

Basic Information

  • Used with MongoDBAtlasDocumentStore
  • Type: haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever.MongoDBAtlasFullTextRetriever
  • Components it can connect with:
    • Query: This Retriever receives the user query and searches for documents based on it.
    • Rankers: The Retriever can send the retrieved documents to a Ranker, which ranks them and passes the top ones to downstream components, such as a ChatPromptBuilder.
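As a sketch, this is how the query input and the Retriever-to-Ranker connection could be expressed in pipeline YAML, following the conventions of the hybrid search example on this page. The component name ranker is a placeholder for whichever Ranker you define in the same pipeline.

```yaml
# Sketch: the full-text retriever feeding a ranker.
# "ranker" is a placeholder for a Ranker component defined in the same pipeline.
connections:
- sender: MongoDBAtlasFullTextRetriever.documents
  receiver: ranker.documents

inputs:
  query:  # both components receive the user query
  - "MongoDBAtlasFullTextRetriever.query"
  - "ranker.query"
```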

Inputs

Required Inputs

Name | Type | Description
query | String or list of strings | The user query. If the query contains multiple terms, Atlas Search evaluates each term separately for matches.

Optional Inputs

Name | Type | Default | Description
fuzzy | Dictionary of string and integer | None | Enables fuzzy search, which finds strings similar to the search terms. You can't use fuzzy with synonyms. For details and examples, see MongoDB Atlas Documentation.
match_criteria | Literal | None | Defines how terms in the query are matched. Possible values: any, all. For details, see MongoDB Atlas Documentation.
score | Dictionary of string and dictionary | None | Defines the scoring method for matching results. Possible values: boost, constant, function. For details, see MongoDB Atlas Documentation.
synonyms | String | None | The name of the synonym mapping definition in the index. This value can't be an empty string. You can't use synonyms with fuzzy.
filters | Dictionary of string and any | None | The filters applied to the retrieved documents. How runtime filters are applied depends on the filter_policy set in the Retriever configuration. For examples and an explanation of how to construct filters, see Filter Syntax.
top_k | Integer | 10 | The maximum number of documents to return.
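As an illustration, the optional inputs could be combined like this for a fuzzy search in which every query term must match. This assumes query-time parameters are grouped under the component name; check Modify Pipeline Parameters at Query Time for the exact request format. The maxEdits and prefixLength keys are MongoDB Atlas Search fuzzy options, not parameters defined by this component.

```yaml
# Hypothetical query-time parameters for MongoDBAtlasFullTextRetriever.
# The fuzzy keys (maxEdits, prefixLength) are MongoDB Atlas Search options.
# No synonyms here: fuzzy and synonyms can't be combined.
MongoDBAtlasFullTextRetriever:
  fuzzy:
    maxEdits: 1        # tolerate one single-character edit per term
    prefixLength: 2    # the first two characters must match exactly
  match_criteria: all  # documents must match every query term
  top_k: 5
```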

Outputs

Name | Type | Description
documents | List of Document objects | The retrieved documents.

Overview

MongoDBAtlasFullTextRetriever is compatible with MongoDBAtlasDocumentStore. It finds documents in the document store that contain specific words or phrases by searching the full content of the documents. The search relies on the full_text_search_index you created in the MongoDB Atlas database.
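A minimal component definition might look like the following sketch. The database, collection, and index names are the placeholders used in the hybrid search example on this page, so replace them with your own; the full_text_search_index value is what ties the Retriever to your Atlas Search index.

```yaml
MongoDBAtlasFullTextRetriever:
  type: haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever.MongoDBAtlasFullTextRetriever
  init_parameters:
    top_k: 10
    document_store:
      type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
      init_parameters:
        mongo_connection_string:      # read from an environment variable
          type: env_var
          env_vars:
          - MONGO_CONNECTION_STRING
          strict: false
        database_name: mflix                # placeholder: your database
        collection_name: embedded_movies    # placeholder: your collection
        vector_search_index: vector_search  # placeholder: your vector search index
        full_text_search_index: default     # must match your Atlas Search index name
```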

You can use MongoDBAtlasFullTextRetriever together with MongoDBAtlasEmbeddingRetriever to perform hybrid (keyword-based and vector-based) searches for documents in the MongoDBAtlasDocumentStore.

To learn more about full text search in MongoDB Atlas, check Atlas Search Overview in the MongoDB documentation.

Filters

Optionally, you can pass filters to the Retriever to narrow down the scope of fetched documents. Make sure the fields included in the filters exist in your search index in the MongoDB Atlas database. You can set the filters in two ways:

  • On the component card using the filters parameter.
  • By adding the Filters input and connecting it to the Retriever. This way, users provide filters at search time through the API or in Playground.

By default, filters passed at search time take precedence over the filters configured on the component card. You can change this with the filter_policy setting, which also lets you merge the two.
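For example, this fragment keeps a card-level filter and sets filter_policy to merge, so that filters sent at search time are combined with it instead of replacing it. The component's type and document_store settings are omitted for brevity, and "type" is an example metadata field that you would replace with a field from your own index.

```yaml
# Sketch: filters set on the component card, merged with runtime filters.
MongoDBAtlasFullTextRetriever:
  init_parameters:
    filters:
      operator: "AND"
      conditions:
        - field: "type"
          operator: "=="
          value: "article"
    filter_policy: merge  # runtime filters are combined with the filters above
    top_k: 10
    # type and document_store omitted; configure them as usual
```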

Filter Syntax

For a detailed explanation of filters, their syntax, and examples, see Filter Syntax.

For each filter, you must specify the operator, the field key, and the value. Logical filters also list the conditions that must be met. For example, this filter searches for documents of "type": "article" whose "rating" is 3 or more:

operator: "AND"
conditions:
  - field: "type"
    operator: "=="
    value: "article"
  - field: "rating"
    operator: ">="
    value: 3
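Logical filters can also be nested. As a sketch using the same example fields, this filter matches articles of any rating, or blog posts rated 4 or more. The "blog" value is a hypothetical document type; quoting operators such as ">=" keeps them from being misread by the YAML parser.

```yaml
# Sketch: articles of any rating, OR blog posts with a rating of 4 or more.
operator: "OR"
conditions:
  - field: "type"
    operator: "=="
    value: "article"
  - operator: "AND"        # nested logical filter
    conditions:
      - field: "type"
        operator: "=="
        value: "blog"      # hypothetical value
      - field: "rating"
        operator: ">="
        value: 4
```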

Usage Example

Passing Runtime Filters

To allow users to choose filters at query time, connect the Filters input to the retriever:

[Image: The Filters component connected to the retriever]
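In the pipeline YAML, this connection corresponds to listing the Retriever's filters input under inputs, as the hybrid search example on this page does. A minimal sketch for a pipeline with only the full-text Retriever:

```yaml
inputs:
  query:    # the user query goes to the Retriever
  - "MongoDBAtlasFullTextRetriever.query"
  filters:  # filters sent at search time go to the Retriever
  - "MongoDBAtlasFullTextRetriever.filters"
```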

Performing a Hybrid Search

This example RAG pipeline uses hybrid retrieval: keyword search (full-text search with MongoDBAtlasFullTextRetriever) and vector search (with MongoDBAtlasEmbeddingRetriever). DocumentJoiner combines the documents returned by both retrievers and sends them to the Ranker.


components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .
        Never name the documents, only enter a number in square brackets as a reference.
        The reference must only refer to the number that comes in square brackets after the document.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.

        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}] :
        Name of Source File: {{ document.meta.file_name }}
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
        {% endif %}

        Question: {{ question }}
        Answer:

      required_variables: "*"
  llm:
    type: deepset_cloud_custom_nodes.generators.deepset_amazon_bedrock_generator.DeepsetAmazonBedrockGenerator
    init_parameters:
      model: us.anthropic.claude-3-7-sonnet-20250219-v1:0
      aws_region_name: us-west-2

      # Enable extended thinking mode:
      # Note that temperature is not supported for extended thinking mode.
      thinking:
        type: enabled
        budget_tokens: 1024  # min budget for Claude 3.7 Sonnet, increase to allow more thinking
      max_length: 1674 # includes thinking.budget_tokens
      # include_thinking: False  # control whether to include thinking output in the reply, defaults to True if unset
      # thinking_tag: claudeThinking  # set tag to identify thinking output, defaults to "thinking" if unset. If set to null, no tags will be added.

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
      # extract_xml_tags:  # uncomment to move thinking part into answer's meta
      # - thinking

  MongoDBAtlasEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever.MongoDBAtlasEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
        init_parameters:
          mongo_connection_string:
            type: env_var
            env_vars:
            - MONGO_CONNECTION_STRING
            strict: false
          database_name: mflix
          collection_name: embedded_movies
          vector_search_index: vector_search
          full_text_search_index: default
  MongoDBAtlasFullTextRetriever:
    type: haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever.MongoDBAtlasFullTextRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore
        init_parameters:
          mongo_connection_string:
            type: env_var
            env_vars:
            - MONGO_CONNECTION_STRING
            strict: false
          database_name: mflix
          collection_name: embedded_movies
          vector_search_index: vector_search
          full_text_search_index: default

connections:  # Defines how the components are connected
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
  receiver: prompt_builder.documents
- sender: meta_field_grouping_ranker.documents
  receiver: answer_builder.documents
- sender: prompt_builder.prompt
  receiver: llm.prompt
- sender: prompt_builder.prompt
  receiver: answer_builder.prompt
- sender: llm.replies
  receiver: answer_builder.replies
- sender: query_embedder.embedding
  receiver: MongoDBAtlasEmbeddingRetriever.query_embedding

- sender: MongoDBAtlasFullTextRetriever.documents
  receiver: document_joiner.documents
- sender: MongoDBAtlasEmbeddingRetriever.documents
  receiver: document_joiner.documents

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "query_embedder.text"
  - "ranker.query"
  - "prompt_builder.question"
  - "answer_builder.query"

  - "MongoDBAtlasFullTextRetriever.query"
  filters:  # These components will receive a potential query filter as input
  - "MongoDBAtlasEmbeddingRetriever.filters"
  - "MongoDBAtlasFullTextRetriever.filters"

outputs:  # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents"  # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers"  # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}


Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter | Type | Possible Values | Description
document_store | MongoDBAtlasDocumentStore | - | An instance of the MongoDBAtlasDocumentStore to use. Required.
filters | Dictionary | Default: None | Filters applied to the retrieved documents. Make sure the fields used in the filters are included in the configuration of the full_text_search_index of the MongoDB Atlas database. For detailed information and examples, see Filter Syntax. Optional.
top_k | Integer | Default: 10 | The maximum number of documents to return. Required.
filter_policy | FilterPolicy | REPLACE, MERGE (Default: REPLACE) | The policy that determines how filters are applied. REPLACE: the filters provided at search time replace the filters in the component configuration. MERGE: the filters provided at search time are merged with the filters in the component configuration. Required.

Run() Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Run() method parameters take precedence over initialization parameters.


Name | Type | Default | Description
query | String or list of strings | - | The query or a list of queries to search for. If the query contains multiple terms, Atlas Search evaluates each term separately for matches. Required.
fuzzy | Dictionary of string and integer | None | Enables fuzzy search, which finds strings similar to the search terms. You can't use fuzzy with synonyms. For details and examples, see MongoDB Atlas Documentation.
match_criteria | Literal | None | Defines how terms in the query are matched. Possible values: any, all. For details, see MongoDB Atlas Documentation.
score | Dictionary of string and dictionary | None | Defines the scoring method for matching results. Possible values: boost, constant, function. For details, see MongoDB Atlas Documentation.
synonyms | String | None | The name of the synonym mapping definition in the index. This value can't be an empty string. You can't use synonyms with fuzzy.
filters | Dictionary of string and any | None | The filters applied to the retrieved documents. How runtime filters are applied depends on the filter_policy set in the Retriever configuration. For examples and an explanation of how to construct filters, see Filter Syntax.
top_k | Integer | 10 | The maximum number of documents to return.
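Similarly, a hypothetical query-time configuration could use a synonym mapping together with a score boost instead of fuzzy matching. The mapping name my_synonyms is a placeholder and must match a synonym mapping defined in your Atlas Search index; the boost shape follows the MongoDB Atlas score options, and the grouping under the component name is an assumption about the request format.

```yaml
# Hypothetical query-time parameters: synonyms plus a score boost.
# "my_synonyms" is a placeholder for a synonym mapping in your Atlas Search index.
MongoDBAtlasFullTextRetriever:
  synonyms: my_synonyms  # can't be combined with fuzzy
  match_criteria: any    # a document may match any of the query terms
  score:
    boost:
      value: 2           # multiply each matching document's score by 2
```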