MongoDBAtlasDocumentStore

Use a MongoDB Atlas database as the document store for data your pipelines can query.

Basic Information

Overview

For details, see MongoDB documentation and Haystack documentation.

Authorization

You must have an Atlas account. For details on setting it up, see MongoDB documentation.

To connect to the MongoDB database, you must provide a connection string in the format "mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}". For detailed instructions on how to obtain it, see Create a Connection String in MongoDB documentation.
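Usernames and passwords that contain characters such as "@", ":" or "/" must be percent-encoded before they go into the connection string, otherwise the URI does not parse. The following is a minimal sketch that assembles a string in the format above using only the Python standard library. The helper name `build_connection_string` and the example host, credentials, and parameters are hypothetical.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_string(username, password, host, params=None):
    """Assemble a mongodb+srv connection string (hypothetical helper)."""
    # Percent-encode the credentials so reserved characters such as
    # "@", ":" or "/" do not break the URI.
    user = quote_plus(username)
    pwd = quote_plus(password)
    # Turn the optional parameters into a query string, e.g. "a=1&b=2".
    query = urlencode(params or {})
    return f"mongodb+srv://{user}:{pwd}@{host}/?{query}"


# Example values only; replace them with your own Atlas details.
example = build_connection_string(
    "app_user",
    "p@ss:word/1",
    "cluster0.example.mongodb.net",
    {"retryWrites": "true", "w": "majority"},
)
print(example)
```

Treat the resulting string as a secret: paste it on the Connections page rather than committing it to a pipeline file.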

Once you have your connection string, connect MongoDB to deepset AI Platform on the Connections page:

  1. Log in to deepset AI Platform.

  2. Click your initials in the top right corner and choose Connections.

  3. Scroll down the page to find MongoDB and click Connect next to it.

  4. Paste your MongoDB connection string and click Connect.

Usage

To configure MongoDB as the document store, you need:

  • The name of the database to use.
  • The name of the collection to use. This collection must have a vector search index set up on the embedding field and a full text search index.
  • The name of the vector search index to use for vector search. You can create a vector search index on your collection in the Atlas web user interface.
    Important: Your MongoDB vector search index must have an embedding field of type knnVector defined, for example:
    const index = {
      name: "vector_index",
      type: "vectorSearch",
      definition: {
        "fields": [
          {
            "type": "knnVector",
            "path": "embedding",
            "similarity": "cosine",
            "numDimensions": 768
          }
        ]
      }
    }
    
    This allows you to use embedding retrieval with the MongoDB document store.
  • The name of the full text search index.

You must create both a vector search index and a full-text search index on your Atlas collection, one for embeddings and one for text. The vector search index is used for embedding-based similarity queries, while the full-text search index is used for keyword queries. With both in place, you can use hybrid search out of the box.

For details on how to set up your database, see MongoDB documentation.
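If you create indexes from a script rather than in the Atlas web user interface, it can help to generate the index definition so that the number of dimensions always matches your embedding model. The following is a minimal sketch that builds the same structure as the vector search index example above and prints it as JSON. The helper name `vector_index_definition` is hypothetical, the field type defaults to the value used in this page's example, and the sketch only builds the definition; it does not talk to Atlas.

```python
import json

# Similarity functions accepted here; adjust if your Atlas version differs.
SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(name="vector_index", path="embedding",
                            num_dimensions=768, similarity="cosine",
                            field_type="knnVector"):
    """Build a vector search index definition (hypothetical helper)."""
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    if similarity not in SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "name": name,
        "type": "vectorSearch",
        "definition": {
            "fields": [
                {
                    "type": field_type,       # mirrors the example on this page
                    "path": path,             # document field holding the embedding
                    "similarity": similarity,
                    "numDimensions": num_dimensions,
                }
            ]
        },
    }


index = vector_index_definition(num_dimensions=768)
print(json.dumps(index, indent=2))
```

Set `num_dimensions` to the output size of the embedding model you use in your pipelines; a mismatch between the index and the stored embeddings is a common reason for vector search returning no results.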

Writing Data to MongoDB

To write the preprocessed files into the MongoDB document store:

  1. Add DocumentWriter to your index.
  2. Add MongoDBAtlasDocumentStore and configure the required parameters on the component card. You must provide:
    • database_name: The name of your MongoDB database.
    • collection_name: The name of your collection.
    • vector_search_index: The name of the vector search index created on your collection. Make sure the index has an embedding field of type knnVector defined.
  3. Connect MongoDBAtlasDocumentStore to DocumentWriter.

Retrieving Files From MongoDB

To retrieve files from the MongoDB document store and use them for search:

  1. Add a MongoDB Atlas Retriever to your query pipeline.
  2. Add MongoDBAtlasDocumentStore and configure the required parameters on the component card. You must provide:
    • database_name: The name of your MongoDB database.
    • collection_name: The name of your collection.
    • vector_search_index: The name of the vector search index created on your collection. Make sure the index has an embedding field of type knnVector defined.
  3. Connect MongoDBAtlasDocumentStore to the Retriever.

Examples

This is how you connect the document store to DocumentWriter:

[Screenshot: MongoDBAtlasDocumentStore configured and connected to DocumentWriter]

When you switch to YAML, you can see that the document store is a parameter of DocumentWriter and that's where you can configure it as well:

writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.mongodb_atlas.document_store.MongoDBAtlasDocumentStore #document store configuration
        init_parameters:
          mongo_connection_string:
            type: env_var
            env_vars:
              - MONGO_CONNECTION_STRING
            strict: false
          database_name: myDatabase
          collection_name: myCollection
          vector_search_index: vectorIndex
          full_text_search_index: fullTextIndex
      policy: OVERWRITE
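If you generate pipeline configuration programmatically, the writer configuration above corresponds to a nested mapping. The following is a minimal sketch using only the Python standard library. The helper name `writer_config` is hypothetical, and the shape of `mongo_connection_string` (a mapping that names the environment variable to read) is an assumption based on how Haystack serializes secrets.

```python
import json


def writer_config(database_name, collection_name, vector_search_index,
                  full_text_search_index,
                  env_var="MONGO_CONNECTION_STRING"):
    """Build the DocumentWriter configuration as a dict (hypothetical helper)."""
    document_store = {
        "type": ("haystack_integrations.document_stores.mongodb_atlas."
                 "document_store.MongoDBAtlasDocumentStore"),
        "init_parameters": {
            # Assumed secret format: the connection string is read from an
            # environment variable and never written into the config itself.
            "mongo_connection_string": {
                "type": "env_var",
                "env_vars": [env_var],
                "strict": False,
            },
            "database_name": database_name,
            "collection_name": collection_name,
            "vector_search_index": vector_search_index,
            "full_text_search_index": full_text_search_index,
        },
    }
    return {
        "writer": {
            "type": "haystack.components.writers.document_writer.DocumentWriter",
            "init_parameters": {
                "document_store": document_store,
                "policy": "OVERWRITE",
            },
        }
    }


config = writer_config("myDatabase", "myCollection",
                       "vectorIndex", "fullTextIndex")
# JSON is a subset of YAML, so this output can be pasted into a YAML file.
print(json.dumps(config, indent=2))
```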

Init Parameters

To check the initialization parameters MongoDBAtlasDocumentStore takes, see the MongoDB Atlas API reference in Haystack documentation.