InMemoryDocumentStore

Stores data in-memory. It's ephemeral and cannot be saved to disk.

Basic Information

  • Type: haystack.document_stores.in_memory.document_store.InMemoryDocumentStore

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

InMemoryDocumentStore stores data in memory and is ephemeral, meaning it cannot be saved to disk. It's useful for testing, development, and scenarios where you need a simple, fast document store that doesn't persist data.

This document store supports both BM25 retrieval (keyword-based) and embedding retrieval (vector-based). It can be used with the following retrievers:

  • InMemoryBM25Retriever
  • InMemoryEmbeddingRetriever

Usage Example

components:
  InMemoryDocumentStore:
    type: haystack.document_stores.in_memory.document_store.InMemoryDocumentStore
    init_parameters:
      bm25_algorithm: "BM25L"
      embedding_similarity_function: "dot_product"
      return_embedding: true
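
The embedding_similarity_function setting decides how a query embedding is compared with document embeddings. The following is a minimal, standard-library sketch of the two options, not Haystack's own code: the function names and the toy vectors are made up for illustration. It shows that dot product rewards vector magnitude, while cosine depends only on direction.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product divided by both vector norms; depends only on the angle."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


# Hypothetical 2-dimensional embeddings, chosen only to make the contrast visible.
query = [1.0, 0.0]
short_doc = [1.0, 0.0]  # same direction as the query, small magnitude
long_doc = [3.0, 3.0]   # 45 degrees away from the query, large magnitude

dot_scores = {
    "short_doc": dot_product(query, short_doc),
    "long_doc": dot_product(query, long_doc),
}
cos_scores = {
    "short_doc": cosine(query, short_doc),
    "long_doc": cosine(query, long_doc),
}

# Rank document names from best to worst under each function.
dot_ranking = sorted(dot_scores, key=dot_scores.get, reverse=True)
cos_ranking = sorted(cos_scores, key=cos_scores.get, reverse=True)

print("dot_product ranking:", dot_ranking)
print("cosine ranking:", cos_ranking)
```

With these toy vectors the two functions rank the documents in opposite order. This is why the choice should follow how your embedding model was trained, for example whether it produces normalized vectors.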

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bm25_tokenization_regex | str | r"(?u)\b\w\w+\b" | The regular expression used to tokenize the text for BM25 retrieval. |
| bm25_algorithm | Literal['BM25Okapi', 'BM25L', 'BM25Plus'] | BM25L | The BM25 algorithm to use. One of "BM25Okapi", "BM25L", or "BM25Plus". |
| bm25_parameters | Optional[Dict] | None | Parameters for the BM25 implementation, in dictionary format. For example: {'k1':1.5, 'b':0.75, 'epsilon':0.25}. You can learn more about these parameters by visiting https://github.com/dorianbrown/rank_bm25. |
| embedding_similarity_function | Literal['dot_product', 'cosine'] | dot_product | The similarity function used to compare Document embeddings. One of "dot_product" (default) or "cosine". To choose the most appropriate function, look for information about your embedding model. |
| index | Optional[str] | None | A specific index to store the documents. If not specified, a random UUID is used. Using the same index allows you to store documents across multiple InMemoryDocumentStore instances. |
| async_executor | Optional[ThreadPoolExecutor] | None | Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor is initialized and used. |
| return_embedding | bool | True | Whether to return the embedding of the retrieved Documents. |
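
To see what the default bm25_tokenization_regex matches, here is a small sketch that uses only Python's re module. The tokenize helper and the sample sentence are hypothetical, and lowercasing the text before matching is an assumption of this sketch, not something this page states about the document store.

```python
import re

# Default value of bm25_tokenization_regex, as listed in the table above.
DEFAULT_BM25_TOKENIZATION_REGEX = r"(?u)\b\w\w+\b"


def tokenize(text, pattern=DEFAULT_BM25_TOKENIZATION_REGEX):
    """Hypothetical helper: return every match of the pattern in the text.

    Lowercasing is an assumption made for this illustration.
    """
    return re.findall(pattern, text.lower())


sentence = "A BM25 index isn't a vector index."

# The default pattern keeps only runs of two or more word characters,
# so "a" and the "t" left over from "isn't" are dropped.
default_tokens = tokenize(sentence)

# A looser custom pattern that also keeps single-character tokens.
loose_tokens = tokenize(sentence, pattern=r"\w+")

print(default_tokens)
print(loose_tokens)
```

The default pattern discards one-character tokens and splits on punctuation such as apostrophes. If your documents rely on short tokens (for example single-letter identifiers), a custom pattern may retrieve better.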