InMemoryDocumentStore
Stores data in-memory. It's ephemeral and cannot be saved to disk.
Basic Information
- Type: haystack.document_stores.in_memory.document_store.InMemoryDocumentStore
Overview
Work in Progress
Bear with us while we add pipeline examples and the most common component connections.
InMemoryDocumentStore stores data in-memory and is ephemeral, meaning it cannot be saved to disk. It's useful for testing, development, and scenarios where you need a simple, fast document store that doesn't persist data.
This document store supports both BM25 retrieval (keyword-based) and embedding retrieval (vector-based). It can be used with the following retrievers:
- InMemoryBM25Retriever
- InMemoryEmbeddingRetriever
Usage Example
components:
  InMemoryDocumentStore:
    type: haystack.document_stores.in_memory.document_store.InMemoryDocumentStore
    init_parameters:
      bm25_algorithm: "BM25L"
      embedding_similarity_function: "dot_product"
      return_embedding: true
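The embedding_similarity_function setting in the configuration above chooses between "dot_product" and "cosine". The following is a minimal plain-Python sketch of the math behind the two options, not the document store's internal implementation. The helper names and the toy vectors are made up for illustration.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product after scaling both vectors to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


# Hypothetical 2-dimensional embeddings, chosen only to show the difference.
query = [1.0, 0.0]
aligned_doc = [1.0, 0.0]  # same direction as the query, unit length
long_doc = [3.0, 4.0]     # different direction, larger magnitude

dot_scores = {
    "aligned_doc": dot_product(query, aligned_doc),
    "long_doc": dot_product(query, long_doc),
}
cosine_scores = {
    "aligned_doc": cosine(query, aligned_doc),
    "long_doc": cosine(query, long_doc),
}

# The two functions can rank the same documents differently.
best_by_dot = max(dot_scores, key=dot_scores.get)
best_by_cosine = max(cosine_scores, key=cosine_scores.get)
```

In this toy case the dot product favors long_doc because of its magnitude, while cosine favors aligned_doc because it ignores magnitude. When a model produces unit-length embeddings, the two functions give identical scores, which is why the model's documentation is the place to check which one to use.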
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| bm25_tokenization_regex | str | r"(?u)\b\w\w+\b" | The regular expression used to tokenize the text for BM25 retrieval. |
| bm25_algorithm | Literal['BM25Okapi', 'BM25L', 'BM25Plus'] | BM25L | The BM25 algorithm to use. One of "BM25Okapi", "BM25L", or "BM25Plus". |
| bm25_parameters | Optional[Dict] | None | Parameters for BM25 implementation in a dictionary format. For example: {'k1':1.5, 'b':0.75, 'epsilon':0.25}. You can learn more about these parameters by visiting https://github.com/dorianbrown/rank_bm25. |
| embedding_similarity_function | Literal['dot_product', 'cosine'] | dot_product | The similarity function used to compare Document embeddings. One of "dot_product" (default) or "cosine". To choose the most appropriate function, check the documentation of your embedding model. |
| index | Optional[str] | None | A specific index to store the documents. If not specified, a random UUID is used. Using the same index allows you to store documents across multiple InMemoryDocumentStore instances. |
| async_executor | Optional[ThreadPoolExecutor] | None | Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be initialized and used. |
| return_embedding | bool | True | Whether to return the embedding of the retrieved Documents. Default is True. |
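To see what the default bm25_tokenization_regex matches, the pattern can be tried directly with Python's re module. This is a sketch of the regular expression alone. Any other preprocessing the document store applies before or after tokenizing is not shown, and the tokenize helper is a hypothetical name.

```python
import re

# Default value of bm25_tokenization_regex from the table above.
BM25_TOKENIZATION_REGEX = r"(?u)\b\w\w+\b"


def tokenize(text):
    """Return every run of two or more word characters in the text."""
    return re.findall(BM25_TOKENIZATION_REGEX, text)


tokens = tokenize("I am a BM25 tokenizer, v2!")
print(tokens)
```

Single-character words such as "I" and "a" are dropped because the pattern requires at least two word characters, and punctuation is never part of a token. A custom regex is worth considering if short tokens matter for your data.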