OllamaDocumentEmbedder
Calculate document embeddings using Ollama models.
Key Features
- Uses Ollama to run embedding models locally, without external API services.
- Stores the computed embedding in the embedding field of each document.
- Compatible with embedding models available in Ollama's library.
- Default model is nomic-embed-text.
- Supports embedding metadata fields alongside document content.
Configuration
Before using this component, make sure you have a running Ollama instance with the embedding model pulled.
- Drag the OllamaDocumentEmbedder component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Set the model name. The model must already be available in your running Ollama instance. See other pre-built models in Ollama's library.
  - Set the url to point to your Ollama server (default: http://localhost:11434).
- Go to the Advanced tab to configure timeout, batch_size, meta_fields_to_embed, and embedding_separator.
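To confirm that the model is pulled before configuring the component, you can ask the Ollama server which models it has locally. The sketch below assumes Ollama's /api/tags endpoint and a response shaped like {"models": [{"name": "nomic-embed-text:latest"}]}. The helper names model_is_pulled and fetch_tags are hypothetical and not part of the component.

```python
import json
from urllib.request import urlopen


def model_is_pulled(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags-style payload.

    Assumes each entry has a "name" such as "nomic-embed-text:latest".
    A model given without a tag matches any tag of that model.
    """
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    if ":" in model:
        return model in names
    return any(name.split(":", 1)[0] == model for name in names)


def fetch_tags(url: str = "http://localhost:11434", timeout: int = 10) -> dict:
    """Fetch the list of local models from a running Ollama instance."""
    with urlopen(f"{url.rstrip('/')}/api/tags", timeout=timeout) as response:
        return json.load(response)
```

With a server running, model_is_pulled(fetch_tags(), "nomic-embed-text") should return True once the model has been pulled.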
Embedding Models in Query Pipelines and Indexes
The embedding model you use to embed documents in your indexing pipeline must be the same as the embedding model you use to embed the query in your query pipeline.
This means the embedders for your indexing and query pipelines must match. For example, if you use OllamaDocumentEmbedder with nomic-embed-text to embed your documents, use OllamaTextEmbedder with the same model to embed your queries.
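One way to catch a mismatch early is to compare the model names in the two pipeline configurations. The following is a minimal sketch. It assumes the components sections of both pipeline YAMLs have already been parsed into dictionaries; the embedder_model helper and the sample dictionaries are hypothetical.

```python
from typing import Any, Dict, Optional


def embedder_model(components: Dict[str, Any], name: str) -> Optional[str]:
    """Read init_parameters.model for the component called `name`."""
    return components.get(name, {}).get("init_parameters", {}).get("model")


# Hypothetical, already-parsed `components` sections of two pipeline YAMLs.
index_components = {
    "OllamaDocumentEmbedder": {"init_parameters": {"model": "nomic-embed-text"}},
}
query_components = {
    "OllamaTextEmbedder": {"init_parameters": {"model": "nomic-embed-text"}},
}

index_model = embedder_model(index_components, "OllamaDocumentEmbedder")
query_model = embedder_model(query_components, "OllamaTextEmbedder")

# True only when both embedders are configured and use the same model.
models_match = index_model is not None and index_model == query_model
```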
Connections
OllamaDocumentEmbedder accepts a list of documents as input. In an indexing pipeline, connect it to converters such as TextFileToDocument or preprocessors such as DocumentSplitter.
It outputs a list of documents with the embedding field populated. Connect its documents output to DocumentWriter to store the embedded documents.
To embed a query string instead of documents, use OllamaTextEmbedder.
Source Code
To check this component's source code, open document_embedder.py in the Haystack Core Integrations repository.
Usage Examples
Basic Configuration
OllamaDocumentEmbedder:
  type: haystack_integrations.components.embedders.ollama.document_embedder.OllamaDocumentEmbedder
  init_parameters:
    model: nomic-embed-text
    url: http://localhost:11434
    timeout: 120
    prefix: ''
    suffix: ''
    progress_bar: true
    embedding_separator: "\n"
    batch_size: 32
In this index, OllamaDocumentEmbedder receives documents from DocumentSplitter and embeds them. It then sends the embedded documents to DocumentWriter. The index uses the nomic-embed-text model, which means OllamaTextEmbedder in the query pipeline must use the same model.
components:
  TextFileToDocument:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8
      store_full_path: false
  DocumentSplitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 200
      split_overlap: 0
      split_threshold: 0
      splitting_function:
  OllamaDocumentEmbedder:
    type: haystack_integrations.components.embedders.ollama.document_embedder.OllamaDocumentEmbedder
    init_parameters:
      model: nomic-embed-text
      url: http://localhost:11434
      generation_kwargs:
      timeout: 120
      prefix: ''
      suffix: ''
      progress_bar: true
      meta_fields_to_embed:
      embedding_separator: "\n"
      batch_size: 32
  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ollama-embeddings-index
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      policy: NONE
connections:
  - sender: TextFileToDocument.documents
    receiver: DocumentSplitter.documents
  - sender: DocumentSplitter.documents
    receiver: OllamaDocumentEmbedder.documents
  - sender: OllamaDocumentEmbedder.documents
    receiver: DocumentWriter.documents
max_runs_per_component: 100
metadata: {}
inputs:
  files:
    - TextFileToDocument.sources
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to be converted to an embedding. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, etc. See the Ollama docs. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents with their embeddings added to the embedding field. |
| meta | Dict[str, Any] | | Metadata about the request, including the model name. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | nomic-embed-text | The name of the model to use. The model should be available in the running Ollama instance. |
| url | str | http://localhost:11434 | The URL of a running Ollama instance. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs. |
| timeout | int | 120 | The number of seconds before throwing a timeout error from the Ollama API. |
| prefix | str | | A string to add at the beginning of each text. |
| suffix | str | | A string to add at the end of each text. |
| progress_bar | bool | True | If True, shows a progress bar when running. |
| meta_fields_to_embed | Optional[List[str]] | None | List of metadata fields to embed along with the document text. |
| embedding_separator | str | \n | Separator used to concatenate the metadata fields to the document text. |
| batch_size | int | 32 | Number of documents to process at once. |
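To illustrate how prefix, suffix, meta_fields_to_embed, embedding_separator, and batch_size relate to each other, here is a simplified sketch of the text preparation these parameters describe. It is not the component's actual implementation. The function names, the order in which metadata values and content are joined, and the handling of missing fields are assumptions made for illustration.

```python
from typing import Any, Dict, Iterator, List, Optional


def text_to_embed(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    """Join selected metadata values and the content into one string."""
    fields = meta_fields_to_embed or []
    # Skip fields that are missing or None (an assumption of this sketch).
    meta_values = [str(meta[key]) for key in fields if meta.get(key) is not None]
    return prefix + embedding_separator.join([*meta_values, content]) + suffix


def batches(items: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


text = text_to_embed(
    "Ollama runs models locally.",
    {"title": "Intro", "page": 1},
    meta_fields_to_embed=["title"],
)
# text == "Intro\nOllama runs models locally."
```

In this sketch, only the title field is prepended because it is the only one listed in meta_fields_to_embed; the page field is ignored.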
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to be converted to an embedding. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, etc. See the Ollama docs. |