OllamaGenerator

Generate text using an LLM running on Ollama.

Key Features

  • Connects to a locally running Ollama instance for local LLM inference.
  • Runs models in the quantized GGUF format by default, so LLMs can run on standard hardware without a GPU.
  • Supports streaming responses and configurable generation parameters.
  • Lets you configure the system prompt, override the prompt template, and set how long the model stays loaded (keep-alive).
  • Compatible with all models in the Ollama library.

Configuration

  1. Drag the OllamaGenerator component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Enter the model name. The model must already be pulled in the running Ollama instance.
  4. Go to the Advanced tab to configure the Ollama URL, timeout, generation kwargs, and streaming callback.
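Because the model must already be pulled, it can help to check the Ollama instance before running the pipeline. The sketch below is one way to do that, assuming the instance exposes Ollama's GET /api/tags endpoint, which lists locally available models. The helper names has_model and list_local_models are illustrative and not part of the component.

```python
import json
import urllib.request


def has_model(tags_response: dict, model: str) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    # Ollama lists models as "name:tag"; a bare name refers to the "latest" tag.
    wanted = model if ":" in model else f"{model}:latest"
    local = {entry.get("name") for entry in tags_response.get("models", [])}
    return wanted in local


def list_local_models(url: str = "http://localhost:11434", timeout: int = 10) -> dict:
    """Fetch the list of pulled models from a running Ollama instance."""
    with urllib.request.urlopen(f"{url}/api/tags", timeout=timeout) as response:
        return json.load(response)


# Hand-written sample of the response shape, so the check runs without a server.
sample_tags = {"models": [{"name": "llama3:latest"}, {"name": "nomic-embed-text:latest"}]}
print(has_model(sample_tags, "llama3"))
print(has_model(sample_tags, "mistral"))
```

With an instance running at the default URL, has_model(list_local_models(), "llama3") performs the same check against the live model list.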

Connections

OllamaGenerator receives a prompt string from PromptBuilder. It outputs replies (a list of generated strings) and meta (request metadata). Connect its replies output to AnswerBuilder or DeepsetAnswerBuilder.
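To make the output shape concrete, here is a minimal sketch of consuming replies and meta together. It assumes the two lists are aligned by position (one metadata dictionary per reply). The sample output is hand-written, and its meta keys are illustrative rather than a guaranteed schema.

```python
from typing import Any, Dict, List, Tuple


def pair_replies_with_meta(output: Dict[str, List[Any]]) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each reply with the metadata entry at the same position."""
    replies = output["replies"]
    meta = output["meta"]
    if len(replies) != len(meta):
        raise ValueError("replies and meta are expected to have the same length")
    return list(zip(replies, meta))


# Hand-written stand-in for the component's output; the meta keys are illustrative.
example_output = {
    "replies": ["Ollama runs models locally."],
    "meta": [{"model": "llama3", "eval_count": 7}],
}

for reply, info in pair_replies_with_meta(example_output):
    print(f"{info['model']}: {reply}")
```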

Usage Example

This pipeline uses OllamaGenerator to generate replies to a question. It uses DeepsetAnswerBuilder to build the answers with references.

components:
  retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      top_k: 10
  OllamaTextEmbedder:
    type: haystack_integrations.components.embedders.ollama.text_embedder.OllamaTextEmbedder
    init_parameters:
      model: nomic-embed-text
      url: http://localhost:11434
      generation_kwargs:
      timeout: 120
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      required_variables: "*"
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, for example [3] for Document [3].
        Never name the documents, only enter a number in square brackets as a reference.

        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}]:
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        {% endif %}

        Question: {{ question }}
        Answer:
  OllamaGenerator:
    type: haystack_integrations.components.generators.ollama.generator.OllamaGenerator
    init_parameters:
      model: llama3
      url: http://localhost:11434
      generation_kwargs:
        temperature: 0.7
        num_predict: 1024
      system_prompt:
      template:
      raw: false
      timeout: 120
      streaming_callback:
      keep_alive:
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

connections:
  - sender: OllamaTextEmbedder.embedding
    receiver: retriever.query_embedding
  - sender: retriever.documents
    receiver: prompt_builder.documents
  - sender: prompt_builder.prompt
    receiver: OllamaGenerator.prompt
  - sender: OllamaGenerator.replies
    receiver: answer_builder.replies
  - sender: retriever.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt

inputs:
  query:
    - OllamaTextEmbedder.text
    - prompt_builder.question
    - answer_builder.query
  filters:
    - retriever.filters

outputs:
  documents: retriever.documents
  answers: answer_builder.answers

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | The prompt to generate a response for. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function that is called when a new token is received from the stream. |
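As an illustration of the streaming_callback contract, the sketch below defines a callable that accumulates streamed tokens. To stay self-contained it uses a small stand-in StreamingChunk dataclass with content and meta fields; in a real pipeline you would use Haystack's own StreamingChunk instead. The TokenCollector class is a hypothetical helper, not part of the component.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StreamingChunk:
    """Stand-in for Haystack's StreamingChunk, reduced to the fields used here."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class TokenCollector:
    """Callable that stores each streamed chunk and exposes the joined text."""

    def __init__(self) -> None:
        self.parts: List[str] = []

    def __call__(self, chunk: StreamingChunk) -> None:
        # Called once per chunk; the return value is ignored by the caller.
        self.parts.append(chunk.content)

    @property
    def text(self) -> str:
        return "".join(self.parts)


# Simulate a stream of three chunks without contacting a model.
collector = TokenCollector()
for piece in ["Hel", "lo", "!"]:
    collector(StreamingChunk(content=piece))
print(collector.text)  # prints "Hello!"
```

An instance such as collector could then be passed wherever a streaming_callback is expected, since it accepts a single chunk argument and returns None.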

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| replies | List[str] | | A list of replies generated by the model. |
| meta | List[Dict[str, Any]] | | Information about the request, such as token count and model details. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | orca-mini | The name of the model to use. The model should be available in the running Ollama instance. |
| url | str | http://localhost:11434 | The URL of a running Ollama instance. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs. |
| system_prompt | Optional[str] | None | Optional system message (overrides what is defined in the Ollama Modelfile). |
| template | Optional[str] | None | The full prompt template (overrides what is defined in the Ollama Modelfile). |
| raw | bool | False | If True, no formatting is applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your API request. |
| timeout | int | 120 | The number of seconds to wait before the Ollama API request raises a timeout error. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument. |
| keep_alive | Optional[Union[float, str]] | None | Controls how long the model stays loaded in memory after the request. If not set, Ollama's default (five minutes) applies. Accepted values: a duration string (such as "10m" or "24h"), a number of seconds (such as 3600), any negative number, which keeps the model loaded in memory (for example, -1 or "-1m"), or 0, which unloads the model immediately after generating a response. |
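To show how these settings relate to one another, the sketch below builds a request body in the shape of Ollama's /api/generate endpoint (model, prompt, system, template, raw, stream, options, keep_alive). It is only an illustration of the mapping under that assumption, not the component's actual implementation, and build_generate_payload is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional, Union


def build_generate_payload(
    prompt: str,
    model: str = "orca-mini",
    generation_kwargs: Optional[Dict[str, Any]] = None,
    system_prompt: Optional[str] = None,
    template: Optional[str] = None,
    raw: bool = False,
    stream: bool = False,
    keep_alive: Optional[Union[float, str]] = None,
) -> Dict[str, Any]:
    """Map the init parameters onto an /api/generate-style request body."""
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "raw": raw,
        "stream": stream,
        # Sampling settings such as temperature go into "options".
        "options": dict(generation_kwargs or {}),
    }
    # Unset optional fields are omitted so the Modelfile and Ollama defaults apply.
    optional = {"system": system_prompt, "template": template, "keep_alive": keep_alive}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_generate_payload(
    "Why is the sky blue?",
    model="llama3",
    generation_kwargs={"temperature": 0.7, "num_predict": 1024},
    keep_alive="10m",
)
print(json.dumps(payload, indent=2))
```

Note that keep_alive=0 is kept in the payload because only None counts as unset here, which matches the distinction the table draws between "not set" and "unload immediately".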

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | The prompt to generate a response for. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function that is called when a new token is received from the stream. |
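Since generation_kwargs can be set both at init time and at query time, it is worth being clear about how the two might combine. The sketch below assumes that query-time values override init-time values key by key and that unspecified keys keep their init-time setting. This page does not state that rule, so treat it as an assumption to verify; merge_generation_kwargs is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(
    init_kwargs: Optional[Dict[str, Any]],
    run_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine both sets of kwargs; run-time keys win (assumed behavior)."""
    # A new dict is built, so neither input is modified.
    return {**(init_kwargs or {}), **(run_kwargs or {})}


init_kwargs = {"temperature": 0.7, "num_predict": 1024}
run_kwargs = {"temperature": 0.2}
merged = merge_generation_kwargs(init_kwargs, run_kwargs)
print(merged)  # {'temperature': 0.2, 'num_predict': 1024}
```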