OllamaGenerator
Generate text using an LLM running on Ollama.
Key Features
- Connects to models served by Ollama, a project for running LLMs locally.
- Ollama serves models in the quantized GGUF format by default, so LLMs can run on standard machines without GPUs.
- Accepts string prompts and returns string replies.
- Supports streaming responses via a callback function.
- Configurable generation parameters such as `temperature`, `top_p`, and `num_predict` via `generation_kwargs`.
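To illustrate the streaming feature, here is a minimal sketch of a callback that collects chunks as they arrive. It assumes each chunk exposes its text in a `content` attribute, as Haystack's `StreamingChunk` does. The `ChunkCollector` class and the stand-in chunks are hypothetical names used only for illustration, not part of the component.

```python
from types import SimpleNamespace
from typing import Any, List


class ChunkCollector:
    """Collects streamed text so the full reply can be rebuilt afterwards.

    Hypothetical helper: it only assumes each chunk has a `content` attribute.
    """

    def __init__(self) -> None:
        self.parts: List[str] = []

    def __call__(self, chunk: Any) -> None:
        # Called once per chunk: print the text as it arrives and keep a copy.
        print(chunk.content, end="", flush=True)
        self.parts.append(chunk.content)

    @property
    def text(self) -> str:
        # Join everything received so far into one string.
        return "".join(self.parts)


# Simulate a stream with stand-in chunks instead of a live Ollama server.
collector = ChunkCollector()
for piece in ["Hello", ", ", "world"]:
    collector(SimpleNamespace(content=piece))
print()
```

When building a pipeline in Python, an instance like `collector` could be passed as the `streaming_callback` of the component.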
Configuration
Before using this component, make sure you have a running Ollama instance with the model pulled.
- Drag the `OllamaGenerator` component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Set the model name. The model must already be available in your running Ollama instance. See other pre-built models in Ollama's library.
  - Set the `url` to point to your Ollama server (default: `http://localhost:11434`).
- Go to the Advanced tab to configure `timeout`, `keep_alive`, `system_prompt`, `template`, `raw`, and `generation_kwargs`.
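Before configuring the component, it can help to confirm that the model is actually present on the server. The sketch below assumes Ollama's `/api/tags` endpoint, which lists local models under `models[].name`. The helper names `fetch_tags` and `model_is_available` are hypothetical, and the sample response is hand-written for illustration.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen


def fetch_tags(url: str = "http://localhost:11434", timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch the list of local models from a running Ollama server."""
    with urlopen(f"{url.rstrip('/')}/api/tags", timeout=timeout) as response:
        return json.load(response)


def model_is_available(tags: Dict[str, Any], model: str) -> bool:
    """Return True if `model` appears in an /api/tags response.

    A name without a tag is treated as `<name>:latest`.
    """
    wanted = model if ":" in model else f"{model}:latest"
    names = {entry.get("name") for entry in tags.get("models", [])}
    return wanted in names


# Offline demonstration with a hand-written response (illustrative data only).
sample = {"models": [{"name": "llama3:latest"}, {"name": "nomic-embed-text:latest"}]}
print(model_is_available(sample, "llama3"))
print(model_is_available(sample, "mistral"))
```

Against a live server, `model_is_available(fetch_tags(), "llama3")` performs the same check with real data.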
Connections
OllamaGenerator accepts a prompt string as input. Connect its prompt input to the prompt output of PromptBuilder.
It outputs replies as a list of strings and meta as a list of metadata dictionaries. Connect its replies output to DeepsetAnswerBuilder.
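As a rough illustration of the output shape, the sketch below pairs each reply with its metadata dictionary. It assumes the two lists are aligned by index. The function name and the sample values are hypothetical, not real model output.

```python
from typing import Any, Dict, List, Tuple


def pair_replies_with_meta(result: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Zip each reply with the metadata dictionary at the same index."""
    replies: List[str] = result.get("replies", [])
    meta: List[Dict[str, Any]] = result.get("meta", [])
    return list(zip(replies, meta))


# Hand-written stand-in for the component's output (illustrative values only).
result = {"replies": ["Paris."], "meta": [{"model": "llama3"}]}
for reply, info in pair_replies_with_meta(result):
    print(reply, info)
```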
Source Code
To check this component's source code, open generator.py in the Haystack Core Integrations repository.
Usage Examples
Basic Configuration
OllamaGenerator:
  type: haystack_integrations.components.generators.ollama.generator.OllamaGenerator
  init_parameters:
    model: llama3
    url: http://localhost:11434
    generation_kwargs:
      temperature: 0.7
      num_predict: 1024
    raw: false
    timeout: 120
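For readers working in Python rather than Pipeline Builder, the same configuration could look roughly like the sketch below. It assumes the `ollama-haystack` package is installed and an Ollama server is running. `INIT_PARAMS`, `build_generator`, and `ask` are hypothetical names chosen for this example.

```python
from typing import Any, Dict

# Same values as the YAML configuration above.
INIT_PARAMS: Dict[str, Any] = {
    "model": "llama3",
    "url": "http://localhost:11434",
    "generation_kwargs": {"temperature": 0.7, "num_predict": 1024},
    "raw": False,
    "timeout": 120,
}


def build_generator():
    """Create the component; requires the `ollama-haystack` package."""
    # Imported lazily so this file can be loaded without the integration installed.
    from haystack_integrations.components.generators.ollama import OllamaGenerator

    return OllamaGenerator(**INIT_PARAMS)


def ask(prompt: str) -> str:
    """Run one prompt and return the first reply (needs a running Ollama server)."""
    result = build_generator().run(prompt=prompt)
    return result["replies"][0]
```

Calling `ask("What is GGUF?")` would send a single prompt to the local server and return the first string from `replies`.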
Connections
This pipeline uses OllamaGenerator to generate replies to a question.
components:
  retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      top_k: 10
  OllamaTextEmbedder:
    type: haystack_integrations.components.embedders.ollama.text_embedder.OllamaTextEmbedder
    init_parameters:
      model: nomic-embed-text
      url: http://localhost:11434
      generation_kwargs:
      timeout: 120
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      required_variables: "*"
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, for example [3] for Document [3].
        Never name the documents, only enter a number in square brackets as a reference.
        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}]:
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        {% endif %}
        Question: {{ question }}
        Answer:
  OllamaGenerator:
    type: haystack_integrations.components.generators.ollama.generator.OllamaGenerator
    init_parameters:
      model: llama3
      url: http://localhost:11434
      generation_kwargs:
        temperature: 0.7
        num_predict: 1024
      system_prompt:
      template:
      raw: false
      timeout: 120
      streaming_callback:
      keep_alive:
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
connections:
  - sender: OllamaTextEmbedder.embedding
    receiver: retriever.query_embedding
  - sender: retriever.documents
    receiver: prompt_builder.documents
  - sender: prompt_builder.prompt
    receiver: OllamaGenerator.prompt
  - sender: OllamaGenerator.replies
    receiver: answer_builder.replies
  - sender: retriever.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt
inputs:
  query:
    - OllamaTextEmbedder.text
    - prompt_builder.question
    - answer_builder.query
  filters:
    - retriever.filters
outputs:
  documents: retriever.documents
  answers: answer_builder.answers
max_runs_per_component: 100
metadata: {}
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | | The prompt to generate a response for. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function that is called when a new token is received from the stream. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| replies | List[str] | | A list of replies generated by the model. |
| meta | List[Dict[str, Any]] | | Information about the request, such as token count and model details. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | orca-mini | The name of the model to use. The model should be available in the running Ollama instance. |
| url | str | http://localhost:11434 | The URL of a running Ollama instance. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs. |
| system_prompt | Optional[str] | None | Optional system message (overrides what is defined in the Ollama Modelfile). |
| template | Optional[str] | None | The full prompt template (overrides what is defined in the Ollama Modelfile). |
| raw | bool | False | If True, no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your API request. |
| timeout | int | 120 | The number of seconds before throwing a timeout error from the Ollama API. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function that is called when a new token is received from the stream. |
| keep_alive | Optional[Union[float, str]] | None | Controls how long the model will stay loaded into memory following the request. If not set, it will use the default value from Ollama (five minutes). |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | | The prompt to generate a response for. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function that is called when a new token is received from the stream. |
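As a hedged illustration of overriding `generation_kwargs` at query time, the sketch below builds a parameters object keyed by the component's name in the pipeline. The nesting by component name and the helper `build_run_params` are assumptions made for this example. Check Modify Pipeline Parameters at Query Time for the exact request shape your endpoint expects.

```python
import json
from typing import Any, Dict


def build_run_params(component_name: str, **generation_kwargs: Any) -> Dict[str, Any]:
    """Nest run() arguments under the component's name in the pipeline.

    Assumption: query-time parameters are grouped per component name.
    """
    return {component_name: {"generation_kwargs": dict(generation_kwargs)}}


# Lower the temperature and shorten the reply for a single query.
params = build_run_params("OllamaGenerator", temperature=0.2, num_predict=256)
print(json.dumps(params, indent=2))
```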