OllamaGenerator

Generate text using an LLM running on Ollama.

Basic Information

Type: haystack_integrations.components.generators.ollama.generator.OllamaGenerator
Components it can connect with:
- PromptBuilder: OllamaGenerator receives a prompt from PromptBuilder.
- DeepsetAnswerBuilder: OllamaGenerator sends the generated replies to DeepsetAnswerBuilder.

Inputs

Parameter	Type	Default	Description
prompt	str		The prompt to generate a response for.
generation_kwargs	Optional[Dict[str, Any]]	None	Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
streaming_callback	Optional[Callable[[StreamingChunk], None]]	None	A callback function that is called when a new token is received from the stream.

Outputs

Parameter	Type	Default	Description
replies	List[str]		A list of replies generated by the model.
meta	List[Dict[str, Any]]		Information about the request, such as token count and model details.

Overview

OllamaGenerator provides an interface for generating text using LLMs running on Ollama.

Ollama is a project focused on running LLMs locally. By default, it uses models in the quantized GGUF format. Quantization reduces memory and compute requirements, so you can run LLMs on standard machines (even without GPUs) without complex installation procedures.

You can configure how the model generates text by passing additional arguments through generation_kwargs. For example, you can set temperature, top_p, and num_predict.
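The snippet below is a minimal sketch of how options set on the component could combine with options passed for a single query. It assumes that query-time values take precedence over init-time values, which this page does not state explicitly. The helper name merge_generation_kwargs is hypothetical and is not part of the component.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(
    init_kwargs: Optional[Dict[str, Any]],
    run_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Combine init-time and query-time options; query-time keys win (assumed)."""
    return {**(init_kwargs or {}), **(run_kwargs or {})}


# Options set when the component is configured.
init_options = {"temperature": 0.7, "num_predict": 1024}
# Options supplied for a single query.
query_options = {"temperature": 0.2, "top_p": 0.9}

effective_options = merge_generation_kwargs(init_options, query_options)
print(effective_options)
```

Under this assumption, a query can lower the temperature for one request while keeping num_predict from the component configuration.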

Compatible Models

The default model is orca-mini. See other pre-built models in Ollama's library. To load your own custom model, follow the instructions from Ollama.

Prerequisites

You need a running Ollama instance with the model pulled. The component uses http://localhost:11434 as the default URL.
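One way to confirm both prerequisites before running a pipeline is to ask the Ollama instance which models it has available. The sketch below uses only the Python standard library and assumes that the instance exposes a GET /api/tags endpoint returning a JSON object with a "models" list whose entries carry a "name" field. The function names fetch_tags and model_is_pulled are hypothetical helpers.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch the list of locally available models from a running Ollama instance."""
    with urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as response:
        return json.load(response)


def model_is_pulled(tags: Dict[str, Any], model: str) -> bool:
    """Return True if `model` appears in the tags payload.

    A name without a tag (for example "llama3") is treated as "llama3:latest".
    """
    wanted = model if ":" in model else f"{model}:latest"
    names = {entry.get("name", "") for entry in tags.get("models", [])}
    return wanted in names


# Example payload in the assumed response shape, so the check can be tried offline.
sample_tags = {"models": [{"name": "llama3:latest"}, {"name": "nomic-embed-text:latest"}]}
print(model_is_pulled(sample_tags, "llama3"))
print(model_is_pulled(sample_tags, "orca-mini"))
```

Against a live instance, call model_is_pulled(fetch_tags(), "llama3"). A connection error from fetch_tags suggests that no Ollama instance is reachable at the given URL.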

Usage Example

This pipeline uses OllamaGenerator to generate replies to a question. It uses DeepsetAnswerBuilder to build the answers with references.

components:
  retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      top_k: 10
  OllamaTextEmbedder:
    type: haystack_integrations.components.embedders.ollama.text_embedder.OllamaTextEmbedder
    init_parameters:
      model: nomic-embed-text
      url: http://localhost:11434
      generation_kwargs:
      timeout: 120
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      required_variables: "*"
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, for example [3] for Document [3].
        Never name the documents, only enter a number in square brackets as a reference.

        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}]:
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        {% endif %}

        Question: {{ question }}
        Answer:
  OllamaGenerator:
    type: haystack_integrations.components.generators.ollama.generator.OllamaGenerator
    init_parameters:
      model: llama3
      url: http://localhost:11434
      generation_kwargs:
        temperature: 0.7
        num_predict: 1024
      system_prompt:
      template:
      raw: false
      timeout: 120
      streaming_callback:
      keep_alive:
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

connections:
- sender: OllamaTextEmbedder.embedding
  receiver: retriever.query_embedding
- sender: retriever.documents
  receiver: prompt_builder.documents
- sender: prompt_builder.prompt
  receiver: OllamaGenerator.prompt
- sender: OllamaGenerator.replies
  receiver: answer_builder.replies
- sender: retriever.documents
  receiver: answer_builder.documents
- sender: prompt_builder.prompt
  receiver: answer_builder.prompt

inputs:
  query:
  - OllamaTextEmbedder.text
  - prompt_builder.question
  - answer_builder.query
  filters:
  - retriever.filters

outputs:
  documents: retriever.documents
  answers: answer_builder.answers

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
model	str	orca-mini	The name of the model to use. The model should be available in the running Ollama instance.
url	str	http://localhost:11434	The URL of a running Ollama instance.
generation_kwargs	Optional[Dict[str, Any]]	None	Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
system_prompt	Optional[str]	None	Optional system message (overrides what is defined in the Ollama Modelfile).
template	Optional[str]	None	The full prompt template (overrides what is defined in the Ollama Modelfile).
raw	bool	False	If True, no formatting is applied to the prompt. Use this when you pass a fully templated prompt in the request.
timeout	int	120	The number of seconds to wait for a response from the Ollama API before raising a timeout error.
streaming_callback	Optional[Callable[[StreamingChunk], None]]	None	A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.
keep_alive	Optional[Union[float, str]]	None	Controls how long the model stays loaded in memory after the request. If not set, Ollama's default (5 minutes) applies. Accepted values: a duration string (such as "10m" or "24h"); a number of seconds (such as 3600); any negative number, which keeps the model loaded in memory (such as -1 or "-1m"); or 0, which unloads the model immediately after generating a response.
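To make the keep_alive rules easier to reason about, the sketch below classifies a value into one of the four behaviors described in the table. It is an illustration only: describe_keep_alive is a hypothetical helper, the parsing is deliberately simple, and Ollama itself remains the authority on which values are valid.

```python
import string
from typing import Optional, Union


def describe_keep_alive(value: Optional[Union[float, str]]) -> str:
    """Classify a keep_alive value according to the rules in the table."""
    if value is None:
        return "ollama default"
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("-"):
            return "keep loaded"
        # Drop a trailing unit such as "m" or "h" to inspect the magnitude.
        magnitude = text.rstrip(string.ascii_letters)
        try:
            seconds = float(magnitude)
        except ValueError:
            # Anything this sketch cannot parse is treated as a plain duration.
            return "unload after duration"
        return "unload immediately" if seconds == 0 else "unload after duration"
    if value < 0:
        return "keep loaded"
    if value == 0:
        return "unload immediately"
    return "unload after duration"


for candidate in (None, "10m", 3600, -1, "-1m", 0):
    print(repr(candidate), "->", describe_keep_alive(candidate))
```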

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
prompt	str		The prompt to generate a response for.
generation_kwargs	Optional[Dict[str, Any]]	None	Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in Ollama docs.
streaming_callback	Optional[Callable[[StreamingChunk], None]]	None	A callback function that is called when a new token is received from the stream.
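A streaming callback is any callable that accepts one chunk at a time. The sketch below shows a callback that prints each piece as it arrives and keeps the pieces so the full reply can be assembled afterwards. To stay runnable without the library installed, it uses FakeStreamingChunk, a hypothetical stand-in that assumes the real StreamingChunk exposes its text through a content attribute. TokenCollector is likewise an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeStreamingChunk:
    """Stand-in for StreamingChunk; assumes the real class exposes `content`."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class TokenCollector:
    """Callable that prints each chunk as it arrives and keeps the pieces."""

    def __init__(self) -> None:
        self.pieces: List[str] = []

    def __call__(self, chunk: FakeStreamingChunk) -> None:
        self.pieces.append(chunk.content)
        print(chunk.content, end="", flush=True)

    @property
    def text(self) -> str:
        """Return everything received so far as one string."""
        return "".join(self.pieces)


# Simulate a stream of three chunks arriving one after another.
collector = TokenCollector()
for piece in ["Ollama ", "runs ", "locally."]:
    collector(FakeStreamingChunk(content=piece))
print()
```

With the real component, an instance such as collector would be passed as streaming_callback, and collector.text would hold the assembled reply once generation finishes.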
