Common Component Combinations
Let's look at examples of components that often go together to understand how they work.
PromptBuilder and Generator
Generators are components designed to interact with large language models (LLMs). Any LLM app requires a Generator component specific to the model provider used.
A Generator needs a prompt as input, which you create using PromptBuilder. With PromptBuilder, you can define your prompt as a Jinja2 template. PromptBuilder processes the template, fills in the variables, and sends the rendered prompt to the Generator.
To connect these components, link the prompt output of PromptBuilder to the prompt input of the Generator.
Example YAML configuration:
components:
  qa_prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        Ignore typing errors in the question.
        For each document check whether it is related to the question.
        Only use documents that are related to the question to answer it.
        Ignore documents that are not related to the question.
        If the answer exists in several documents, summarize them.
        Only answer based on the documents provided. Don't make things up.
        Just output the structured, informative and precise answer and nothing else.
        If the documents can't answer the question, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document[3].
        Never name the documents, only enter a number in square brackets as a reference.
        The reference must only refer to the number that comes in square brackets after the document.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.
        These are the documents:
        {% for document in documents %}
        Document[{{ loop.index }}]:
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}
        Answer:
  qa_llm:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["OPENAI_API_KEY"], "strict": False}
      model: "gpt-4o"
      generation_kwargs:
        max_tokens: 650
        temperature: 0.0
        seed: 0
connections:
  - sender: qa_prompt_builder.prompt
    receiver: qa_llm.prompt
Generator and DeepsetAnswerBuilder
The Generator’s output is replies, a list of strings containing all the answers the LLM generates. However, pipelines require outputs in the form of Answer objects or objects of its subclasses, such as GeneratedAnswer. Since the pipeline’s output is the same as the output of its last component, you need a component that converts replies into the GeneratedAnswer format. This is why Generators are always paired with a DeepsetAnswerBuilder.
DeepsetAnswerBuilder processes the Generator’s replies and transforms them into a list of GeneratedAnswer objects. It can also include documents in the answers, providing references to support the generated responses. This feature is particularly useful when you need answers that are backed by reliable sources.
To connect these two components, link the Generator’s replies output to DeepsetAnswerBuilder’s replies input. DeepsetAnswerBuilder also requires the query as an input so that it can include it in the GeneratedAnswer, so make sure the query is connected to DeepsetAnswerBuilder as well. Optionally, DeepsetAnswerBuilder accepts the prompt as input. If it receives the prompt, it adds it to the GeneratedAnswer's metadata. This is useful if you need the prompt as part of the API response from deepset Cloud.
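As a rough sketch, the connections described above could look like this in YAML. The fragment extends the PromptBuilder and Generator example from the previous section. The component name answer_builder, the DeepsetAnswerBuilder type path, and the inputs and outputs sections are assumptions made for illustration, so check the component reference for the exact values.

```yaml
components:
  # qa_prompt_builder and qa_llm are defined as in the previous example.
  answer_builder:
    # Assumed type path; verify it against the component reference.
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters: {}

connections:
  # The Generator's replies become the text of the GeneratedAnswer objects.
  - sender: qa_llm.replies
    receiver: answer_builder.replies
  # Optional: pass the prompt so it is stored in the answer's metadata.
  - sender: qa_prompt_builder.prompt
    receiver: answer_builder.prompt

inputs:
  # Assumed input mapping: the query goes to both the prompt and the answer builder.
  query:
    - "qa_prompt_builder.question"
    - "answer_builder.query"

outputs:
  # Assumed output mapping: the pipeline returns the builder's answers.
  answers: "answer_builder.answers"
```

If you don't need the prompt in the API response, you can leave out the second connection.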
Tip: DeepsetAnswerBuilder is available in the Augmenters group in the components library.
Embedders
Embedders are a group of components that turn text or documents into vectors (embeddings) using embedding models. Embedders are specific to the model provider, with at least two embedders available for each:
- Document Embedder
- Text Embedder
Document Embedders are used in indexing pipelines to embed documents before they’re written into the document store. In most cases, this means a DocumentEmbedder is connected to the DocumentWriter.
The connection is simple: link DocumentEmbedder’s documents output to DocumentWriter’s documents input.
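A minimal sketch of this connection in an indexing pipeline might look like the fragment below. The component names, the choice of a Sentence Transformers embedder, the model, and the OpenSearch document store settings are assumptions picked for illustration; substitute the embedder and document store you actually use.

```yaml
components:
  document_embedder:
    # Assumed embedder; use the Document Embedder for your model provider.
    type: haystack.components.embedders.sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"  # example model, not prescribed by this guide
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        # Assumed document store and settings.
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 768  # must match the vector size of the chosen model
      policy: OVERWRITE

connections:
  # Embedded documents flow from the embedder into the writer.
  - sender: document_embedder.documents
    receiver: writer.documents
```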
Text Embedders are used in query pipelines to embed the query text and pass it on to the next component, typically a Retriever.
The Query component’s text output is connected to the TextEmbedder’s text input. The TextEmbedder then generates embeddings, and its embedding output is linked to the query_embedding input of the Retriever.
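In YAML, the query side could be sketched roughly as follows. The component names, the Sentence Transformers embedder, the OpenSearch retriever and document store, and the inputs section standing in for the Query component are all assumptions for illustration.

```yaml
components:
  query_embedder:
    # Assumed embedder; use the Text Embedder for your model provider.
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"  # example model, not prescribed by this guide
  embedding_retriever:
    # Assumed retriever and document store.
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 768  # must match the vector size of the chosen model
      top_k: 20

connections:
  # The query embedding is what the Retriever searches with.
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding

inputs:
  # Assumed input mapping: the query text is sent to the embedder's text input.
  query:
    - "query_embedder.text"
```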
Keep in mind that the DocumentEmbedder in your indexing pipeline and the TextEmbedder in your query pipeline must use the same model. This ensures the embeddings are compatible for accurate retrieval.
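To make the matching requirement concrete, here is a small sketch of the two embedder definitions. The component names and the model are assumptions for illustration; the point is only that the model value is identical in both pipelines.

```yaml
# Indexing pipeline (fragment)
components:
  document_embedder:
    type: haystack.components.embedders.sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"  # same model as in the query pipeline
---
# Query pipeline (fragment)
components:
  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"  # same model as in the indexing pipeline
```

If you change the model in one pipeline, change it in the other as well and re-index your documents so that the stored embeddings come from the new model.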