Simplify Your Pipelines
Use smart connections to remove unnecessary components from your pipelines. This guide shows you how.
About This Task
Smart connections let the pipeline automatically merge lists and convert types between components. This means many "glue" components you used before are no longer needed. Your pipelines become shorter, easier to read, and simpler to debug.
| Component Previously Needed | Smart Connection Instead |
|---|---|
| `DocumentJoiner` | Connect all document outputs from sender components directly to one `list[Document]` input. Example receiving components are Ranker, PromptBuilder, DocumentSplitter, DocumentWriter, Embedder, and AnswerBuilder, typically through their `documents` input. |
| `ListJoiner` | Connect all sender components' outputs directly to one `list[ChatMessage]` input. An example receiver is the Agent's `messages` input. |
| `OutputAdapter` | Connect the LLM's `replies` output directly to downstream components, such as a Retriever or Ranker. |
For a full list of simplified components, see Legacy Components.
For background on how smart connections work, see Smart Connections.
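As a quick illustration, a smart connection lets several senders feed the same list input directly. The component names in this fragment are only placeholders, not components from a specific pipeline:

```yaml
connections:
  - sender: RetrieverA.documents    # list[Document]
    receiver: Ranker.documents
  - sender: RetrieverB.documents    # list[Document]
    receiver: Ranker.documents      # both lists are merged into one before Ranker runs
```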
Remove DocumentJoiner
Components now accept multiple lists of documents, which eliminates the need for a DocumentJoiner in most cases. Below, you can find common use cases for DocumentJoiner and how to simplify them.
Hybrid Retrieval Pipelines
If your pipeline uses multiple retrievers (for example, a BM25 retriever and an embedding retriever), you probably have a DocumentJoiner sitting between the retrievers and the next component. With smart connections, you can remove it and connect the retrievers directly to the downstream component.
The pipeline automatically merges the document lists into one before passing them along.
To simplify this pipeline:
- Remove the `DocumentJoiner` component.
- Reconnect `OpenSearchBM25Retriever`'s `documents` output to `TransformersSimilarityRanker`'s `documents` input.
- Reconnect `OpenSearchEmbeddingRetriever`'s `documents` output to `TransformersSimilarityRanker`'s `documents` input.
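The change boils down to rewiring three connections. Excerpted from the full configurations below:

```yaml
# Before: both retrievers feed DocumentJoiner, which feeds the ranker
- sender: OpenSearchBM25Retriever.documents
  receiver: DocumentJoiner.documents
- sender: OpenSearchEmbeddingRetriever.documents
  receiver: DocumentJoiner.documents
- sender: DocumentJoiner.documents
  receiver: TransformersSimilarityRanker.documents

# After: both retrievers connect directly to the ranker
- sender: OpenSearchBM25Retriever.documents
  receiver: TransformersSimilarityRanker.documents
- sender: OpenSearchEmbeddingRetriever.documents
  receiver: TransformersSimilarityRanker.documents
```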
Before: with DocumentJoiner
components:
OpenSearchBM25Retriever:
type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
hosts:
- ${OPENSEARCH_HOST}
index: ''
embedding_dim: 768
http_auth:
- ${OPENSEARCH_USER}
- ${OPENSEARCH_PASSWORD}
use_ssl: true
verify_certs: false
top_k: 20
SentenceTransformersTextEmbedder:
type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
init_parameters:
model: intfloat/e5-base-v2
OpenSearchEmbeddingRetriever:
type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
hosts:
- ${OPENSEARCH_HOST}
index: ''
embedding_dim: 768
http_auth:
- ${OPENSEARCH_USER}
- ${OPENSEARCH_PASSWORD}
use_ssl: true
verify_certs: false
top_k: 20
DocumentJoiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
TransformersSimilarityRanker:
type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8
PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
template: |-
Answer the question based on the provided documents.
Documents:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ question }}
OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
model: gpt-4o
AnswerBuilder:
type: haystack.components.builders.answer_builder.AnswerBuilder
connections:
- sender: OpenSearchBM25Retriever.documents
receiver: DocumentJoiner.documents
- sender: SentenceTransformersTextEmbedder.embedding
receiver: OpenSearchEmbeddingRetriever.query_embedding
- sender: OpenSearchEmbeddingRetriever.documents
receiver: DocumentJoiner.documents
- sender: DocumentJoiner.documents
receiver: TransformersSimilarityRanker.documents
- sender: TransformersSimilarityRanker.documents
receiver: PromptBuilder.documents
- sender: PromptBuilder.prompt
receiver: OpenAIGenerator.prompt
- sender: OpenAIGenerator.replies
receiver: AnswerBuilder.replies
inputs:
query:
- OpenSearchBM25Retriever.query
- SentenceTransformersTextEmbedder.text
- TransformersSimilarityRanker.query
- PromptBuilder.question
- AnswerBuilder.query
outputs:
answers: AnswerBuilder.answers
After: without DocumentJoiner
components:
OpenSearchBM25Retriever:
type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
hosts:
- ${OPENSEARCH_HOST}
index: ''
embedding_dim: 768
http_auth:
- ${OPENSEARCH_USER}
- ${OPENSEARCH_PASSWORD}
use_ssl: true
verify_certs: false
top_k: 20
SentenceTransformersTextEmbedder:
type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
init_parameters:
model: intfloat/e5-base-v2
OpenSearchEmbeddingRetriever:
type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
hosts:
- ${OPENSEARCH_HOST}
index: ''
embedding_dim: 768
http_auth:
- ${OPENSEARCH_USER}
- ${OPENSEARCH_PASSWORD}
use_ssl: true
verify_certs: false
top_k: 20
TransformersSimilarityRanker:
type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8
PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
template: |-
Answer the question based on the provided documents.
Documents:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ question }}
OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
model: gpt-4o
AnswerBuilder:
type: haystack.components.builders.answer_builder.AnswerBuilder
connections:
- sender: OpenSearchBM25Retriever.documents
receiver: TransformersSimilarityRanker.documents
- sender: SentenceTransformersTextEmbedder.embedding
receiver: OpenSearchEmbeddingRetriever.query_embedding
- sender: OpenSearchEmbeddingRetriever.documents
receiver: TransformersSimilarityRanker.documents
- sender: TransformersSimilarityRanker.documents
receiver: PromptBuilder.documents
- sender: PromptBuilder.prompt
receiver: OpenAIGenerator.prompt
- sender: OpenAIGenerator.replies
receiver: AnswerBuilder.replies
inputs:
query:
- OpenSearchBM25Retriever.query
- SentenceTransformersTextEmbedder.text
- TransformersSimilarityRanker.query
- PromptBuilder.question
- AnswerBuilder.query
outputs:
answers: AnswerBuilder.answers
Indexes with Multiple File Converters
A common use case is to connect multiple file converters to a DocumentWriter in an index. To simplify such an index, do these steps:
- Remove the `DocumentJoiner` component that collects documents from the converters and sends them to `DocumentSplitter`.
- Reconnect `TextFileToDocument`'s `documents` output (the converter for text files) to `DocumentSplitter`'s `documents` input.
- Reconnect `PPTXToDocument`'s `documents` output to `DocumentSplitter`'s `documents` input.
- Reconnect `PDFMinerToDocument`'s `documents` output to `DocumentSplitter`'s `documents` input.
- Reconnect the other `TextFileToDocument`'s `documents` output (the converter for Markdown files) to `DocumentSplitter`'s `documents` input.
- Reconnect `HTMLToDocument`'s `documents` output to `DocumentSplitter`'s `documents` input.
- Reconnect `DOCXToDocument`'s `documents` output to `DocumentSplitter`'s `documents` input.
- Remove the second `DocumentJoiner` component that collects documents from `CSVToDocument`, `XLSXToDocument`, and `DocumentSplitter`, and sends them to `DeepsetNvidiaDocumentEmbedder`.
- Reconnect `DocumentSplitter`'s `documents` output to `DeepsetNvidiaDocumentEmbedder`'s `documents` input.
- Reconnect `CSVToDocument`'s `documents` output to `DeepsetNvidiaDocumentEmbedder`'s `documents` input.
- Reconnect `XLSXToDocument`'s `documents` output to `DeepsetNvidiaDocumentEmbedder`'s `documents` input.
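For example, the first DocumentJoiner disappears like this (excerpted from the full configurations below; the remaining converters are rewired the same way):

```yaml
# Before: converters feed DocumentJoiner, which feeds DocumentSplitter
- sender: TextFileToDocument.documents
  receiver: DocumentJoiner.documents
- sender: PDFMinerToDocument.documents
  receiver: DocumentJoiner.documents
- sender: DocumentJoiner.documents
  receiver: DocumentSplitter.documents

# After: converters connect straight to DocumentSplitter
- sender: TextFileToDocument.documents
  receiver: DocumentSplitter.documents
- sender: PDFMinerToDocument.documents
  receiver: DocumentSplitter.documents
```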
Before: with DocumentJoiner
components:
FileTypeRouter:
type: haystack.components.routers.file_type_router.FileTypeRouter
init_parameters:
mime_types:
- text/plain
- application/pdf
- text/markdown
- text/html
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- text/csv
TextFileToDocument:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
PDFMinerToDocument:
type: haystack.components.converters.pdfminer.PDFMinerToDocument
init_parameters:
line_overlap: 0.5
char_margin: 2
line_margin: 0.5
word_margin: 0.1
boxes_flow: 0.5
detect_vertical: true
all_texts: false
store_full_path: false
TextFileToDocument:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
HTMLToDocument:
type: haystack.components.converters.html.HTMLToDocument
init_parameters:
extraction_kwargs:
output_format: markdown
target_language:
include_tables: true
include_links: true
DOCXToDocument:
type: haystack.components.converters.docx.DOCXToDocument
init_parameters:
link_format: markdown
PPTXToDocument:
type: haystack.components.converters.pptx.PPTXToDocument
init_parameters: {}
XLSXToDocument:
type: haystack.components.converters.xlsx.XLSXToDocument
init_parameters: {}
CSVToDocument:
type: haystack.components.converters.csv.CSVToDocument
init_parameters:
encoding: utf-8
DocumentJoiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
DocumentJoiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
DocumentSplitter:
type: haystack.components.preprocessors.document_splitter.DocumentSplitter
init_parameters:
split_by: word
split_length: 250
split_overlap: 30
respect_sentence_boundary: true
language: en
DeepsetNvidiaDocumentEmbedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2
DocumentWriter:
type: haystack.components.writers.document_writer.DocumentWriter
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
hosts:
index: ""
max_chunk_bytes: 104857600
return_embedding: false
method:
mappings:
settings:
create_index: true
http_auth:
use_ssl:
verify_certs:
timeout:
policy: OVERWRITE
connections:
- sender: FileTypeRouter.text/plain
receiver: TextFileToDocument.sources
- sender: FileTypeRouter.application/pdf
receiver: PDFMinerToDocument.sources
- sender: FileTypeRouter.text/markdown
receiver: TextFileToDocument.sources
- sender: FileTypeRouter.text/html
receiver: HTMLToDocument.sources
- sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.wordprocessingml.document
receiver: DOCXToDocument.sources
- sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.presentationml.presentation
receiver: PPTXToDocument.sources
- sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
receiver: XLSXToDocument.sources
- sender: FileTypeRouter.text/csv
receiver: CSVToDocument.sources
- sender: TextFileToDocument.documents
receiver: DocumentJoiner.documents
- sender: PDFMinerToDocument.documents
receiver: DocumentJoiner.documents
- sender: TextFileToDocument.documents
receiver: DocumentJoiner.documents
- sender: HTMLToDocument.documents
receiver: DocumentJoiner.documents
- sender: DOCXToDocument.documents
receiver: DocumentJoiner.documents
- sender: PPTXToDocument.documents
receiver: DocumentJoiner.documents
- sender: XLSXToDocument.documents
receiver: DocumentJoiner.documents
- sender: CSVToDocument.documents
receiver: DocumentJoiner.documents
- sender: DocumentJoiner.documents
receiver: DocumentSplitter.documents
- sender: DocumentSplitter.documents
receiver: DocumentJoiner.documents
- sender: XLSXToDocument.documents
receiver: DocumentJoiner.documents
- sender: CSVToDocument.documents
receiver: DocumentJoiner.documents
- sender: DocumentJoiner.documents
receiver: DeepsetNvidiaDocumentEmbedder.documents
- sender: DeepsetNvidiaDocumentEmbedder.documents
receiver: DocumentWriter.documents
inputs:
files:
- FileTypeRouter.sources
max_runs_per_component: 100
metadata: {}
After: without DocumentJoiner
components:
FileTypeRouter:
type: haystack.components.routers.file_type_router.FileTypeRouter
init_parameters:
mime_types:
- text/plain
- application/pdf
- text/markdown
- text/html
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- text/csv
TextFileToDocument:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
PDFMinerToDocument:
type: haystack.components.converters.pdfminer.PDFMinerToDocument
init_parameters:
line_overlap: 0.5
char_margin: 2
line_margin: 0.5
word_margin: 0.1
boxes_flow: 0.5
detect_vertical: true
all_texts: false
store_full_path: false
TextFileToDocument:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
HTMLToDocument:
type: haystack.components.converters.html.HTMLToDocument
init_parameters:
extraction_kwargs:
output_format: markdown
target_language:
include_tables: true
include_links: true
DOCXToDocument:
type: haystack.components.converters.docx.DOCXToDocument
init_parameters:
link_format: markdown
PPTXToDocument:
type: haystack.components.converters.pptx.PPTXToDocument
init_parameters: {}
XLSXToDocument:
type: haystack.components.converters.xlsx.XLSXToDocument
init_parameters: {}
CSVToDocument:
type: haystack.components.converters.csv.CSVToDocument
init_parameters:
encoding: utf-8
DocumentSplitter:
type: haystack.components.preprocessors.document_splitter.DocumentSplitter
init_parameters:
split_by: word
split_length: 250
split_overlap: 30
respect_sentence_boundary: true
language: en
DeepsetNvidiaDocumentEmbedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2
DocumentWriter:
type: haystack.components.writers.document_writer.DocumentWriter
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
hosts:
index: ""
max_chunk_bytes: 104857600
return_embedding: false
method:
mappings:
settings:
create_index: true
http_auth:
use_ssl:
verify_certs:
timeout:
policy: OVERWRITE
connections:
- sender: FileTypeRouter.text/plain
receiver: TextFileToDocument.sources
- sender: FileTypeRouter.application/pdf
receiver: PDFMinerToDocument.sources
- sender: FileTypeRouter.text/markdown
receiver: TextFileToDocument.sources
- sender: FileTypeRouter.text/html
receiver: HTMLToDocument.sources
- sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.wordprocessingml.document
receiver: DOCXToDocument.sources
- sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.presentationml.presentation
receiver: PPTXToDocument.sources
- sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
receiver: XLSXToDocument.sources
- sender: FileTypeRouter.text/csv
receiver: CSVToDocument.sources
- sender: DeepsetNvidiaDocumentEmbedder.documents
receiver: DocumentWriter.documents
- sender: PPTXToDocument.documents
receiver: DocumentSplitter.documents
- sender: TextFileToDocument.documents
receiver: DocumentSplitter.documents
- sender: PDFMinerToDocument.documents
receiver: DocumentSplitter.documents
- sender: TextFileToDocument.documents
receiver: DocumentSplitter.documents
- sender: HTMLToDocument.documents
receiver: DocumentSplitter.documents
- sender: DOCXToDocument.documents
receiver: DocumentSplitter.documents
- sender: DocumentSplitter.documents
receiver: DeepsetNvidiaDocumentEmbedder.documents
- sender: CSVToDocument.documents
receiver: DeepsetNvidiaDocumentEmbedder.documents
- sender: XLSXToDocument.documents
receiver: DeepsetNvidiaDocumentEmbedder.documents
inputs:
files:
- FileTypeRouter.sources
max_runs_per_component: 100
metadata: {}
Remove ListJoiner
Components now accept multiple lists of the same type, so you can remove ListJoiner in most cases. Below, you can find common use cases for ListJoiner and how to simplify them.
Joining ChatMessages
If you're joining multiple `list[ChatMessage]` lists with a `ListJoiner`, you can remove it; the pipeline now merges the lists automatically. A common use case is joining the messages from a `DeepsetChatHistoryParser` with the current user's message before sending them to the Agent, as in the RAG Research Agent template. You can now skip the `ListJoiner` and connect the components directly. To do so, follow these steps:
- Remove `ListJoiner`.
- Connect `DeepsetChatHistoryParser`'s `messages` output directly to the Agent's `messages` input.
- Connect `ChatPromptBuilder`'s `prompt` output directly to the Agent's `messages` input.
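Excerpted from the full configuration below, the rewiring looks like this:

```yaml
# Before: both message lists pass through ListJoiner
- sender: ChatPromptBuilder.prompt
  receiver: ListJoiner.values
- sender: history_parser.messages
  receiver: ListJoiner.values
- sender: ListJoiner.values
  receiver: Agent.messages

# After: both senders connect directly to the Agent
- sender: ChatPromptBuilder.prompt
  receiver: Agent.messages
- sender: history_parser.messages
  receiver: Agent.messages
```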
Before: Joining ChatMessages with ListJoiner
components:
adapter:
init_parameters:
custom_filters: {}
output_type: list[str]
template: '{{ [(messages|last).text] }}'
unsafe: false
type: haystack.components.converters.output_adapter.OutputAdapter
history_parser:
init_parameters: {}
type: deepset_cloud_custom_nodes.parsers.chat_history_parser.DeepsetChatHistoryParser
MultiFileConverter:
type: haystack.core.super_component.super_component.SuperComponent
init_parameters:
input_mapping:
sources:
- file_classifier.sources
is_pipeline_async: false
output_mapping:
score_adder.output: documents
pipeline:
components:
file_classifier:
type: haystack.components.routers.file_type_router.FileTypeRouter
init_parameters:
mime_types:
- text/plain
- application/pdf
- text/markdown
- text/html
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- text/csv
text_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
pdf_converter:
type: haystack.components.converters.pdfminer.PDFMinerToDocument
init_parameters:
line_overlap: 0.5
char_margin: 2
line_margin: 0.5
word_margin: 0.1
boxes_flow: 0.5
detect_vertical: true
all_texts: false
store_full_path: false
markdown_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
html_converter:
type: haystack.components.converters.html.HTMLToDocument
init_parameters:
extraction_kwargs:
output_format: markdown
target_language:
include_tables: true
include_links: true
docx_converter:
type: haystack.components.converters.docx.DOCXToDocument
init_parameters:
link_format: markdown
pptx_converter:
type: haystack.components.converters.pptx.PPTXToDocument
init_parameters: {}
xlsx_converter:
type: haystack.components.converters.xlsx.XLSXToDocument
init_parameters: {}
csv_converter:
type: haystack.components.converters.csv.CSVToDocument
init_parameters:
encoding: utf-8
splitter:
type: haystack.components.preprocessors.document_splitter.DocumentSplitter
init_parameters:
split_by: word
split_length: 250
split_overlap: 30
respect_sentence_boundary: true
language: en
score_adder:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: |
{%- set scored_documents = [] -%}
{%- for document in documents -%}
{%- set doc_dict = document.to_dict() -%}
{%- set _ = doc_dict.update({'score': 100.0}) -%}
{%- set scored_doc = document.from_dict(doc_dict) -%}
{%- set _ = scored_documents.append(scored_doc) -%}
{%- endfor -%}
{{ scored_documents }}
output_type: list[haystack.Document]
custom_filters:
unsafe: true
text_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
tabular_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
connections:
- sender: file_classifier.text/plain
receiver: text_converter.sources
- sender: file_classifier.application/pdf
receiver: pdf_converter.sources
- sender: file_classifier.text/markdown
receiver: markdown_converter.sources
- sender: file_classifier.text/html
receiver: html_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
receiver: docx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
receiver: pptx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
receiver: xlsx_converter.sources
- sender: file_classifier.text/csv
receiver: csv_converter.sources
- sender: text_joiner.documents
receiver: splitter.documents
- sender: text_converter.documents
receiver: text_joiner.documents
- sender: pdf_converter.documents
receiver: text_joiner.documents
- sender: markdown_converter.documents
receiver: text_joiner.documents
- sender: html_converter.documents
receiver: text_joiner.documents
- sender: pptx_converter.documents
receiver: text_joiner.documents
- sender: docx_converter.documents
receiver: text_joiner.documents
- sender: xlsx_converter.documents
receiver: tabular_joiner.documents
- sender: csv_converter.documents
receiver: tabular_joiner.documents
- sender: splitter.documents
receiver: tabular_joiner.documents
- sender: tabular_joiner.documents
receiver: score_adder.documents
ChatPromptBuilder:
type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
init_parameters:
template: |-
{% message role="user" %}
{%- if documents|length > 0 -%}
Here are documents provided by the user:
{% for document in documents -%}
Document [{{ loop.index }}] :
Name of Source File: {{ document.meta.file_name }}
{{ document.content }}
{%- endfor -%}
{%- endif -%}
{% endmessage %}
ListJoiner:
type: haystack.components.joiners.list_joiner.ListJoiner
init_parameters:
list_type_: list[haystack.dataclasses.chat_message.ChatMessage]
Agent:
type: haystack.components.agents.agent.Agent
init_parameters:
chat_generator:
init_parameters:
model: gpt-5.2
generation_kwargs:
reasoning:
effort: low
verbosity: low
type: haystack.components.generators.chat.openai_responses.OpenAIResponsesChatGenerator
exit_conditions:
- text
max_agent_steps: 100
raise_on_tool_invocation_failure: false
state_schema:
documents:
type: list[haystack.Document]
streaming_callback: deepset_cloud_custom_nodes.callbacks.streaming.streaming_callback
system_prompt: >-
You are a deep research assistant.
You create comprehensive research reports to answer the user's
questions.
You have one tool to gather data: 'local_search'.
The local_search tool supports hybrid retrieval using keywords, semantic
embeddings, and reranking.
Formulate natural language search queries that describe the full intent
of the question.
Use multiple varied searches to fully cover the topic.
Only information retrieved from local_search may be used to answer the
question.
If the question cannot be answered using this knowledge source,
explicitly state that it is not answerable and briefly explain why.
Do not use external knowledge, assumptions, or speculation.
When you use information from the local search, cite the source with the
documents reference number in square brackets where you use the
information (e.g. [5]).
This is IMPORTANT:
- Only use numbered citations for the local search results.
- Do NOT add a References section, cite directly in the text where you
use the information.
- For internal knowledge "some information" [3] as (taken from <document
reference="3">).
- Format responses using markdown.
tools:
- type: haystack.tools.pipeline_tool.PipelineTool
data:
name: local_search
description: >-
Search the company's internal knowledge repository using hybrid
retrieval.
The search supports natural language queries, keyword matching,
semantic embeddings, and cross-encoder reranking.
Use descriptive, question-like queries to capture intent and
retrieve the most relevant documents.
input_mapping:
query:
- retriever.query
- ranker.query
documents:
- builder.existing_documents
output_mapping:
builder.prompt: formatted_docs
meta_field_grouping_ranker.documents: documents
inputs_from_state:
documents: documents
outputs_to_state:
documents:
source: documents
outputs_to_string:
source: formatted_docs
parameters:
is_pipeline_async: false
pipeline:
components:
builder:
init_parameters:
required_variables:
- existing_documents
- docs
template: |-
{%- if existing_documents is not none -%}
{%- set existing_doc_len = existing_documents|length -%}
{%- else -%}
{%- set existing_doc_len = 0 -%}
{%- endif -%}
{%- for doc in docs %}
<document reference="{{loop.index + existing_doc_len}}">
{{ doc.content }}
</document>
{% endfor -%}
variables:
type: haystack.components.builders.prompt_builder.PromptBuilder
retriever:
type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
top_k: 20
fuzziness: 0
embedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2
ranker:
type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8
meta_field_grouping_ranker:
type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
init_parameters:
group_by: file_id
subgroup_by:
sort_docs_by: split_id
connections:
- sender: retriever.documents
receiver: ranker.documents
- sender: ranker.documents
receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
receiver: builder.docs
max_runs_per_component: 100
metadata: {}
DeepsetAnswerBuilder:
type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
init_parameters:
reference_pattern: acm
OutputAdapter:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: "{{ replies[0] }}"
output_type: str
custom_filters:
unsafe: false
connections:
- sender: MultiFileConverter.documents
receiver: ChatPromptBuilder.documents
- sender: ChatPromptBuilder.prompt
receiver: ListJoiner.values
- sender: history_parser.messages
receiver: ListJoiner.values
- sender: ListJoiner.values
receiver: Agent.messages
- sender: Agent.documents
receiver: DeepsetAnswerBuilder.documents
- sender: Agent.messages
receiver: OutputAdapter.replies
- sender: OutputAdapter.output
receiver: DeepsetAnswerBuilder.replies
inputs:
query:
- DeepsetAnswerBuilder.query
- history_parser.history_and_query
files:
- MultiFileConverter.sources
max_runs_per_component: 100
metadata: {}
outputs:
answers: DeepsetAnswerBuilder.answers
documents: Agent.documents
pipeline_output_type: chat
After: Joining ChatMessages without ListJoiner
components:
adapter:
init_parameters:
custom_filters: {}
output_type: list[str]
template: '{{ [(messages|last).text] }}'
unsafe: false
type: haystack.components.converters.output_adapter.OutputAdapter
history_parser:
init_parameters: {}
type: deepset_cloud_custom_nodes.parsers.chat_history_parser.DeepsetChatHistoryParser
MultiFileConverter:
type: haystack.core.super_component.super_component.SuperComponent
init_parameters:
input_mapping:
sources:
- file_classifier.sources
is_pipeline_async: false
output_mapping:
score_adder.output: documents
pipeline:
components:
file_classifier:
type: haystack.components.routers.file_type_router.FileTypeRouter
init_parameters:
mime_types:
- text/plain
- application/pdf
- text/markdown
- text/html
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- text/csv
text_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
pdf_converter:
type: haystack.components.converters.pdfminer.PDFMinerToDocument
init_parameters:
line_overlap: 0.5
char_margin: 2
line_margin: 0.5
word_margin: 0.1
boxes_flow: 0.5
detect_vertical: true
all_texts: false
store_full_path: false
markdown_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
html_converter:
type: haystack.components.converters.html.HTMLToDocument
init_parameters:
extraction_kwargs:
output_format: markdown
target_language:
include_tables: true
include_links: true
docx_converter:
type: haystack.components.converters.docx.DOCXToDocument
init_parameters:
link_format: markdown
pptx_converter:
type: haystack.components.converters.pptx.PPTXToDocument
init_parameters: {}
xlsx_converter:
type: haystack.components.converters.xlsx.XLSXToDocument
init_parameters: {}
csv_converter:
type: haystack.components.converters.csv.CSVToDocument
init_parameters:
encoding: utf-8
splitter:
type: haystack.components.preprocessors.document_splitter.DocumentSplitter
init_parameters:
split_by: word
split_length: 250
split_overlap: 30
respect_sentence_boundary: true
language: en
score_adder:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: |
{%- set scored_documents = [] -%}
{%- for document in documents -%}
{%- set doc_dict = document.to_dict() -%}
{%- set _ = doc_dict.update({'score': 100.0}) -%}
{%- set scored_doc = document.from_dict(doc_dict) -%}
{%- set _ = scored_documents.append(scored_doc) -%}
{%- endfor -%}
{{ scored_documents }}
output_type: list[haystack.Document]
custom_filters:
unsafe: true
text_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
tabular_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
connections:
- sender: file_classifier.text/plain
receiver: text_converter.sources
- sender: file_classifier.application/pdf
receiver: pdf_converter.sources
- sender: file_classifier.text/markdown
receiver: markdown_converter.sources
- sender: file_classifier.text/html
receiver: html_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
receiver: docx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
receiver: pptx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
receiver: xlsx_converter.sources
- sender: file_classifier.text/csv
receiver: csv_converter.sources
- sender: text_joiner.documents
receiver: splitter.documents
- sender: text_converter.documents
receiver: text_joiner.documents
- sender: pdf_converter.documents
receiver: text_joiner.documents
- sender: markdown_converter.documents
receiver: text_joiner.documents
- sender: html_converter.documents
receiver: text_joiner.documents
- sender: pptx_converter.documents
receiver: text_joiner.documents
- sender: docx_converter.documents
receiver: text_joiner.documents
- sender: xlsx_converter.documents
receiver: tabular_joiner.documents
- sender: csv_converter.documents
receiver: tabular_joiner.documents
- sender: splitter.documents
receiver: tabular_joiner.documents
- sender: tabular_joiner.documents
receiver: score_adder.documents
ChatPromptBuilder:
type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
init_parameters:
template: |-
{% message role="user" %}
{%- if documents|length > 0 -%}
Here are documents provided by the user:
{% for document in documents -%}
Document [{{ loop.index }}] :
Name of Source File: {{ document.meta.file_name }}
{{ document.content }}
{%- endfor -%}
{%- endif -%}
{% endmessage %}
Agent:
type: haystack.components.agents.agent.Agent
init_parameters:
chat_generator:
init_parameters:
model: gpt-5.2
generation_kwargs:
reasoning:
effort: low
verbosity: low
type: haystack.components.generators.chat.openai_responses.OpenAIResponsesChatGenerator
exit_conditions:
- text
max_agent_steps: 100
raise_on_tool_invocation_failure: false
state_schema:
documents:
type: list[haystack.Document]
streaming_callback: deepset_cloud_custom_nodes.callbacks.streaming.streaming_callback
system_prompt: >-
You are a deep research assistant.
You create comprehensive research reports to answer the user's
questions.
You have one tool to gather data: 'local_search'.
The local_search tool supports hybrid retrieval using keywords, semantic
embeddings, and reranking.
Formulate natural language search queries that describe the full intent
of the question.
Use multiple varied searches to fully cover the topic.
Only information retrieved from local_search may be used to answer the
question.
If the question cannot be answered using this knowledge source,
explicitly state that it is not answerable and briefly explain why.
Do not use external knowledge, assumptions, or speculation.
When you use information from the local search, cite the source with the
documents reference number in square brackets where you use the
information (e.g. [5]).
This is IMPORTANT:
- Only use numbered citations for the local search results.
- Do NOT add a References section, cite directly in the text where you
use the information.
- Example citation: "some information" [3], where the content is taken from <document reference="3">.
- Format responses using markdown.
tools:
- type: haystack.tools.pipeline_tool.PipelineTool
data:
name: local_search
description: >-
Search the company's internal knowledge repository using hybrid
retrieval.
The search supports natural language queries, keyword matching,
semantic embeddings, and cross-encoder reranking.
Use descriptive, question-like queries to capture intent and
retrieve the most relevant documents.
input_mapping:
query:
- retriever.query
- ranker.query
documents:
- builder.existing_documents
output_mapping:
builder.prompt: formatted_docs
meta_field_grouping_ranker.documents: documents
inputs_from_state:
documents: documents
outputs_to_state:
documents:
source: documents
outputs_to_string:
source: formatted_docs
parameters:
is_pipeline_async: false
pipeline:
components:
builder:
init_parameters:
required_variables:
- existing_documents
- docs
template: |-
{%- if existing_documents is not none -%}
{%- set existing_doc_len = existing_documents|length -%}
{%- else -%}
{%- set existing_doc_len = 0 -%}
{%- endif -%}
{%- for doc in docs %}
<document reference="{{loop.index + existing_doc_len}}">
{{ doc.content }}
</document>
{% endfor -%}
variables:
type: haystack.components.builders.prompt_builder.PromptBuilder
retriever:
type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
top_k: 20
fuzziness: 0
embedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2
ranker:
type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8
meta_field_grouping_ranker:
type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
init_parameters:
group_by: file_id
subgroup_by:
sort_docs_by: split_id
connections:
- sender: retriever.documents
receiver: ranker.documents
- sender: ranker.documents
receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
receiver: builder.docs
max_runs_per_component: 100
metadata: {}
DeepsetAnswerBuilder:
type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
init_parameters:
reference_pattern: acm
OutputAdapter:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: "{{ replies[0] }}"
output_type: str
custom_filters:
unsafe: false
connections:
- sender: MultiFileConverter.documents
receiver: ChatPromptBuilder.documents
- sender: Agent.documents
receiver: DeepsetAnswerBuilder.documents
- sender: Agent.messages
receiver: OutputAdapter.replies
- sender: OutputAdapter.output
receiver: DeepsetAnswerBuilder.replies
- sender: history_parser.messages
receiver: Agent.messages
- sender: ChatPromptBuilder.prompt
receiver: Agent.messages
inputs:
query:
- DeepsetAnswerBuilder.query
- history_parser.history_and_query
files:
- MultiFileConverter.sources
max_runs_per_component: 100
metadata: {}
outputs:
answers: DeepsetAnswerBuilder.answers
documents: Agent.documents
pipeline_output_type: chat
Remove OutputAdapter for Type Conversions
RAG Chat with LLM sending input to a Retriever and a Ranker
In chat pipelines, where the first LLM reformulates the query to be used for retrieval augmented generation, you can remove the OutputAdapter and connect the ChatGenerator directly to the Retriever or Ranker.
This is an example of a RAG chat pipeline with an OutputAdapter and a DocumentJoiner that you can simplify. Follow these steps:
- Remove `OutputAdapter`.
- Connect the `replies` output of the first `OpenAIGenerator` to the following components' inputs:
  - `OpenSearchHybridRetriever`'s `query` input.
  - `DeepsetNvidiaRanker`'s `query` input.
  - `PromptBuilder`'s `question` input.
  - `DeepsetAnswerBuilder`'s `query` input.
- Remove `DocumentJoiner`.
- Connect `MultiFileConverter`'s `documents` output to the following components' inputs:
  - The second `PromptBuilder`'s `documents` input.
  - `DeepsetAnswerBuilder`'s `documents` input.
  - `Output`'s `documents` input.
- Connect `DeepsetNvidiaRanker`'s `documents` output to the following components' inputs:
  - The second `PromptBuilder`'s `documents` input.
  - `DeepsetAnswerBuilder`'s `documents` input.
  - `Output`'s `documents` input.
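Condensed into connections only, the rewiring replaces each adapter or joiner hop with a direct connection. This is a sketch showing a representative subset, not a complete pipeline; the component names match the full examples in this section:

```yaml
connections:
  - sender: OpenAIGenerator.replies        # was: OutputAdapter.output
    receiver: OpenSearchHybridRetriever.query
  - sender: OpenAIGenerator.replies
    receiver: DeepsetNvidiaRanker.query
  - sender: MultiFileConverter.documents   # was: DocumentJoiner.documents
    receiver: PromptBuilder.documents
  - sender: DeepsetNvidiaRanker.documents
    receiver: PromptBuilder.documents
```

The pipeline merges the two document lists arriving at `PromptBuilder.documents` into one, and converts the LLM's `replies` into the string inputs the retriever and ranker expect.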
Before: LLM connected to a Retriever through an OutputAdapter
components:
PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
required_variables: "*"
template: |-
You are part of a chatbot.
You receive a question (Current Question) and a chat history.
Use the context from the chat history and reformulate the question so that it is suitable for retrieval augmented generation.
If X is followed by Y, only ask for Y and do not repeat X again.
If the question does not require any context from the chat history, output it unedited.
Don't make questions too long, but short and precise.
Stay as close as possible to the current question.
Only output the new question, nothing else!
{{ question }}
New question:
OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
api_key:
"type": "env_var"
"env_vars":
- "OPENAI_API_KEY"
"strict": false
model: "gpt-5.2"
generation_kwargs:
reasoning_effort: low
verbosity: low
OutputAdapter:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: "{{ replies[0] }}"
output_type: str
OpenSearchHybridRetriever:
type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
hosts:
index: ""
max_chunk_bytes: 104857600
return_embedding: false
method:
mappings:
settings:
create_index: true
http_auth:
use_ssl:
verify_certs:
timeout:
top_k: 20 # The number of results to return
embedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2
DeepsetNvidiaRanker:
type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8
PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
required_variables: '*'
template: |-
You are a technical expert.
You answer questions truthfully based on provided documents.
Ignore typing errors in the question.
For each document check whether it is related to the question.
Only use documents that are related to the question to answer it.
Ignore documents that are not related to the question.
If the answer exists in several documents, summarize them.
Only answer based on the documents provided. Don't make things up.
Just output the structured, informative and precise answer and nothing else.
If the documents can't answer the question, say so.
Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .
Never name the documents, only enter a number in square brackets as a reference.
The reference must only refer to the number that comes in square brackets after the document.
Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.
These are the documents:
{%- if documents|length > 0 %}
{%- for document in documents %}
Document [{{ loop.index }}] :
Name of Source File: {{ document.meta.file_name }}
{{ document.content }}
{% endfor -%}
{%- else %}
No relevant documents found.
Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
{% endif %}
Question: {{ question }}
Answer:
OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
api_key:
"type": "env_var"
"env_vars":
- "OPENAI_API_KEY"
"strict": false
model: "gpt-5.2"
generation_kwargs:
reasoning_effort: low
verbosity: low
DeepsetAnswerBuilder:
type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
init_parameters:
reference_pattern: acm
DocumentJoiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
weights:
top_k:
sort_by_score: true
MultiFileConverter:
type: haystack.core.super_component.super_component.SuperComponent
init_parameters:
input_mapping:
sources:
- file_classifier.sources
is_pipeline_async: false
output_mapping:
score_adder.output: documents
pipeline:
components:
file_classifier:
type: haystack.components.routers.file_type_router.FileTypeRouter
init_parameters:
mime_types:
- text/plain
- application/pdf
- text/markdown
- text/html
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- text/csv
text_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
pdf_converter:
type: haystack.components.converters.pdfminer.PDFMinerToDocument
init_parameters:
line_overlap: 0.5
char_margin: 2
line_margin: 0.5
word_margin: 0.1
boxes_flow: 0.5
detect_vertical: true
all_texts: false
store_full_path: false
markdown_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
html_converter:
type: haystack.components.converters.html.HTMLToDocument
init_parameters:
extraction_kwargs:
output_format: markdown
target_language:
include_tables: true
include_links: true
docx_converter:
type: haystack.components.converters.docx.DOCXToDocument
init_parameters:
link_format: markdown
pptx_converter:
type: haystack.components.converters.pptx.PPTXToDocument
init_parameters: {}
xlsx_converter:
type: haystack.components.converters.xlsx.XLSXToDocument
init_parameters: {}
csv_converter:
type: haystack.components.converters.csv.CSVToDocument
init_parameters:
encoding: utf-8
splitter:
type: haystack.components.preprocessors.document_splitter.DocumentSplitter
init_parameters:
split_by: word
split_length: 250
split_overlap: 30
respect_sentence_boundary: true
language: en
score_adder:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: |
{%- set scored_documents = [] -%}
{%- for document in documents -%}
{%- set doc_dict = document.to_dict() -%}
{%- set _ = doc_dict.update({'score': 100.0}) -%}
{%- set scored_doc = document.from_dict(doc_dict) -%}
{%- set _ = scored_documents.append(scored_doc) -%}
{%- endfor -%}
{{ scored_documents }}
output_type: list[haystack.Document]
custom_filters:
unsafe: true
text_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
tabular_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
connections:
- sender: file_classifier.text/plain
receiver: text_converter.sources
- sender: file_classifier.application/pdf
receiver: pdf_converter.sources
- sender: file_classifier.text/markdown
receiver: markdown_converter.sources
- sender: file_classifier.text/html
receiver: html_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
receiver: docx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
receiver: pptx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
receiver: xlsx_converter.sources
- sender: file_classifier.text/csv
receiver: csv_converter.sources
- sender: text_joiner.documents
receiver: splitter.documents
- sender: text_converter.documents
receiver: text_joiner.documents
- sender: pdf_converter.documents
receiver: text_joiner.documents
- sender: markdown_converter.documents
receiver: text_joiner.documents
- sender: html_converter.documents
receiver: text_joiner.documents
- sender: pptx_converter.documents
receiver: text_joiner.documents
- sender: docx_converter.documents
receiver: text_joiner.documents
- sender: xlsx_converter.documents
receiver: tabular_joiner.documents
- sender: csv_converter.documents
receiver: tabular_joiner.documents
- sender: splitter.documents
receiver: tabular_joiner.documents
- sender: tabular_joiner.documents
receiver: score_adder.documents
connections:
- sender: PromptBuilder.prompt
receiver: OpenAIGenerator.prompt
- sender: OpenAIGenerator.replies
receiver: OutputAdapter.replies
- sender: OutputAdapter.output
receiver: OpenSearchHybridRetriever.query
- sender: OutputAdapter.output
receiver: DeepsetNvidiaRanker.query
- sender: OutputAdapter.output
receiver: PromptBuilder.question
- sender: OutputAdapter.output
receiver: DeepsetAnswerBuilder.query
- sender: OpenSearchHybridRetriever.documents
receiver: DeepsetNvidiaRanker.documents
- sender: PromptBuilder.prompt
receiver: OpenAIGenerator.prompt
- sender: PromptBuilder.prompt
receiver: DeepsetAnswerBuilder.prompt
- sender: OpenAIGenerator.replies
receiver: DeepsetAnswerBuilder.replies
- sender: MultiFileConverter.documents
receiver: DocumentJoiner.documents
- sender: DeepsetNvidiaRanker.documents
receiver: DocumentJoiner.documents
- sender: DocumentJoiner.documents
receiver: DeepsetAnswerBuilder.documents
- sender: DocumentJoiner.documents
receiver: PromptBuilder.documents
inputs:
query:
- PromptBuilder.question
filters:
- OpenSearchHybridRetriever.filters_bm25
- OpenSearchHybridRetriever.filters_embedding
files:
- MultiFileConverter.sources
outputs:
documents: DocumentJoiner.documents
answers: DeepsetAnswerBuilder.answers
max_runs_per_component: 100
metadata: {}
After: LLM connected directly to a Retriever and a Ranker
This is a simplified version of the pipeline above. You can remove the DocumentJoiner and OutputAdapter and connect the components directly.
components:
PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
required_variables: "*"
template: |-
You are part of a chatbot.
You receive a question (Current Question) and a chat history.
Use the context from the chat history and reformulate the question so that it is suitable for retrieval augmented generation.
If X is followed by Y, only ask for Y and do not repeat X again.
If the question does not require any context from the chat history, output it unedited.
Don't make questions too long, but short and precise.
Stay as close as possible to the current question.
Only output the new question, nothing else!
{{ question }}
New question:
OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
api_key:
"type": "env_var"
"env_vars":
- "OPENAI_API_KEY"
"strict": false
model: "gpt-5.2"
generation_kwargs:
reasoning_effort: low
verbosity: low
OpenSearchHybridRetriever:
type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
hosts:
index: ""
max_chunk_bytes: 104857600
return_embedding: false
method:
mappings:
settings:
create_index: true
http_auth:
use_ssl:
verify_certs:
timeout:
top_k: 20
embedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2
DeepsetNvidiaRanker:
type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8
PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
required_variables: '*'
template: |-
You are a technical expert.
You answer questions truthfully based on provided documents.
Ignore typing errors in the question.
For each document check whether it is related to the question.
Only use documents that are related to the question to answer it.
Ignore documents that are not related to the question.
If the answer exists in several documents, summarize them.
Only answer based on the documents provided. Don't make things up.
Just output the structured, informative and precise answer and nothing else.
If the documents can't answer the question, say so.
Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .
Never name the documents, only enter a number in square brackets as a reference.
The reference must only refer to the number that comes in square brackets after the document.
Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.
These are the documents:
{%- if documents|length > 0 %}
{%- for document in documents %}
Document [{{ loop.index }}] :
Name of Source File: {{ document.meta.file_name }}
{{ document.content }}
{% endfor -%}
{%- else %}
No relevant documents found.
Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
{% endif %}
Question: {{ question }}
Answer:
OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
api_key:
"type": "env_var"
"env_vars":
- "OPENAI_API_KEY"
"strict": false
model: "gpt-5.2"
generation_kwargs:
reasoning_effort: low
verbosity: low
DeepsetAnswerBuilder:
type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
init_parameters:
reference_pattern: acm
MultiFileConverter:
type: haystack.core.super_component.super_component.SuperComponent
init_parameters:
input_mapping:
sources:
- file_classifier.sources
is_pipeline_async: false
output_mapping:
score_adder.output: documents
pipeline:
components:
file_classifier:
type: haystack.components.routers.file_type_router.FileTypeRouter
init_parameters:
mime_types:
- text/plain
- application/pdf
- text/markdown
- text/html
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- text/csv
text_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
pdf_converter:
type: haystack.components.converters.pdfminer.PDFMinerToDocument
init_parameters:
line_overlap: 0.5
char_margin: 2
line_margin: 0.5
word_margin: 0.1
boxes_flow: 0.5
detect_vertical: true
all_texts: false
store_full_path: false
markdown_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8
html_converter:
type: haystack.components.converters.html.HTMLToDocument
init_parameters:
extraction_kwargs:
output_format: markdown
target_language:
include_tables: true
include_links: true
docx_converter:
type: haystack.components.converters.docx.DOCXToDocument
init_parameters:
link_format: markdown
pptx_converter:
type: haystack.components.converters.pptx.PPTXToDocument
init_parameters: {}
xlsx_converter:
type: haystack.components.converters.xlsx.XLSXToDocument
init_parameters: {}
csv_converter:
type: haystack.components.converters.csv.CSVToDocument
init_parameters:
encoding: utf-8
splitter:
type: haystack.components.preprocessors.document_splitter.DocumentSplitter
init_parameters:
split_by: word
split_length: 250
split_overlap: 30
respect_sentence_boundary: true
language: en
score_adder:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: |
{%- set scored_documents = [] -%}
{%- for document in documents -%}
{%- set doc_dict = document.to_dict() -%}
{%- set _ = doc_dict.update({'score': 100.0}) -%}
{%- set scored_doc = document.from_dict(doc_dict) -%}
{%- set _ = scored_documents.append(scored_doc) -%}
{%- endfor -%}
{{ scored_documents }}
output_type: list[haystack.Document]
custom_filters:
unsafe: true
text_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
tabular_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
connections:
- sender: file_classifier.text/plain
receiver: text_converter.sources
- sender: file_classifier.application/pdf
receiver: pdf_converter.sources
- sender: file_classifier.text/markdown
receiver: markdown_converter.sources
- sender: file_classifier.text/html
receiver: html_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
receiver: docx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
receiver: pptx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
receiver: xlsx_converter.sources
- sender: file_classifier.text/csv
receiver: csv_converter.sources
- sender: text_joiner.documents
receiver: splitter.documents
- sender: text_converter.documents
receiver: text_joiner.documents
- sender: pdf_converter.documents
receiver: text_joiner.documents
- sender: markdown_converter.documents
receiver: text_joiner.documents
- sender: html_converter.documents
receiver: text_joiner.documents
- sender: pptx_converter.documents
receiver: text_joiner.documents
- sender: docx_converter.documents
receiver: text_joiner.documents
- sender: xlsx_converter.documents
receiver: tabular_joiner.documents
- sender: csv_converter.documents
receiver: tabular_joiner.documents
- sender: splitter.documents
receiver: tabular_joiner.documents
- sender: tabular_joiner.documents
receiver: score_adder.documents
connections:
- sender: PromptBuilder.prompt
receiver: OpenAIGenerator.prompt
- sender: OpenSearchHybridRetriever.documents
receiver: DeepsetNvidiaRanker.documents
- sender: PromptBuilder.prompt
receiver: DeepsetAnswerBuilder.prompt
- sender: OpenAIGenerator.replies
receiver: DeepsetAnswerBuilder.replies
- sender: DeepsetNvidiaRanker.documents
receiver: PromptBuilder.documents
- sender: DeepsetNvidiaRanker.documents
receiver: DeepsetAnswerBuilder.documents
- sender: OpenAIGenerator.replies
receiver: OpenSearchHybridRetriever.query
- sender: OpenAIGenerator.replies
receiver: DeepsetNvidiaRanker.query
- sender: OpenAIGenerator.replies
receiver: PromptBuilder.question
- sender: OpenAIGenerator.replies
receiver: DeepsetAnswerBuilder.query
- sender: MultiFileConverter.documents
receiver: DeepsetAnswerBuilder.documents
- sender: MultiFileConverter.documents
receiver: PromptBuilder.documents
- sender: MultiFileConverter.documents
receiver: Output.documents
inputs:
query:
- PromptBuilder.question
filters:
- OpenSearchHybridRetriever.filters_bm25
- OpenSearchHybridRetriever.filters_embedding
files:
- MultiFileConverter.sources
outputs:
answers: DeepsetAnswerBuilder.answers
max_runs_per_component: 100
metadata: {}
When You Still Need OutputAdapter
You still need OutputAdapter when:
- You're converting between types that smart connections don't support (anything other than `string` and `ChatMessage`).
- You need explicit control over formatting, ordering, or extracting specific fields from the output.
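As an illustration, here is a hypothetical OutputAdapter that pulls a single metadata field out of the top-ranked document, a transformation a smart connection can't infer on its own. The component name and template are examples of ours, not taken from the pipelines above:

```yaml
FileNameExtractor:
  type: haystack.components.converters.output_adapter.OutputAdapter
  init_parameters:
    # Extract one specific field instead of passing whole documents downstream.
    template: "{{ documents[0].meta.file_name }}"
    output_type: str
```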