
Simplify Your Pipelines

Remove unnecessary components from your pipelines by taking advantage of smart connections. This guide shows you how.


About This Task

Smart connections let the pipeline automatically merge lists and convert types between components. This means many "glue" components you used before are no longer needed. Your pipelines become shorter, easier to read, and simpler to debug.

Instead of these previously needed components, use smart connections:

- DocumentJoiner: Connect all document outputs from sender components directly to one list[Document] input. Example receivers are Ranker, PromptBuilder, DocumentSplitter, DocumentWriter, Embedder, and AnswerBuilder, typically through their documents input.
- ListJoiner: Connect all sender components' outputs directly to one list[ChatMessage] input. An example receiver is the Agent's messages input.
- OutputAdapter: Connect the LLM's replies output directly to the Retriever, Ranker, or other downstream components.
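For example, two retrievers can send their documents directly to a ranker, and the pipeline merges both lists into one list[Document] before the ranker runs. A minimal connections fragment illustrating this (the component names are illustrative, not from a specific template):

```yaml
connections:
  # Both retrievers feed the same documents input; no DocumentJoiner needed.
  - sender: BM25Retriever.documents
    receiver: Ranker.documents
  - sender: EmbeddingRetriever.documents
    receiver: Ranker.documents
```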

For a full list of simplified components, see Legacy Components.

For background on how smart connections work, see Smart Connections.

Remove DocumentJoiner

Components now accept multiple lists of documents, which eliminates the need for a DocumentJoiner in most cases. Below are common DocumentJoiner use cases and how to simplify them.

Hybrid Retrieval Pipelines

If your pipeline uses multiple retrievers (for example, a BM25 retriever and an embedding retriever), you probably have a DocumentJoiner sitting between the retrievers and the next component. With smart connections, you can remove it and connect the retrievers directly to the downstream component.

The pipeline automatically merges the document lists into one before passing them along.

To simplify this pipeline:

  1. Remove the DocumentJoiner component.
  2. Reconnect OpenSearchBM25Retriever's documents output to TransformersSimilarityRanker's documents input.
  3. Reconnect OpenSearchEmbeddingRetriever's documents output to TransformersSimilarityRanker's documents input.
Before: with DocumentJoiner
components:
  OpenSearchBM25Retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          index: ''
          embedding_dim: 768
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
      top_k: 20

  SentenceTransformersTextEmbedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: intfloat/e5-base-v2

  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          index: ''
          embedding_dim: 768
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
      top_k: 20

  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  TransformersSimilarityRanker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        Answer the question based on the provided documents.
        Documents:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}

  OpenAIGenerator:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      model: gpt-4o

  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder

connections:
  - sender: OpenSearchBM25Retriever.documents
    receiver: DocumentJoiner.documents
  - sender: SentenceTransformersTextEmbedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding
  - sender: OpenSearchEmbeddingRetriever.documents
    receiver: DocumentJoiner.documents
  - sender: DocumentJoiner.documents
    receiver: TransformersSimilarityRanker.documents
  - sender: TransformersSimilarityRanker.documents
    receiver: PromptBuilder.documents
  - sender: PromptBuilder.prompt
    receiver: OpenAIGenerator.prompt
  - sender: OpenAIGenerator.replies
    receiver: AnswerBuilder.replies

inputs:
  query:
    - OpenSearchBM25Retriever.query
    - SentenceTransformersTextEmbedder.text
    - TransformersSimilarityRanker.query
    - PromptBuilder.question
    - AnswerBuilder.query

outputs:
  answers: AnswerBuilder.answers
After: without DocumentJoiner
components:
  OpenSearchBM25Retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          index: ''
          embedding_dim: 768
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
      top_k: 20

  SentenceTransformersTextEmbedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: intfloat/e5-base-v2

  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          index: ''
          embedding_dim: 768
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
      top_k: 20

  TransformersSimilarityRanker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        Answer the question based on the provided documents.
        Documents:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}

  OpenAIGenerator:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      model: gpt-4o

  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder

connections:
  - sender: OpenSearchBM25Retriever.documents
    receiver: TransformersSimilarityRanker.documents
  - sender: SentenceTransformersTextEmbedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding
  - sender: OpenSearchEmbeddingRetriever.documents
    receiver: TransformersSimilarityRanker.documents
  - sender: TransformersSimilarityRanker.documents
    receiver: PromptBuilder.documents
  - sender: PromptBuilder.prompt
    receiver: OpenAIGenerator.prompt
  - sender: OpenAIGenerator.replies
    receiver: AnswerBuilder.replies

inputs:
  query:
    - OpenSearchBM25Retriever.query
    - SentenceTransformersTextEmbedder.text
    - TransformersSimilarityRanker.query
    - PromptBuilder.question
    - AnswerBuilder.query

outputs:
  answers: AnswerBuilder.answers

Indexes with Multiple File Converters

A common use case is to connect multiple file converters to a DocumentWriter in an index. To simplify such an index, follow these steps:

  1. Remove the DocumentJoiner component that collects documents from converters and sends them to DocumentSplitter.
  2. Reconnect TextFileToDocument's documents output (the converter for text files) to DocumentSplitter's documents input.
  3. Reconnect PPTXToDocument's documents output to DocumentSplitter's documents input.
  4. Reconnect PDFMinerToDocument's documents output to DocumentSplitter's documents input.
  5. Reconnect another TextFileToDocument's documents output (the converter for Markdown files) to DocumentSplitter's documents input.
  6. Reconnect HTMLToDocument's documents output to DocumentSplitter's documents input.
  7. Reconnect DOCXToDocument's documents output to DocumentSplitter's documents input.
  8. Remove the second DocumentJoiner component that collects documents from CSVToDocument, XLSXToDocument, and DocumentSplitter, and sends them to DeepsetNvidiaDocumentEmbedder.
  9. Reconnect DocumentSplitter's documents output to DeepsetNvidiaDocumentEmbedder's documents input.
  10. Reconnect CSVToDocument's documents output to DeepsetNvidiaDocumentEmbedder's documents input.
  11. Reconnect XLSXToDocument's documents output to DeepsetNvidiaDocumentEmbedder's documents input.
Before: with DocumentJoiner
components:
  FileTypeRouter:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - text/markdown
        - text/html
        - application/vnd.openxmlformats-officedocument.wordprocessingml.document
        - application/vnd.openxmlformats-officedocument.presentationml.presentation
        - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
        - text/csv

  TextFileToDocument:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8

  PDFMinerToDocument:
    type: haystack.components.converters.pdfminer.PDFMinerToDocument
    init_parameters:
      line_overlap: 0.5
      char_margin: 2
      line_margin: 0.5
      word_margin: 0.1
      boxes_flow: 0.5
      detect_vertical: true
      all_texts: false
      store_full_path: false

  TextFileToDocument:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8

  HTMLToDocument:
    type: haystack.components.converters.html.HTMLToDocument
    init_parameters:
      extraction_kwargs:
        output_format: markdown
        target_language:
        include_tables: true
        include_links: true

  DOCXToDocument:
    type: haystack.components.converters.docx.DOCXToDocument
    init_parameters:
      link_format: markdown

  PPTXToDocument:
    type: haystack.components.converters.pptx.PPTXToDocument
    init_parameters: {}

  XLSXToDocument:
    type: haystack.components.converters.xlsx.XLSXToDocument
    init_parameters: {}

  CSVToDocument:
    type: haystack.components.converters.csv.CSVToDocument
    init_parameters:
      encoding: utf-8

  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      sort_by_score: false

  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      sort_by_score: false

  DocumentSplitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
      respect_sentence_boundary: true
      language: en

  DeepsetNvidiaDocumentEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 768
          hosts:
          index: ""
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE

connections:
  - sender: FileTypeRouter.text/plain
    receiver: TextFileToDocument.sources
  - sender: FileTypeRouter.application/pdf
    receiver: PDFMinerToDocument.sources
  - sender: FileTypeRouter.text/markdown
    receiver: TextFileToDocument.sources
  - sender: FileTypeRouter.text/html
    receiver: HTMLToDocument.sources
  - sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.wordprocessingml.document
    receiver: DOCXToDocument.sources
  - sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.presentationml.presentation
    receiver: PPTXToDocument.sources
  - sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    receiver: XLSXToDocument.sources
  - sender: FileTypeRouter.text/csv
    receiver: CSVToDocument.sources
  - sender: TextFileToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: PDFMinerToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: TextFileToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: HTMLToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: DOCXToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: PPTXToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: DocumentJoiner.documents
    receiver: DocumentSplitter.documents
  - sender: DocumentSplitter.documents
    receiver: DocumentJoiner.documents
  - sender: XLSXToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: CSVToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: DocumentJoiner.documents
    receiver: DeepsetNvidiaDocumentEmbedder.documents
  - sender: DeepsetNvidiaDocumentEmbedder.documents
    receiver: DocumentWriter.documents

inputs:
  files:
    - FileTypeRouter.sources

max_runs_per_component: 100

metadata: {}


After: without DocumentJoiner
components:
  FileTypeRouter:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - text/markdown
        - text/html
        - application/vnd.openxmlformats-officedocument.wordprocessingml.document
        - application/vnd.openxmlformats-officedocument.presentationml.presentation
        - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
        - text/csv

  TextFileToDocument:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8

  PDFMinerToDocument:
    type: haystack.components.converters.pdfminer.PDFMinerToDocument
    init_parameters:
      line_overlap: 0.5
      char_margin: 2
      line_margin: 0.5
      word_margin: 0.1
      boxes_flow: 0.5
      detect_vertical: true
      all_texts: false
      store_full_path: false

  TextFileToDocument:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8

  HTMLToDocument:
    type: haystack.components.converters.html.HTMLToDocument
    init_parameters:
      extraction_kwargs:
        output_format: markdown
        target_language:
        include_tables: true
        include_links: true

  DOCXToDocument:
    type: haystack.components.converters.docx.DOCXToDocument
    init_parameters:
      link_format: markdown

  PPTXToDocument:
    type: haystack.components.converters.pptx.PPTXToDocument
    init_parameters: {}

  XLSXToDocument:
    type: haystack.components.converters.xlsx.XLSXToDocument
    init_parameters: {}

  CSVToDocument:
    type: haystack.components.converters.csv.CSVToDocument
    init_parameters:
      encoding: utf-8

  DocumentSplitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
      respect_sentence_boundary: true
      language: en

  DeepsetNvidiaDocumentEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 768
          hosts:
          index: ""
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE

connections:
  - sender: FileTypeRouter.text/plain
    receiver: TextFileToDocument.sources
  - sender: FileTypeRouter.application/pdf
    receiver: PDFMinerToDocument.sources
  - sender: FileTypeRouter.text/markdown
    receiver: TextFileToDocument.sources
  - sender: FileTypeRouter.text/html
    receiver: HTMLToDocument.sources
  - sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.wordprocessingml.document
    receiver: DOCXToDocument.sources
  - sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.presentationml.presentation
    receiver: PPTXToDocument.sources
  - sender: FileTypeRouter.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    receiver: XLSXToDocument.sources
  - sender: FileTypeRouter.text/csv
    receiver: CSVToDocument.sources
  - sender: PPTXToDocument.documents
    receiver: DocumentSplitter.documents
  - sender: TextFileToDocument.documents
    receiver: DocumentSplitter.documents
  - sender: PDFMinerToDocument.documents
    receiver: DocumentSplitter.documents
  - sender: TextFileToDocument.documents
    receiver: DocumentSplitter.documents
  - sender: HTMLToDocument.documents
    receiver: DocumentSplitter.documents
  - sender: DOCXToDocument.documents
    receiver: DocumentSplitter.documents
  - sender: DocumentSplitter.documents
    receiver: DeepsetNvidiaDocumentEmbedder.documents
  - sender: XLSXToDocument.documents
    receiver: DeepsetNvidiaDocumentEmbedder.documents
  - sender: CSVToDocument.documents
    receiver: DeepsetNvidiaDocumentEmbedder.documents
  - sender: DeepsetNvidiaDocumentEmbedder.documents
    receiver: DocumentWriter.documents

inputs:
  files:
    - FileTypeRouter.sources

max_runs_per_component: 100

metadata: {}

Remove ListJoiner

Components now accept multiple lists of the same type, so you can remove ListJoiner in most cases. Below are common ListJoiner use cases and how to simplify them.

Joining ChatMessages

If you're joining multiple list[ChatMessage] outputs with a ListJoiner, you can remove it: the pipeline now merges the lists automatically. A common use case is joining messages from a DeepsetChatHistoryParser with the current user's message to send them to the Agent, as in the RAG Research Agent template. You can now skip ListJoiner and connect the components directly. To do so, follow these steps:

  1. Remove ListJoiner.
  2. Connect DeepsetChatHistoryParser's messages output directly to the Agent's messages input.
  3. Connect ChatPromptBuilder's prompt output directly to the Agent's messages input.
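With the ListJoiner removed, the relevant wiring reduces to two direct connections into the Agent, which the pipeline merges into one message list. A minimal connections fragment (component names as used in this section's example):

```yaml
connections:
  # Both senders feed the Agent's messages input; the lists are merged automatically.
  - sender: history_parser.messages
    receiver: Agent.messages
  - sender: ChatPromptBuilder.prompt
    receiver: Agent.messages
```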
Before: Joining ChatMessages with ListJoiner
components:
  adapter:
    init_parameters:
      custom_filters: {}
      output_type: list[str]
      template: '{{ [(messages|last).text] }}'
      unsafe: false
    type: haystack.components.converters.output_adapter.OutputAdapter

  history_parser:
    init_parameters: {}
    type: deepset_cloud_custom_nodes.parsers.chat_history_parser.DeepsetChatHistoryParser
  MultiFileConverter:
    type: haystack.core.super_component.super_component.SuperComponent
    init_parameters:
      input_mapping:
        sources:
          - file_classifier.sources
      is_pipeline_async: false
      output_mapping:
        score_adder.output: documents
      pipeline:
        components:
          file_classifier:
            type: haystack.components.routers.file_type_router.FileTypeRouter
            init_parameters:
              mime_types:
                - text/plain
                - application/pdf
                - text/markdown
                - text/html
                - application/vnd.openxmlformats-officedocument.wordprocessingml.document
                - application/vnd.openxmlformats-officedocument.presentationml.presentation
                - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
                - text/csv
          text_converter:
            type: haystack.components.converters.txt.TextFileToDocument
            init_parameters:
              encoding: utf-8
          pdf_converter:
            type: haystack.components.converters.pdfminer.PDFMinerToDocument
            init_parameters:
              line_overlap: 0.5
              char_margin: 2
              line_margin: 0.5
              word_margin: 0.1
              boxes_flow: 0.5
              detect_vertical: true
              all_texts: false
              store_full_path: false
          markdown_converter:
            type: haystack.components.converters.txt.TextFileToDocument
            init_parameters:
              encoding: utf-8
          html_converter:
            type: haystack.components.converters.html.HTMLToDocument
            init_parameters:
              extraction_kwargs:
                output_format: markdown
                target_language:
                include_tables: true
                include_links: true
          docx_converter:
            type: haystack.components.converters.docx.DOCXToDocument
            init_parameters:
              link_format: markdown
          pptx_converter:
            type: haystack.components.converters.pptx.PPTXToDocument
            init_parameters: {}
          xlsx_converter:
            type: haystack.components.converters.xlsx.XLSXToDocument
            init_parameters: {}
          csv_converter:
            type: haystack.components.converters.csv.CSVToDocument
            init_parameters:
              encoding: utf-8
          splitter:
            type: haystack.components.preprocessors.document_splitter.DocumentSplitter
            init_parameters:
              split_by: word
              split_length: 250
              split_overlap: 30
              respect_sentence_boundary: true
              language: en
          score_adder:
            type: haystack.components.converters.output_adapter.OutputAdapter
            init_parameters:
              template: |
                {%- set scored_documents = [] -%}
                {%- for document in documents -%}
                {%- set doc_dict = document.to_dict() -%}
                {%- set _ = doc_dict.update({'score': 100.0}) -%}
                {%- set scored_doc = document.from_dict(doc_dict) -%}
                {%- set _ = scored_documents.append(scored_doc) -%}
                {%- endfor -%}
                {{ scored_documents }}
              output_type: list[haystack.Document]
              custom_filters:
              unsafe: true
          text_joiner:
            type: haystack.components.joiners.document_joiner.DocumentJoiner
            init_parameters:
              join_mode: concatenate
              sort_by_score: false
          tabular_joiner:
            type: haystack.components.joiners.document_joiner.DocumentJoiner
            init_parameters:
              join_mode: concatenate
              sort_by_score: false
        connections:
          - sender: file_classifier.text/plain
            receiver: text_converter.sources
          - sender: file_classifier.application/pdf
            receiver: pdf_converter.sources
          - sender: file_classifier.text/markdown
            receiver: markdown_converter.sources
          - sender: file_classifier.text/html
            receiver: html_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
            receiver: docx_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
            receiver: pptx_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
            receiver: xlsx_converter.sources
          - sender: file_classifier.text/csv
            receiver: csv_converter.sources
          - sender: text_joiner.documents
            receiver: splitter.documents
          - sender: text_converter.documents
            receiver: text_joiner.documents
          - sender: pdf_converter.documents
            receiver: text_joiner.documents
          - sender: markdown_converter.documents
            receiver: text_joiner.documents
          - sender: html_converter.documents
            receiver: text_joiner.documents
          - sender: pptx_converter.documents
            receiver: text_joiner.documents
          - sender: docx_converter.documents
            receiver: text_joiner.documents
          - sender: xlsx_converter.documents
            receiver: tabular_joiner.documents
          - sender: csv_converter.documents
            receiver: tabular_joiner.documents
          - sender: splitter.documents
            receiver: tabular_joiner.documents
          - sender: tabular_joiner.documents
            receiver: score_adder.documents
  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template: |-
        {% message role="user" %}
        {%- if documents|length > 0 -%}
        Here are documents provided by the user:
        {% for document in documents -%}
        Document [{{ loop.index }}] :
        Name of Source File: {{ document.meta.file_name }}
        {{ document.content }}
        {%- endfor -%}
        {%- endif -%}
        {% endmessage %}
  ListJoiner:
    type: haystack.components.joiners.list_joiner.ListJoiner
    init_parameters:
      list_type_: list[haystack.dataclasses.chat_message.ChatMessage]
  Agent:
    type: haystack.components.agents.agent.Agent
    init_parameters:
      chat_generator:
        init_parameters:
          model: gpt-5.2
          generation_kwargs:
            reasoning:
              effort: low
            verbosity: low
        type: haystack.components.generators.chat.openai_responses.OpenAIResponsesChatGenerator
      exit_conditions:
        - text
      max_agent_steps: 100
      raise_on_tool_invocation_failure: false
      state_schema:
        documents:
          type: list[haystack.Document]
      streaming_callback: deepset_cloud_custom_nodes.callbacks.streaming.streaming_callback
      system_prompt: >-
        You are a deep research assistant.

        You create comprehensive research reports to answer the user's
        questions.

        You have one tool to gather data: 'local_search'.


        The local_search tool supports hybrid retrieval using keywords, semantic
        embeddings, and reranking.

        Formulate natural language search queries that describe the full intent
        of the question.

        Use multiple varied searches to fully cover the topic.


        Only information retrieved from local_search may be used to answer the
        question.

        If the question cannot be answered using this knowledge source,
        explicitly state that it is not answerable and briefly explain why.

        Do not use external knowledge, assumptions, or speculation.


        When you use information from the local search, cite the source with the
        documents reference number in square brackets where you use the
        information (e.g. [5]).

        This is IMPORTANT:

        - Only use numbered citations for the local search results.

        - Do NOT add a References section, cite directly in the text where you
        use the information.

        - For internal knowledge "some information" [3] as (taken from <document
        reference="3">).

        - Format responses using markdown.
      tools:
        - type: haystack.tools.pipeline_tool.PipelineTool
          data:
            name: local_search
            description: >-
              Search the company's internal knowledge repository using hybrid
              retrieval.

              The search supports natural language queries, keyword matching,
              semantic embeddings, and cross-encoder reranking.

              Use descriptive, question-like queries to capture intent and
              retrieve the most relevant documents.
            input_mapping:
              query:
                - retriever.query
                - ranker.query
              documents:
                - builder.existing_documents
            output_mapping:
              builder.prompt: formatted_docs
              meta_field_grouping_ranker.documents: documents
            inputs_from_state:
              documents: documents
            outputs_to_state:
              documents:
                source: documents
            outputs_to_string:
              source: formatted_docs
            parameters:
            is_pipeline_async: false
            pipeline:
              components:
                builder:
                  init_parameters:
                    required_variables:
                      - existing_documents
                      - docs
                    template: |-
                      {%- if existing_documents is not none -%}
                      {%- set existing_doc_len = existing_documents|length -%}
                      {%- else -%}
                      {%- set existing_doc_len = 0 -%}
                      {%- endif -%}
                      {%- for doc in docs %}
                      <document reference="{{loop.index + existing_doc_len}}">
                      {{ doc.content }}
                      </document>
                      {% endfor -%}
                    variables:
                  type: haystack.components.builders.prompt_builder.PromptBuilder
                retriever:
                  type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
                  init_parameters:
                    document_store:
                      type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
                      init_parameters:
                        embedding_dim: 768
                    top_k: 20
                    fuzziness: 0
                    embedder:
                      type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
                      init_parameters:
                        normalize_embeddings: true
                        model: intfloat/e5-base-v2
                ranker:
                  type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
                  init_parameters:
                    model: intfloat/simlm-msmarco-reranker
                    top_k: 8
                meta_field_grouping_ranker:
                  type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
                  init_parameters:
                    group_by: file_id
                    subgroup_by:
                    sort_docs_by: split_id
              connections:
                - sender: retriever.documents
                  receiver: ranker.documents
                - sender: ranker.documents
                  receiver: meta_field_grouping_ranker.documents
                - sender: meta_field_grouping_ranker.documents
                  receiver: builder.docs
              max_runs_per_component: 100
              metadata: {}
  DeepsetAnswerBuilder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: "{{ replies[0] }}"
      output_type: str
      custom_filters:
      unsafe: false

connections:
  - sender: MultiFileConverter.documents
    receiver: ChatPromptBuilder.documents
  - sender: ChatPromptBuilder.prompt
    receiver: ListJoiner.values
  - sender: history_parser.messages
    receiver: ListJoiner.values
  - sender: ListJoiner.values
    receiver: Agent.messages
  - sender: Agent.documents
    receiver: DeepsetAnswerBuilder.documents
  - sender: Agent.messages
    receiver: OutputAdapter.replies
  - sender: OutputAdapter.output
    receiver: DeepsetAnswerBuilder.replies

inputs:
  query:
    - DeepsetAnswerBuilder.query
    - history_parser.history_and_query
  files:
    - MultiFileConverter.sources

max_runs_per_component: 100

metadata: {}

outputs:
  answers: DeepsetAnswerBuilder.answers
  documents: Agent.documents

pipeline_output_type: chat


After: Joining ChatMessages without ListJoiner
components:
  adapter:
    init_parameters:
      custom_filters: {}
      output_type: list[str]
      template: '{{ [(messages|last).text] }}'
      unsafe: false
    type: haystack.components.converters.output_adapter.OutputAdapter

  history_parser:
    init_parameters: {}
    type: deepset_cloud_custom_nodes.parsers.chat_history_parser.DeepsetChatHistoryParser
  MultiFileConverter:
    type: haystack.core.super_component.super_component.SuperComponent
    init_parameters:
      input_mapping:
        sources:
          - file_classifier.sources
      is_pipeline_async: false
      output_mapping:
        score_adder.output: documents
      pipeline:
        components:
          file_classifier:
            type: haystack.components.routers.file_type_router.FileTypeRouter
            init_parameters:
              mime_types:
                - text/plain
                - application/pdf
                - text/markdown
                - text/html
                - application/vnd.openxmlformats-officedocument.wordprocessingml.document
                - application/vnd.openxmlformats-officedocument.presentationml.presentation
                - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
                - text/csv
          text_converter:
            type: haystack.components.converters.txt.TextFileToDocument
            init_parameters:
              encoding: utf-8
          pdf_converter:
            type: haystack.components.converters.pdfminer.PDFMinerToDocument
            init_parameters:
              line_overlap: 0.5
              char_margin: 2
              line_margin: 0.5
              word_margin: 0.1
              boxes_flow: 0.5
              detect_vertical: true
              all_texts: false
              store_full_path: false
          markdown_converter:
            type: haystack.components.converters.txt.TextFileToDocument
            init_parameters:
              encoding: utf-8
          html_converter:
            type: haystack.components.converters.html.HTMLToDocument
            init_parameters:
              extraction_kwargs:
                output_format: markdown
                target_language:
                include_tables: true
                include_links: true
          docx_converter:
            type: haystack.components.converters.docx.DOCXToDocument
            init_parameters:
              link_format: markdown
          pptx_converter:
            type: haystack.components.converters.pptx.PPTXToDocument
            init_parameters: {}
          xlsx_converter:
            type: haystack.components.converters.xlsx.XLSXToDocument
            init_parameters: {}
          csv_converter:
            type: haystack.components.converters.csv.CSVToDocument
            init_parameters:
              encoding: utf-8
          splitter:
            type: haystack.components.preprocessors.document_splitter.DocumentSplitter
            init_parameters:
              split_by: word
              split_length: 250
              split_overlap: 30
              respect_sentence_boundary: true
              language: en
          score_adder:
            type: haystack.components.converters.output_adapter.OutputAdapter
            init_parameters:
              template: |
                {%- set scored_documents = [] -%}
                {%- for document in documents -%}
                {%- set doc_dict = document.to_dict() -%}
                {%- set _ = doc_dict.update({'score': 100.0}) -%}
                {%- set scored_doc = document.from_dict(doc_dict) -%}
                {%- set _ = scored_documents.append(scored_doc) -%}
                {%- endfor -%}
                {{ scored_documents }}
              output_type: list[haystack.Document]
              custom_filters:
              unsafe: true
          text_joiner:
            type: haystack.components.joiners.document_joiner.DocumentJoiner
            init_parameters:
              join_mode: concatenate
              sort_by_score: false
          tabular_joiner:
            type: haystack.components.joiners.document_joiner.DocumentJoiner
            init_parameters:
              join_mode: concatenate
              sort_by_score: false
        connections:
          - sender: file_classifier.text/plain
            receiver: text_converter.sources
          - sender: file_classifier.application/pdf
            receiver: pdf_converter.sources
          - sender: file_classifier.text/markdown
            receiver: markdown_converter.sources
          - sender: file_classifier.text/html
            receiver: html_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
            receiver: docx_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
            receiver: pptx_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
            receiver: xlsx_converter.sources
          - sender: file_classifier.text/csv
            receiver: csv_converter.sources
          - sender: text_joiner.documents
            receiver: splitter.documents
          - sender: text_converter.documents
            receiver: text_joiner.documents
          - sender: pdf_converter.documents
            receiver: text_joiner.documents
          - sender: markdown_converter.documents
            receiver: text_joiner.documents
          - sender: html_converter.documents
            receiver: text_joiner.documents
          - sender: pptx_converter.documents
            receiver: text_joiner.documents
          - sender: docx_converter.documents
            receiver: text_joiner.documents
          - sender: xlsx_converter.documents
            receiver: tabular_joiner.documents
          - sender: csv_converter.documents
            receiver: tabular_joiner.documents
          - sender: splitter.documents
            receiver: tabular_joiner.documents
          - sender: tabular_joiner.documents
            receiver: score_adder.documents
  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template: |-
        {% message role="user" %}
        {%- if documents|length > 0 -%}
        Here are documents provided by the user:
        {% for document in documents -%}
        Document [{{ loop.index }}] :
        Name of Source File: {{ document.meta.file_name }}
        {{ document.content }}
        {%- endfor -%}
        {%- endif -%}
        {% endmessage %}
  Agent:
    type: haystack.components.agents.agent.Agent
    init_parameters:
      chat_generator:
        init_parameters:
          model: gpt-5.2
          generation_kwargs:
            reasoning:
              effort: low
            verbosity: low
        type: haystack.components.generators.chat.openai_responses.OpenAIResponsesChatGenerator
      exit_conditions:
        - text
      max_agent_steps: 100
      raise_on_tool_invocation_failure: false
      state_schema:
        documents:
          type: list[haystack.Document]
      streaming_callback: deepset_cloud_custom_nodes.callbacks.streaming.streaming_callback
      system_prompt: >-
        You are a deep research assistant.

        You create comprehensive research reports to answer the user's
        questions.

        You have one tool to gather data: 'local_search'.


        The local_search tool supports hybrid retrieval using keywords, semantic
        embeddings, and reranking.

        Formulate natural language search queries that describe the full intent
        of the question.

        Use multiple varied searches to fully cover the topic.


        Only information retrieved from local_search may be used to answer the
        question.

        If the question cannot be answered using this knowledge source,
        explicitly state that it is not answerable and briefly explain why.

        Do not use external knowledge, assumptions, or speculation.


        When you use information from the local search, cite the source with the
        documents reference number in square brackets where you use the
        information (e.g. [5]).

        This is IMPORTANT:

- Only use numbered citations for the local search results.

- Do NOT add a References section, cite directly in the text where you
use the information.

- For internal knowledge "some information" [3] as (taken from <document
reference="3">).

- Format responses using markdown.
tools:
- type: haystack.tools.pipeline_tool.PipelineTool
data:
name: local_search
description: >-
Search the company's internal knowledge repository using hybrid
retrieval.

The search supports natural language queries, keyword matching,
semantic embeddings, and cross-encoder reranking.

Use descriptive, question-like queries to capture intent and
retrieve the most relevant documents.
input_mapping:
query:
- retriever.query
- ranker.query
documents:
- builder.existing_documents
output_mapping:
builder.prompt: formatted_docs
meta_field_grouping_ranker.documents: documents
inputs_from_state:
documents: documents
outputs_to_state:
documents:
source: documents
outputs_to_string:
source: formatted_docs
parameters:
is_pipeline_async: false
pipeline:
components:
builder:
init_parameters:
required_variables:
- existing_documents
- docs
template: |-
{%- if existing_documents is not none -%}
{%- set existing_doc_len = existing_documents|length -%}
{%- else -%}
{%- set existing_doc_len = 0 -%}
{%- endif -%}
{%- for doc in docs %}
<document reference="{{loop.index + existing_doc_len}}">
{{ doc.content }}
</document>
{% endfor -%}
variables:
type: haystack.components.builders.prompt_builder.PromptBuilder
retriever:
type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
top_k: 20
fuzziness: 0
embedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2
ranker:
type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8
meta_field_grouping_ranker:
type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
init_parameters:
group_by: file_id
subgroup_by:
sort_docs_by: split_id
connections:
- sender: retriever.documents
receiver: ranker.documents
- sender: ranker.documents
receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
receiver: builder.docs
max_runs_per_component: 100
metadata: {}
DeepsetAnswerBuilder:
type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
init_parameters:
reference_pattern: acm
OutputAdapter:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: "{{ replies[0] }}"
output_type: str
custom_filters:
unsafe: false

connections:
- sender: MultiFileConverter.documents
receiver: ChatPromptBuilder.documents
- sender: Agent.documents
receiver: DeepsetAnswerBuilder.documents
- sender: Agent.messages
receiver: OutputAdapter.replies
- sender: OutputAdapter.output
receiver: DeepsetAnswerBuilder.replies
- sender: history_parser.messages
receiver: Agent.messages
- sender: ChatPromptBuilder.prompt
receiver: Agent.messages

inputs:
query:
- DeepsetAnswerBuilder.query
- history_parser.history_and_query
files:
- MultiFileConverter.sources

max_runs_per_component: 100

metadata: {}

outputs:
answers: DeepsetAnswerBuilder.answers
documents: Agent.documents

pipeline_output_type: chat


Remove OutputAdapter for Type Conversions

RAG Chat with LLM sending input to a Retriever and a Ranker

In chat pipelines, where the first LLM reformulates the query to be used for retrieval augmented generation, you can remove the OutputAdapter and connect the ChatGenerator directly to the Retriever or Ranker.

This is an example of a RAG chat pipeline with an OutputAdapter and a DocumentJoiner that you can simplify. Follow these steps:

  1. Remove OutputAdapter.
  2. Connect the replies output of the first OpenAIGenerator to the following components' inputs:
    • OpenSearchHybridRetriever's query input.
    • DeepsetNvidiaRanker's query input.
    • The second PromptBuilder's question input.
    • DeepsetAnswerBuilder's query input.
  3. Remove DocumentJoiner.
  4. Connect MultiFileConverter's documents output to the following components' inputs:
    • The second PromptBuilder's documents input.
    • DeepsetAnswerBuilder's documents input.
    • Output's documents input.
  5. Connect DeepsetNvidiaRanker's documents output to the following components' inputs:
    • The second PromptBuilder's documents input.
    • DeepsetAnswerBuilder's documents input.
    • Output's documents input.
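The rewiring in these steps replaces every connection that previously went through the OutputAdapter or DocumentJoiner with a direct connection. As a condensed sketch (component names are taken from the example pipelines below; the conversions and list merges are handled by smart connections):

connections:
  # The first LLM's replies feed the query inputs directly;
  # smart connections convert the reply to a string.
  - sender: OpenAIGenerator.replies
    receiver: OpenSearchHybridRetriever.query
  - sender: OpenAIGenerator.replies
    receiver: DeepsetNvidiaRanker.query
  # Both document producers feed the same list[Document] input;
  # smart connections merge the lists before passing them on.
  - sender: MultiFileConverter.documents
    receiver: DeepsetAnswerBuilder.documents
  - sender: DeepsetNvidiaRanker.documents
    receiver: DeepsetAnswerBuilder.documents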
Before: LLM connected to a Retriever through an OutputAdapter
components:
PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
required_variables: "*"
template: |-
You are part of a chatbot.
You receive a question (Current Question) and a chat history.
Use the context from the chat history and reformulate the question so that it is suitable for retrieval augmented generation.
If X is followed by Y, only ask for Y and do not repeat X again.
If the question does not require any context from the chat history, output it unedited.
Don't make questions too long, but short and precise.
Stay as close as possible to the current question.
Only output the new question, nothing else!

{{ question }}

New question:

OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
api_key:
"type": "env_var"
"env_vars":
- "OPENAI_API_KEY"
"strict": False
model: "gpt-5.2"
generation_kwargs:
reasoning_effort: low
verbosity: low

OutputAdapter:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: "{{ replies[0] }}"
output_type: str

OpenSearchHybridRetriever:
type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
hosts:
index: ""
max_chunk_bytes: 104857600
return_embedding: false
method:
mappings:
settings:
create_index: true
http_auth:
use_ssl:
verify_certs:
timeout:
top_k: 20 # The number of results to return
embedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2

DeepsetNvidiaRanker:
type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8

PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
required_variables: '*'
template: |-
You are a technical expert.
You answer questions truthfully based on provided documents.
Ignore typing errors in the question.
For each document check whether it is related to the question.
Only use documents that are related to the question to answer it.
Ignore documents that are not related to the question.
If the answer exists in several documents, summarize them.
Only answer based on the documents provided. Don't make things up.
Just output the structured, informative and precise answer and nothing else.
If the documents can't answer the question, say so.
Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .
Never name the documents, only enter a number in square brackets as a reference.
The reference must only refer to the number that comes in square brackets after the document.
Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.

These are the documents:
{%- if documents|length > 0 %}
{%- for document in documents %}
Document [{{ loop.index }}] :
Name of Source File: {{ document.meta.file_name }}
{{ document.content }}
{% endfor -%}
{%- else %}
No relevant documents found.
Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
{% endif %}

Question: {{ question }}
Answer:

OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
api_key:
"type": "env_var"
"env_vars":
- "OPENAI_API_KEY"
"strict": False
model: "gpt-5.2"
generation_kwargs:
reasoning_effort: low
verbosity: low

DeepsetAnswerBuilder:
type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
init_parameters:
reference_pattern: acm

DocumentJoiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
weights:
top_k:
sort_by_score: true

MultiFileConverter:
type: haystack.core.super_component.super_component.SuperComponent
init_parameters:
input_mapping:
sources:
- file_classifier.sources
is_pipeline_async: false
output_mapping:
score_adder.output: documents
pipeline:
components:
file_classifier:
type: haystack.components.routers.file_type_router.FileTypeRouter
init_parameters:
mime_types:
- text/plain
- application/pdf
- text/markdown
- text/html
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- text/csv

text_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8

pdf_converter:
type: haystack.components.converters.pdfminer.PDFMinerToDocument
init_parameters:
line_overlap: 0.5
char_margin: 2
line_margin: 0.5
word_margin: 0.1
boxes_flow: 0.5
detect_vertical: true
all_texts: false
store_full_path: false

markdown_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8

html_converter:
type: haystack.components.converters.html.HTMLToDocument
init_parameters:
extraction_kwargs:
output_format: markdown
target_language:
include_tables: true
include_links: true

docx_converter:
type: haystack.components.converters.docx.DOCXToDocument
init_parameters:
link_format: markdown

pptx_converter:
type: haystack.components.converters.pptx.PPTXToDocument
init_parameters: {}

xlsx_converter:
type: haystack.components.converters.xlsx.XLSXToDocument
init_parameters: {}

csv_converter:
type: haystack.components.converters.csv.CSVToDocument
init_parameters:
encoding: utf-8

splitter:
type: haystack.components.preprocessors.document_splitter.DocumentSplitter
init_parameters:
split_by: word
split_length: 250
split_overlap: 30
respect_sentence_boundary: true
language: en

score_adder:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: |
{%- set scored_documents = [] -%}
{%- for document in documents -%}
{%- set doc_dict = document.to_dict() -%}
{%- set _ = doc_dict.update({'score': 100.0}) -%}
{%- set scored_doc = document.from_dict(doc_dict) -%}
{%- set _ = scored_documents.append(scored_doc) -%}
{%- endfor -%}
{{ scored_documents }}
output_type: list[haystack.Document]
custom_filters:
unsafe: true

text_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false

tabular_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
connections:
- sender: file_classifier.text/plain
receiver: text_converter.sources
- sender: file_classifier.application/pdf
receiver: pdf_converter.sources
- sender: file_classifier.text/markdown
receiver: markdown_converter.sources
- sender: file_classifier.text/html
receiver: html_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
receiver: docx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
receiver: pptx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
receiver: xlsx_converter.sources
- sender: file_classifier.text/csv
receiver: csv_converter.sources
- sender: text_joiner.documents
receiver: splitter.documents
- sender: text_converter.documents
receiver: text_joiner.documents
- sender: pdf_converter.documents
receiver: text_joiner.documents
- sender: markdown_converter.documents
receiver: text_joiner.documents
- sender: html_converter.documents
receiver: text_joiner.documents
- sender: pptx_converter.documents
receiver: text_joiner.documents
- sender: docx_converter.documents
receiver: text_joiner.documents
- sender: xlsx_converter.documents
receiver: tabular_joiner.documents
- sender: csv_converter.documents
receiver: tabular_joiner.documents
- sender: splitter.documents
receiver: tabular_joiner.documents
- sender: tabular_joiner.documents
receiver: score_adder.documents

connections:
- sender: PromptBuilder.prompt
receiver: OpenAIGenerator.prompt
- sender: OpenAIGenerator.replies
receiver: OutputAdapter.replies
- sender: OutputAdapter.output
receiver: OpenSearchHybridRetriever.query
- sender: OutputAdapter.output
receiver: DeepsetNvidiaRanker.query
- sender: OutputAdapter.output
receiver: PromptBuilder.question
- sender: OutputAdapter.output
receiver: DeepsetAnswerBuilder.query
- sender: OpenSearchHybridRetriever.documents
receiver: DeepsetNvidiaRanker.documents
- sender: PromptBuilder.prompt
receiver: OpenAIGenerator.prompt
- sender: PromptBuilder.prompt
receiver: DeepsetAnswerBuilder.prompt
- sender: OpenAIGenerator.replies
receiver: DeepsetAnswerBuilder.replies
- sender: MultiFileConverter.documents
receiver: DocumentJoiner.documents
- sender: DeepsetNvidiaRanker.documents
receiver: DocumentJoiner.documents
- sender: DocumentJoiner.documents
receiver: DeepsetAnswerBuilder.documents
- sender: DocumentJoiner.documents
receiver: PromptBuilder.documents

inputs:
query:
- PromptBuilder.question
filters:
- OpenSearchHybridRetriever.filters_bm25
- OpenSearchHybridRetriever.filters_embedding
files:
- MultiFileConverter.sources

outputs:
documents: DocumentJoiner.documents
answers: DeepsetAnswerBuilder.answers

max_runs_per_component: 100

metadata: {}

After: LLM connected directly to a Retriever and a Ranker

This is a simplified version of the pipeline above: remove the DocumentJoiner and the OutputAdapter, and connect the components directly.

components:
PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
required_variables: "*"
template: |-
You are part of a chatbot.
You receive a question (Current Question) and a chat history.
Use the context from the chat history and reformulate the question so that it is suitable for retrieval augmented generation.
If X is followed by Y, only ask for Y and do not repeat X again.
If the question does not require any context from the chat history, output it unedited.
Don't make questions too long, but short and precise.
Stay as close as possible to the current question.
Only output the new question, nothing else!

{{ question }}

New question:

OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
api_key:
"type": "env_var"
"env_vars":
- "OPENAI_API_KEY"
"strict": False
model: "gpt-5.2"
generation_kwargs:
reasoning_effort: low
verbosity: low

OpenSearchHybridRetriever:
type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
embedding_dim: 768
hosts:
index: ""
max_chunk_bytes: 104857600
return_embedding: false
method:
mappings:
settings:
create_index: true
http_auth:
use_ssl:
verify_certs:
timeout:
top_k: 20
embedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2

DeepsetNvidiaRanker:
type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8

PromptBuilder:
type: haystack.components.builders.prompt_builder.PromptBuilder
init_parameters:
required_variables: '*'
template: |-
You are a technical expert.
You answer questions truthfully based on provided documents.
Ignore typing errors in the question.
For each document check whether it is related to the question.
Only use documents that are related to the question to answer it.
Ignore documents that are not related to the question.
If the answer exists in several documents, summarize them.
Only answer based on the documents provided. Don't make things up.
Just output the structured, informative and precise answer and nothing else.
If the documents can't answer the question, say so.
Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .
Never name the documents, only enter a number in square brackets as a reference.
The reference must only refer to the number that comes in square brackets after the document.
Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.

These are the documents:
{%- if documents|length > 0 %}
{%- for document in documents %}
Document [{{ loop.index }}] :
Name of Source File: {{ document.meta.file_name }}
{{ document.content }}
{% endfor -%}
{%- else %}
No relevant documents found.
Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
{% endif %}

Question: {{ question }}
Answer:

OpenAIGenerator:
type: haystack.components.generators.openai.OpenAIGenerator
init_parameters:
api_key:
"type": "env_var"
"env_vars":
- "OPENAI_API_KEY"
"strict": False
model: "gpt-5.2"
generation_kwargs:
reasoning_effort: low
verbosity: low

DeepsetAnswerBuilder:
type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
init_parameters:
reference_pattern: acm

MultiFileConverter:
type: haystack.core.super_component.super_component.SuperComponent
init_parameters:
input_mapping:
sources:
- file_classifier.sources
is_pipeline_async: false
output_mapping:
score_adder.output: documents
pipeline:
components:
file_classifier:
type: haystack.components.routers.file_type_router.FileTypeRouter
init_parameters:
mime_types:
- text/plain
- application/pdf
- text/markdown
- text/html
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- text/csv

text_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8

pdf_converter:
type: haystack.components.converters.pdfminer.PDFMinerToDocument
init_parameters:
line_overlap: 0.5
char_margin: 2
line_margin: 0.5
word_margin: 0.1
boxes_flow: 0.5
detect_vertical: true
all_texts: false
store_full_path: false

markdown_converter:
type: haystack.components.converters.txt.TextFileToDocument
init_parameters:
encoding: utf-8

html_converter:
type: haystack.components.converters.html.HTMLToDocument
init_parameters:
extraction_kwargs:
output_format: markdown
target_language:
include_tables: true
include_links: true

docx_converter:
type: haystack.components.converters.docx.DOCXToDocument
init_parameters:
link_format: markdown

pptx_converter:
type: haystack.components.converters.pptx.PPTXToDocument
init_parameters: {}

xlsx_converter:
type: haystack.components.converters.xlsx.XLSXToDocument
init_parameters: {}

csv_converter:
type: haystack.components.converters.csv.CSVToDocument
init_parameters:
encoding: utf-8

splitter:
type: haystack.components.preprocessors.document_splitter.DocumentSplitter
init_parameters:
split_by: word
split_length: 250
split_overlap: 30
respect_sentence_boundary: true
language: en

score_adder:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: |
{%- set scored_documents = [] -%}
{%- for document in documents -%}
{%- set doc_dict = document.to_dict() -%}
{%- set _ = doc_dict.update({'score': 100.0}) -%}
{%- set scored_doc = document.from_dict(doc_dict) -%}
{%- set _ = scored_documents.append(scored_doc) -%}
{%- endfor -%}
{{ scored_documents }}
output_type: list[haystack.Document]
custom_filters:
unsafe: true

text_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false

tabular_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
sort_by_score: false
connections:
- sender: file_classifier.text/plain
receiver: text_converter.sources
- sender: file_classifier.application/pdf
receiver: pdf_converter.sources
- sender: file_classifier.text/markdown
receiver: markdown_converter.sources
- sender: file_classifier.text/html
receiver: html_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
receiver: docx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
receiver: pptx_converter.sources
- sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
receiver: xlsx_converter.sources
- sender: file_classifier.text/csv
receiver: csv_converter.sources
- sender: text_joiner.documents
receiver: splitter.documents
- sender: text_converter.documents
receiver: text_joiner.documents
- sender: pdf_converter.documents
receiver: text_joiner.documents
- sender: markdown_converter.documents
receiver: text_joiner.documents
- sender: html_converter.documents
receiver: text_joiner.documents
- sender: pptx_converter.documents
receiver: text_joiner.documents
- sender: docx_converter.documents
receiver: text_joiner.documents
- sender: xlsx_converter.documents
receiver: tabular_joiner.documents
- sender: csv_converter.documents
receiver: tabular_joiner.documents
- sender: splitter.documents
receiver: tabular_joiner.documents
- sender: tabular_joiner.documents
receiver: score_adder.documents

connections:
- sender: PromptBuilder.prompt
receiver: OpenAIGenerator.prompt
- sender: OpenSearchHybridRetriever.documents
receiver: DeepsetNvidiaRanker.documents
- sender: PromptBuilder.prompt
receiver: DeepsetAnswerBuilder.prompt
- sender: OpenAIGenerator.replies
receiver: DeepsetAnswerBuilder.replies
- sender: DeepsetNvidiaRanker.documents
receiver: PromptBuilder.documents
- sender: DeepsetNvidiaRanker.documents
receiver: DeepsetAnswerBuilder.documents
- sender: OpenAIGenerator.replies
receiver: OpenSearchHybridRetriever.query
- sender: OpenAIGenerator.replies
receiver: DeepsetNvidiaRanker.query
- sender: OpenAIGenerator.replies
receiver: PromptBuilder.question
- sender: OpenAIGenerator.replies
receiver: DeepsetAnswerBuilder.query
- sender: MultiFileConverter.documents
receiver: DeepsetAnswerBuilder.documents
- sender: MultiFileConverter.documents
receiver: PromptBuilder.documents
- sender: MultiFileConverter.documents
receiver: Output.documents
- sender: DeepsetNvidiaRanker.documents
receiver: Output.documents

inputs:
query:
- PromptBuilder.question
filters:
- OpenSearchHybridRetriever.filters_bm25
- OpenSearchHybridRetriever.filters_embedding
files:
- MultiFileConverter.sources

outputs:
answers: DeepsetAnswerBuilder.answers

max_runs_per_component: 100

metadata: {}

When You Still Need OutputAdapter

You still need OutputAdapter when:

  • You're converting between types that smart connections don't support (anything other than string and ChatMessage).
  • You need explicit control over formatting or ordering, or you need to extract specific fields from the output.
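For example, turning a list of documents into one formatted string is a type conversion that smart connections don't perform, so an OutputAdapter is still the right tool. A minimal sketch (the component name content_extractor and its template are illustrative, not part of the pipelines above):

content_extractor:
  type: haystack.components.converters.output_adapter.OutputAdapter
  init_parameters:
    template: "{{ documents | map(attribute='content') | join('\n\n') }}"
    output_type: str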