
FilterRetriever

Retrieve documents that match the provided filters. It's useful when you want to narrow down results based on document metadata without performing keyword or semantic search.

Key Features

  • Works with any document store.
  • Accepts filters at initialization and at query time.
  • Returns all documents matching the filters — no scoring or ranking applied.
  • Filters can be overridden at query time through the API or Playground.
  • Useful for metadata-based filtering, such as by date, category, or source.
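Because FilterRetriever applies no scoring or ranking, its behavior comes down to one rule: keep every document whose metadata satisfies the filter. The following Python sketch models that rule under stated assumptions. Documents are plain dicts with a meta dictionary, filters use Haystack-style syntax (field/operator/value comparisons, plus AND/OR/NOT with a conditions list), and the names matches and filter_retrieve are hypothetical. It illustrates the semantics only and is not Haystack's implementation. Real document stores translate filters into their own query language, and details such as the handling of missing fields may differ.

```python
import operator
from typing import Any, Dict, List, Optional

# Comparison operators this sketch understands, keyed by filter syntax.
COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, allowed: value in allowed,
    "not in": lambda value, allowed: value not in allowed,
}


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if the document's metadata satisfies the filter (hypothetical helper)."""
    op = flt["operator"]
    if op in ("AND", "OR", "NOT"):
        results = [matches(doc, cond) for cond in flt["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        return not all(results)  # NOT negates the AND of its conditions
    # Look the field up in metadata, ignoring an optional "meta." prefix.
    field = flt["field"].removeprefix("meta.")
    if field not in doc["meta"]:
        return False  # simplification in this sketch: a missing field never matches
    return COMPARISONS[op](doc["meta"][field], flt["value"])


def filter_retrieve(store: List[Dict[str, Any]], filters: Optional[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return every matching document in store order, with no score and no top_k cutoff."""
    if not filters:
        return list(store)  # no filters: the whole store comes back
    return [doc for doc in store if matches(doc, filters)]


store = [
    {"content": "Q1 report", "meta": {"year": 2021, "category": "finance"}},
    {"content": "Style guide", "meta": {"year": 2019, "category": "docs"}},
    {"content": "Q3 report", "meta": {"year": 2021, "category": "finance"}},
]

hits = filter_retrieve(store, {"field": "meta.year", "operator": "==", "value": 2021})
print([doc["content"] for doc in hits])  # ['Q1 report', 'Q3 report']
```

Note that calling filter_retrieve with no filters returns all three documents, which is why unfiltered runs on a large store deserve care.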

Configuration

  1. Drag the FilterRetriever component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Select the document store. The document store determines where documents are retrieved from.
  4. Go to the Advanced tab to configure default filters.

Connections

FilterRetriever accepts optional filters as input. It outputs documents — a list of all documents matching those filters.

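Typically, you connect FilterRetriever to a PromptBuilder or a Ranker downstream. Use FilterRetriever with care on large document stores: it returns every matching document, with no limit on how many. If you run it without any filters, it returns the entire document store, which can overwhelm downstream components such as generators, for example by producing a prompt that exceeds the model's context window.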
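The parameter tables on this page list no limit or top_k setting for FilterRetriever, so any safeguard against oversized results has to live elsewhere, for example in your own client code or in a custom component placed between FilterRetriever and PromptBuilder. The Python sketch below shows two such safeguards. Both helpers (require_filters and limit_for_prompt) are hypothetical, documents are assumed to be plain dicts with a content string, and the character budget is only a rough stand-in for a real token count.

```python
from typing import Any, Dict, List, Optional


def require_filters(filters: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Fail fast instead of letting an unfiltered run return the whole store."""
    if not filters:
        raise ValueError("refusing to retrieve without filters")
    return filters


def limit_for_prompt(documents: List[Dict[str, Any]], max_docs: int, max_chars: int) -> List[Dict[str, Any]]:
    """Keep documents in their original order until either limit would be exceeded."""
    kept: List[Dict[str, Any]] = []
    used = 0
    for doc in documents:
        size = len(doc["content"])
        if len(kept) >= max_docs or used + size > max_chars:
            break
        kept.append(doc)
        used += size
    return kept


# 1,000 matching documents of 400 characters each, e.g. from an overly broad filter.
matched = [{"content": "x" * 400} for _ in range(1000)]
print(len(limit_for_prompt(matched, max_docs=20, max_chars=5000)))  # 12
```

Since FilterRetriever does not rank its output, truncating keeps whichever documents the store happened to return first; if some documents matter more than others, connecting a Ranker before truncation is the more reliable design.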

Usage Example

This example shows how to use FilterRetriever to retrieve documents based on metadata filters:

```yaml
components:
  filter_retriever:
    type: haystack.components.retrievers.filter_retriever.FilterRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
          timeout:
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        Given these documents, answer the question.

        Documents:
        {% for doc in documents %}
        {{ doc.content }}
        {% endfor %}

        Question: {{question}}
        Answer:

  llm:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      model: gpt-5-mini
      generation_kwargs:
        temperature: 0.7

  answer_builder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters: {}

connections:
  - sender: filter_retriever.documents
    receiver: prompt_builder.documents
  - sender: prompt_builder.prompt
    receiver: llm.prompt
  - sender: llm.replies
    receiver: answer_builder.replies

max_runs_per_component: 100

inputs:
  query:
    - prompt_builder.question
    - answer_builder.query
  filters:
    - filter_retriever.filters

outputs:
  answers: answer_builder.answers

metadata: {}
```

In this example, you can pass filters at query time to narrow down the documents. For instance, to retrieve only documents from a specific year, you would pass:

```json
{
  "filters": {
    "field": "meta.year",
    "operator": "==",
    "value": 2021
  }
}
```
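Filters can also combine several conditions. The Python sketch below builds a nested filter that matches documents from 2020 through 2022 whose category is either finance or legal, then prints it as the JSON you would pass at query time. It assumes Haystack-style filter syntax, where logical operators (AND, OR, NOT) take a conditions list and metadata fields use the meta. prefix. The helper names year_range and in_categories, and the category metadata field, are hypothetical.

```python
import json
from typing import Any, Dict


def year_range(start: int, end: int) -> Dict[str, Any]:
    """AND filter matching documents where start <= meta.year <= end."""
    return {
        "operator": "AND",
        "conditions": [
            {"field": "meta.year", "operator": ">=", "value": start},
            {"field": "meta.year", "operator": "<=", "value": end},
        ],
    }


def in_categories(*categories: str) -> Dict[str, Any]:
    """Comparison filter matching documents whose meta.category is one of the given values."""
    return {"field": "meta.category", "operator": "in", "value": list(categories)}


# Logical filters can nest: both the year range and the category condition must hold.
payload = {
    "filters": {
        "operator": "AND",
        "conditions": [year_range(2020, 2022), in_categories("finance", "legal")],
    }
}
print(json.dumps(payload, indent=2))
```

Building filters in code like this keeps the structure consistent when several conditions are combined, which is easy to get wrong when editing nested JSON by hand.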

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| filters | Optional[Dict[str, Any]] | None | A dictionary with filters to narrow down the search space. If not specified, the FilterRetriever uses the values provided at initialization. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of retrieved documents. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_store | DocumentStore | | An instance of a Document Store to use with the Retriever. |
| filters | Optional[Dict[str, Any]] | None | A dictionary with filters to narrow down the search space. |

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| filters | Optional[Dict[str, Any]] | None | A dictionary with filters to narrow down the search space. If not specified, the FilterRetriever uses the values provided at initialization. |
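The precedence between default filters (set at initialization, for example on the Advanced tab) and query-time filters can be sketched in Python. This sketch assumes, in line with the parameter descriptions on this page, that query-time filters replace the defaults entirely rather than being merged with them, and that a missing or empty value falls back to the defaults. SketchFilterRetriever and fake_filter_documents are hypothetical stand-ins; the fake store only records which filters it was asked to apply.

```python
from typing import Any, Callable, Dict, List, Optional

Filters = Optional[Dict[str, Any]]


class SketchFilterRetriever:
    """Hypothetical stand-in that models only how the effective filters are chosen."""

    def __init__(self, filter_documents: Callable[[Filters], List[str]], filters: Filters = None) -> None:
        self._filter_documents = filter_documents  # plays the role of the document store
        self.filters = filters  # defaults set at initialization

    def run(self, filters: Filters = None) -> Dict[str, List[str]]:
        # Query-time filters replace the defaults entirely (assumed: no merging).
        # A missing or empty value falls back to the defaults.
        effective = filters or self.filters
        return {"documents": self._filter_documents(effective)}


received: List[Filters] = []


def fake_filter_documents(filters: Filters) -> List[str]:
    received.append(filters)  # record what the "store" was asked for
    return []


default = {"field": "meta.category", "operator": "==", "value": "finance"}
by_year = {"field": "meta.year", "operator": "==", "value": 2021}

retriever = SketchFilterRetriever(fake_filter_documents, filters=default)
retriever.run()                 # no query-time filters: the default is used
retriever.run(filters=by_year)  # query-time filters: only by_year is used
print(received == [default, by_year])  # True
```

A practical consequence of this rule: if you want to keep a default condition and add another at query time, pass both conditions together (for example inside an AND filter) rather than only the new one.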