SentenceWindowRetriever
Retrieve documents adjacent to a given document in the document store to provide fuller context. When an initial retriever finds a relevant sentence, SentenceWindowRetriever fetches the surrounding sentences before and after it, giving the LLM more context to work with.
Key Features
- Expands retrieval context by fetching neighboring documents around each retrieved document.
- Configurable window size to control how many surrounding documents to include.
- Compatible with BM25 and embedding-based retrievers.
- Supports multiple document store backends (OpenSearch, Elasticsearch, Pgvector, Pinecone, Qdrant, Astra).
- Flexible source document identification using single or multiple metadata fields.
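The window expansion described above can be sketched in plain Python. This is a minimal illustration, not the component's actual implementation: documents are modeled as plain dictionaries, the `neighbors` helper and the sample `store` are hypothetical, and the metadata field names follow the defaults `source_id` and `split_id`.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a Document: {"content": ..., "meta": {...}}
Doc = Dict[str, Any]


def neighbors(store: List[Doc], hit: Doc, window_size: int = 3) -> List[Doc]:
    """Return the hit plus up to window_size splits on each side from the same source."""
    source_id = hit["meta"]["source_id"]
    split_id = hit["meta"]["split_id"]
    low, high = split_id - window_size, split_id + window_size
    window = [
        doc
        for doc in store
        if doc["meta"]["source_id"] == source_id
        and low <= doc["meta"]["split_id"] <= high
    ]
    # Keep the splits in their original reading order.
    return sorted(window, key=lambda doc: doc["meta"]["split_id"])


# Ten splits of one source document, plus one split from an unrelated source.
store = [
    {"content": f"Sentence {i}.", "meta": {"source_id": "doc-a", "split_id": i}}
    for i in range(10)
]
store.append({"content": "Other file.", "meta": {"source_id": "doc-b", "split_id": 4}})

window = neighbors(store, hit=store[4], window_size=2)
window_ids = [doc["meta"]["split_id"] for doc in window]
```

With `window_size=2`, the hit at split 4 is returned together with splits 2, 3, 5, and 6 of the same source. The split with the same `split_id` from the other source is left out because its `source_id` differs.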
Configuration
- Drag the SentenceWindowRetriever component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Configure the document_store to retrieve neighboring documents from.
  - Set window_size to control how many documents to fetch before and after each retrieved document (default: 3).
- Go to the Advanced tab to configure source_id_meta_field, split_id_meta_field, and raise_on_missing_meta_fields.
Connections
SentenceWindowRetriever receives a list of retrieved documents through its retrieved_documents input, typically from a BM25 or embedding retriever. It has two outputs: context_windows, a list of strings with one entry per retrieved document, each holding the concatenated text of that document's context window, and context_documents, a list of Document objects containing the retrieved documents together with the documents surrounding them. Connect context_documents to a prompt builder or LLM for downstream processing.
Source Code
To check this component's source code, open sentence_window_retriever.py in the Haystack repository.
Usage Examples
Basic Configuration
SentenceWindowRetriever:
  type: haystack.components.retrievers.sentence_window_retriever.SentenceWindowRetriever
  init_parameters:
    document_store:
      type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      init_parameters:
        index: Standard-Index-English
    window_size: 3
    source_id_meta_field: source_id

components:
  SentenceWindowRetriever:
    type: haystack.components.retrievers.sentence_window_retriever.SentenceWindowRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
      window_size: 3
      source_id_meta_field: source_id # Can also be a list: ["source_id", "file_id"]
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| retrieved_documents | List[Document] | | List of retrieved documents from the previous retriever. |
| window_size | Optional[int] | None | The number of documents to retrieve before and after the relevant one. Overrides the window_size parameter set at initialization. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| context_windows | List[str] | | A list of strings, where each string represents the concatenated text from the context window of the corresponding document in retrieved_documents. |
| context_documents | List[Document] | | A list of Document objects, containing the retrieved documents plus the context documents surrounding them. The documents are sorted by the split_idx_start metadata field. |
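The relationship between the two outputs can be sketched as follows. This is a simplified illustration under stated assumptions, not the component's code: documents are plain dictionaries, the `build_outputs` helper is hypothetical, and the splits are assumed not to overlap, so their text can simply be joined in `split_idx_start` order.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a Document: {"content": ..., "meta": {...}}
Doc = Dict[str, Any]


def build_outputs(windows: List[List[Doc]]) -> Tuple[List[str], List[Doc]]:
    """Build one concatenated string per window and a flat list of window documents."""
    context_windows: List[str] = []
    context_documents: List[Doc] = []
    for window in windows:
        # Order the splits of one window by their start offset in the source text.
        ordered = sorted(window, key=lambda doc: doc["meta"]["split_idx_start"])
        # Assumption: splits do not overlap, so plain concatenation is enough.
        context_windows.append("".join(doc["content"] for doc in ordered))
        context_documents.extend(ordered)
    return context_windows, context_documents


# One window whose splits arrive out of order.
window = [
    {"content": "Third part.", "meta": {"split_idx_start": 25}},
    {"content": "First part. ", "meta": {"split_idx_start": 0}},
    {"content": "Second part. ", "meta": {"split_idx_start": 12}},
]
context_windows, context_documents = build_outputs([window])
```

Here `context_windows` holds a single string for the single window, while `context_documents` keeps the individual splits, so a downstream prompt builder can still access each document's metadata.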
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | DocumentStore | | The document store to retrieve the surrounding documents from. |
| window_size | int | 3 | The number of documents to retrieve before and after the relevant one. For example, window_size: 2 fetches two preceding and two following documents. |
| source_id_meta_field | Union[str, List[str]] | source_id | The metadata field containing the ID of the original document. Can be a single field name or a list of field names. When a list is provided, only documents matching all specified fields are retrieved. |
| split_id_meta_field | str | split_id | The metadata field that contains the split ID of the document. |
| raise_on_missing_meta_fields | bool | True | If True, raises an error if the documents do not contain the required metadata fields. If False, skips retrieving context for documents with missing metadata fields, but still includes the original document in the results. |
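The interplay of source_id_meta_field, split_id_meta_field, and raise_on_missing_meta_fields can be illustrated with a short sketch. The `source_key` helper below is hypothetical and only mirrors the behavior described in the table; the exception type and message are assumptions.

```python
from typing import Any, Dict, List, Optional, Union


def source_key(
    meta: Dict[str, Any],
    source_id_meta_field: Union[str, List[str]] = "source_id",
    split_id_meta_field: str = "split_id",
    raise_on_missing_meta_fields: bool = True,
) -> Optional[Dict[str, Any]]:
    """Return the field/value pairs a neighbor must match, or None to skip the lookup."""
    # Accept either a single field name or a list of field names.
    if isinstance(source_id_meta_field, str):
        source_fields = [source_id_meta_field]
    else:
        source_fields = list(source_id_meta_field)

    missing = [f for f in source_fields + [split_id_meta_field] if f not in meta]
    if missing:
        if raise_on_missing_meta_fields:
            # Assumption: the error type and wording are illustrative only.
            raise ValueError(f"Document is missing metadata fields: {missing}")
        # The original document is kept, but no neighbors are fetched for it.
        return None

    # A neighbor must match every listed source field.
    return {field: meta[field] for field in source_fields}


single = source_key({"source_id": "a", "split_id": 1})
multi = source_key(
    {"source_id": "a", "file_id": "f1", "split_id": 1},
    source_id_meta_field=["source_id", "file_id"],
)
skipped = source_key({"source_id": "a"}, raise_on_missing_meta_fields=False)
```

With a list of fields, a neighbor qualifies only if all of the listed values match, which helps when a single field does not uniquely identify the source document.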
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| retrieved_documents | List[Document] | | List of retrieved documents from the previous retriever. |
| window_size | Optional[int] | None | The number of documents to retrieve before and after the relevant one. Overrides the window_size parameter set at initialization. |
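The override rule for window_size can be expressed in a few lines. The `effective_window_size` function is a hypothetical helper that only illustrates the precedence described in the table, not the component's internals.

```python
from typing import Optional


def effective_window_size(init_window_size: int, run_window_size: Optional[int] = None) -> int:
    """Pick the window size for one run: the run-time value wins when it is provided."""
    if run_window_size is None:
        # Nothing was passed at query time, so fall back to the init parameter.
        return init_window_size
    return run_window_size


default_run = effective_window_size(init_window_size=3)
overridden_run = effective_window_size(init_window_size=3, run_window_size=1)
```

Leaving window_size unset at query time keeps the value configured at initialization, so a narrower or wider window can be tried for a single query without changing the pipeline.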