SentenceWindowRetriever
Retrieve the context window of documents surrounding a relevant result. When a retriever finds a relevant sentence, this component fetches the neighboring sentences before and after it to provide full context for downstream components.
Key Features
- Retrieves neighboring documents around each relevant document.
- Window size is configurable and can be overridden at query time.
- Uses source_id and split_id metadata fields to locate surrounding documents.
- Accepts a single field name or a list of field names as the source ID.
- Compatible with multiple document stores: Astra, Elasticsearch, OpenSearch, Pgvector, Pinecone, and Qdrant.
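The lookup described in the list above can be sketched in plain Python. This is a minimal illustration, not the component's implementation: the Doc class and the neighbors function are hypothetical, the document store is replaced by an in-memory list, and split_id values are assumed to be consecutive integers within each source document.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document; not the library's Document class."""
    content: str
    meta: dict = field(default_factory=dict)


def neighbors(hit, store, window_size):
    """Return the hit plus up to window_size splits on each side, same source only."""
    source = hit.meta["source_id"]
    split = hit.meta["split_id"]
    low, high = split - window_size, split + window_size
    window = [
        d for d in store
        if d.meta["source_id"] == source and low <= d.meta["split_id"] <= high
    ]
    # Return the window in reading order.
    return sorted(window, key=lambda d: d.meta["split_id"])


# Six splits from one source, plus one split from an unrelated source.
store = [Doc(f"Sentence {i}.", {"source_id": "report", "split_id": i}) for i in range(6)]
store.append(Doc("Other file.", {"source_id": "memo", "split_id": 2}))

window = neighbors(store[3], store, window_size=1)
print([d.content for d in window])  # ['Sentence 2.', 'Sentence 3.', 'Sentence 4.']
```

With window_size=1, the hit at split 3 is returned together with splits 2 and 4. The split from the other source is ignored even though its split_id falls inside the range.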
Configuration
- Drag the SentenceWindowRetriever component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Select the document store. The document store determines where documents are retrieved from.
- Go to the Advanced tab to configure window_size, source_id_meta_field, and split_id_meta_field.
Connections
SentenceWindowRetriever accepts retrieved_documents (a list of documents from a previous retriever) and an optional window_size override as inputs. It outputs context_windows (a list of concatenated context strings) and context_documents (a list of documents including surrounding context).
Typically, you place SentenceWindowRetriever after a retriever (such as OpenSearchBM25Retriever or OpenSearchEmbeddingRetriever) and before a PromptBuilder. The surrounding context gives the LLM the passage around each hit rather than an isolated sentence, which helps it generate better answers.
Usage Example
components:
  SentenceWindowRetriever:
    type: components.retrievers.sentence_window_retriever.SentenceWindowRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
      window_size: 3
      source_id_meta_field: source_id  # Can also be a list: ["source_id", "file_id"]
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| retrieved_documents | List[Document] | | List of retrieved documents from the previous retriever. |
| window_size | Optional[int] | None | The number of documents to retrieve before and after the relevant one. If set, it overrides the window_size configured at initialization. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| context_windows | List[str] | | A list of strings, where each string represents the concatenated text from the context window of the corresponding document in retrieved_documents. |
| context_documents | List[Document] | | A list of Document objects, containing the retrieved documents plus the context documents surrounding them. The documents are sorted by the split_idx_start meta field. |
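To make the relationship between the two outputs concrete, here is a rough sketch of how they could be assembled from per-result windows. It rests on several assumptions: build_outputs is a hypothetical helper, documents are plain dicts, sorting is applied within each window, and text is joined with a single space. The component itself may merge text differently, for example when splits overlap.

```python
def build_outputs(windows):
    """windows: one list of document dicts per retrieved document."""
    context_windows = []
    context_documents = []
    for window in windows:
        # Order each window by where the split starts in the original text.
        ordered = sorted(window, key=lambda doc: doc["meta"]["split_idx_start"])
        # One concatenated string per retrieved document.
        context_windows.append(" ".join(doc["content"] for doc in ordered))
        # The flat list keeps every document, hits and neighbors alike.
        context_documents.extend(ordered)
    return {"context_windows": context_windows, "context_documents": context_documents}


windows = [
    [
        {"content": "It rained all day.", "meta": {"split_idx_start": 17}},
        {"content": "We left at dawn.", "meta": {"split_idx_start": 0}},
    ],
]
result = build_outputs(windows)
print(result["context_windows"])  # ['We left at dawn. It rained all day.']
```

Each entry in context_windows lines up with the retrieved document at the same position, while context_documents is a single flat list.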
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | DocumentStore | | The Document Store to retrieve the surrounding documents from. |
| window_size | int | 3 | The number of documents to retrieve before and after the relevant one. For example, window_size: 2 fetches 2 preceding and 2 following documents. |
| source_id_meta_field | Union[str, List[str]] | "source_id" | The metadata field containing the ID of the original document. Can be a single field name or a list of field names. When a list is provided, only documents matching all specified fields will be retrieved. |
| split_id_meta_field | str | "split_id" | The metadata field that contains the split ID of the document. |
| raise_on_missing_meta_fields | bool | True | If True, raises an error if the documents do not contain the required metadata fields. If False, it skips retrieving the context for documents that are missing the required metadata fields, but still includes the original document in the results. |
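The interaction between source_id_meta_field and raise_on_missing_meta_fields can be illustrated with a small sketch. The source_key helper below is hypothetical, and the use of ValueError is an assumption about the error type. It only mirrors the behavior described in the table: a list of fields must all match, and missing fields either raise or cause the context lookup to be skipped.

```python
def source_key(meta, source_id_meta_field="source_id",
               split_id_meta_field="split_id",
               raise_on_missing_meta_fields=True):
    """Return a tuple identifying the source document, or None to skip the lookup."""
    # Accept either a single field name or a list of field names.
    if isinstance(source_id_meta_field, str):
        fields = [source_id_meta_field]
    else:
        fields = list(source_id_meta_field)

    missing = [name for name in fields + [split_id_meta_field] if name not in meta]
    if missing:
        if raise_on_missing_meta_fields:
            # Assumed error type; the component may raise a different exception.
            raise ValueError(f"Document is missing required meta fields: {missing}")
        return None  # Caller keeps the original document but fetches no context.

    # Two documents share a source only if every listed field matches.
    return tuple(meta[name] for name in fields)


meta = {"source_id": "a1", "file_id": "f7", "split_id": 4}
print(source_key(meta))                                               # ('a1',)
print(source_key(meta, ["source_id", "file_id"]))                     # ('a1', 'f7')
print(source_key({"split_id": 4}, raise_on_missing_meta_fields=False))  # None
```

Comparing the returned tuples is what makes a list of fields stricter than a single field: a neighbor qualifies only when every value agrees.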
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| retrieved_documents | List[Document] | | List of retrieved documents from the previous retriever. |
| window_size | Optional[int] | None | The number of documents to retrieve before and after the relevant one. If set, it overrides the window_size configured at initialization. |
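The override rule for window_size is simple enough to show in a few lines. This is a sketch under stated assumptions: WindowSettings and resolve are hypothetical names used only to illustrate that a query-time value, when given, takes precedence over the init-time value.

```python
from typing import Optional


class WindowSettings:
    """Hypothetical holder for the init-time window size."""

    def __init__(self, window_size: int = 3):
        self.window_size = window_size

    def resolve(self, window_size: Optional[int] = None) -> int:
        # Compare against None explicitly so only an omitted value falls back.
        return self.window_size if window_size is None else window_size


settings = WindowSettings(window_size=3)
print(settings.resolve())   # 3, the init-time value
print(settings.resolve(1))  # 1, the query-time override
```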