SentenceWindowRetriever

Retrieves documents adjacent to a given document in the Document Store.

Basic Information

Type: haystack.components.retrievers.sentence_window_retriever.SentenceWindowRetriever

Inputs

Parameter	Type	Default	Description
retrieved_documents	List[Document]		List of retrieved documents from the previous retriever.
window_size	Optional[int]	None	The number of documents to retrieve before and after the relevant one. When set, it overrides the `window_size` value passed to the constructor.

Outputs

Parameter	Type	Default	Description
context_windows	List[str]		A list of strings, where each string represents the concatenated text from the context window of the corresponding document in `retrieved_documents`.
context_documents	List[Document]		A list of `Document` objects, containing the retrieved documents plus the context documents surrounding them. The documents are sorted by the `split_idx_start` meta field.
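To make the relationship between the two outputs concrete, here is a minimal sketch in plain Python. It is not the component's implementation: `build_outputs` is a hypothetical helper, chunks are modeled as plain dicts, the space used as a join separator is an assumption, and sorting the flattened list globally (rather than per window) is also an assumption, since this page only states that the documents are sorted by `split_idx_start`.

```python
def build_outputs(windows):
    """Build both outputs from one list of chunk dicts per retrieved document.

    Hypothetical helper for illustration only.
    """
    # One concatenated string per retrieved document (separator is an assumption).
    context_windows = [" ".join(chunk["content"] for chunk in window) for window in windows]

    # One flat list of chunks, ordered by their start offset in the source text.
    flat = [chunk for window in windows for chunk in window]
    context_documents = sorted(flat, key=lambda chunk: chunk["meta"]["split_idx_start"])

    return {"context_windows": context_windows, "context_documents": context_documents}


windows = [
    [
        {"content": "A.", "meta": {"split_idx_start": 0}},
        {"content": "B.", "meta": {"split_idx_start": 3}},
    ],
    [
        {"content": "X.", "meta": {"split_idx_start": 10}},
        {"content": "Y.", "meta": {"split_idx_start": 13}},
    ],
]

outputs = build_outputs(windows)
print(outputs["context_windows"])
```

Note that `context_windows` has one entry per retrieved document, while `context_documents` is a single flat list.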

Overview

During indexing, documents are split into smaller chunks, such as sentences. When you submit a query, the Retriever fetches the most relevant chunks. To provide fuller context, SentenceWindowRetriever then fetches a number of neighboring chunks before and after each relevant one. You set this number with the `window_size` parameter. To locate the surrounding documents, the component uses two metadata fields on each document: `source_id`, which identifies the original document, and `split_id` (`doc.meta['split_id']`), which gives the chunk's position within it.
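The window lookup described above can be sketched in plain Python. This is an illustration under stated assumptions, not the component's actual code: `Chunk` is a hypothetical stand-in for a `Document`, `context_window` is a hypothetical helper, and `split_id` is assumed to be a consecutive integer per source document.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a Document: text plus a metadata dict."""
    content: str
    meta: dict = field(default_factory=dict)


def context_window(chunks, hit, window_size=3):
    """Return the chunks within `window_size` splits of `hit`, in split order."""
    source = hit.meta["source_id"]
    center = hit.meta["split_id"]
    lowest, highest = center - window_size, center + window_size
    neighbours = [
        c for c in chunks
        if c.meta["source_id"] == source and lowest <= c.meta["split_id"] <= highest
    ]
    return sorted(neighbours, key=lambda c: c.meta["split_id"])


# Six chunks from one source document, plus one chunk from a different source.
chunks = [Chunk(f"sentence {i}", {"source_id": "doc-a", "split_id": i}) for i in range(6)]
chunks.append(Chunk("other file", {"source_id": "doc-b", "split_id": 2}))

window = context_window(chunks, hit=chunks[2], window_size=1)
print([c.content for c in window])  # ['sentence 1', 'sentence 2', 'sentence 3']
```

The chunk from `doc-b` shares a `split_id` with the hit but is excluded because its `source_id` differs.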

The source_id_meta_field parameter specifies which metadata field contains the ID of the original document. This parameter can accept either a single field name (string) or a list of field names. When a list is provided, only documents matching all of the specified meta fields are retrieved.
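As a rough illustration of the "matching all of the specified meta fields" rule, the following sketch compares metadata dicts directly. `same_source` is a hypothetical helper, and `file_id` is just an example field name; the real component expresses this as a Document Store filter, which is not shown here.

```python
def same_source(candidate_meta, hit_meta, source_id_meta_field):
    """Return True if the candidate matches the hit on every configured field."""
    # Accept either a single field name or a list of field names.
    if isinstance(source_id_meta_field, str):
        fields = [source_id_meta_field]
    else:
        fields = list(source_id_meta_field)
    return all(
        name in candidate_meta
        and name in hit_meta
        and candidate_meta[name] == hit_meta[name]
        for name in fields
    )


hit = {"source_id": "s1", "file_id": "f1", "split_id": 4}
same_file = {"source_id": "s1", "file_id": "f1", "split_id": 5}
other_file = {"source_id": "s1", "file_id": "f2", "split_id": 5}

print(same_source(other_file, hit, "source_id"))               # True
print(same_source(other_file, hit, ["source_id", "file_id"]))  # False
print(same_source(same_file, hit, ["source_id", "file_id"]))   # True
```

Passing a list narrows the match: a chunk that shares `source_id` but comes from a different `file_id` no longer counts as a neighbor.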

This component works with existing Retrievers, like BM25Retriever or EmbeddingRetriever. First, use a Retriever to find documents based on a query and then use SentenceWindowRetriever to get the surrounding documents for context.

The SentenceWindowRetriever is compatible with the following DocumentStores:

Usage Example

components:
  SentenceWindowRetriever:
    type: haystack.components.retrievers.sentence_window_retriever.SentenceWindowRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
      window_size: 3
      source_id_meta_field: source_id  # Can also be a list: ["source_id", "file_id"]

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
document_store	DocumentStore		The Document Store to retrieve the surrounding documents from.
window_size	int	3	The number of documents to retrieve before and after the relevant one. For example, `window_size: 2` fetches 2 preceding and 2 following documents.
source_id_meta_field	Union[str, List[str]]	"source_id"	The metadata field containing the ID of the original document. Can be a single field name or a list of field names. When a list is provided, only documents matching all specified fields are retrieved.
split_id_meta_field	str	"split_id"	The metadata field that contains the split ID of the document.
raise_on_missing_meta_fields	bool	True	If True, raises an error if the documents do not contain the required metadata fields. If False, it skips retrieving the context for documents that are missing the required metadata fields, but still includes the original document in the results.
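The two behaviors of `raise_on_missing_meta_fields` can be sketched as follows. This is a simplified model, not the component's code: `partition_by_meta` is a hypothetical helper, documents are plain dicts, the default field names are assumed, and `ValueError` is an assumed exception type.

```python
REQUIRED_FIELDS = ("source_id", "split_id")  # assumes the default field names


def partition_by_meta(docs, raise_on_missing_meta_fields=True):
    """Split docs into (expandable, passthrough) by presence of required meta fields."""
    expandable, passthrough = [], []
    for doc in docs:
        missing = [name for name in REQUIRED_FIELDS if name not in doc["meta"]]
        if not missing:
            expandable.append(doc)       # context can be looked up for this one
        elif raise_on_missing_meta_fields:
            raise ValueError(f"Document is missing meta fields: {missing}")
        else:
            passthrough.append(doc)      # kept in the results, but without context
    return expandable, passthrough


docs = [
    {"content": "has both fields", "meta": {"source_id": "s1", "split_id": 0}},
    {"content": "no split_id", "meta": {"source_id": "s1"}},
]

expandable, passthrough = partition_by_meta(docs, raise_on_missing_meta_fields=False)
print(len(expandable), len(passthrough))  # 1 1
```

With the flag set to `False`, the second document is still returned; it simply gets no surrounding context.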

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
retrieved_documents	List[Document]		List of retrieved documents from the previous retriever.
window_size	Optional[int]	None	The number of documents to retrieve before and after the relevant one. When set, it overrides the `window_size` value passed to the constructor.
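The precedence between the constructor value and the run-time value can be modeled with a small sketch. `WindowSettings` and `effective_window_size` are hypothetical names used only for illustration; they are not part of the component's API.

```python
from typing import Optional


class WindowSettings:
    """Hypothetical model of how a run-time window_size takes precedence."""

    def __init__(self, window_size: int = 3):
        self.window_size = window_size  # value set at construction time

    def effective_window_size(self, window_size: Optional[int] = None) -> int:
        # None means "not provided at run time": fall back to the constructor value.
        return self.window_size if window_size is None else window_size


settings = WindowSettings(window_size=3)
print(settings.effective_window_size())               # 3
print(settings.effective_window_size(window_size=1))  # 1
```

The sketch checks `is None` rather than truthiness, so an explicit run-time value is never silently replaced by the default.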
