HierarchicalDocumentSplitter

Splits documents into blocks of different sizes, building a hierarchical tree structure of those blocks.

Basic Information

Type: haystack.components.preprocessors.hierarchical_document_splitter.HierarchicalDocumentSplitter

Inputs

Parameter	Type	Default	Description
documents	List[Document]		List of Documents to split into hierarchical blocks.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		List of hierarchical Documents produced by the split.

Overview

Work in Progress

Bear with us while we work on adding pipeline examples and the most common component connections.

Splits documents into blocks of different sizes, building a hierarchical tree structure of those blocks.

The root node of the tree is the original document, and the leaf nodes are the smallest blocks. The blocks in between are connected so that each smaller block is a child of the larger block it was split from.
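The tree described above can be illustrated with a minimal, self-contained sketch. It does not use the component itself and is not its implementation: `Block`, `split_words`, `build_tree`, and `leaves` are hypothetical names introduced only for this example, and the sketch assumes word-level splitting with no overlap.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Block:
    """One node of the tree: the original text (root) or a smaller block."""
    content: str
    block_size: int  # 0 marks the root in this sketch (an assumption)
    level: int
    # Exclude the back-reference from repr/eq to avoid infinite recursion.
    parent: Optional["Block"] = field(default=None, repr=False, compare=False)
    children: List["Block"] = field(default_factory=list)


def split_words(text: str, size: int) -> List[str]:
    """Cut text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_tree(text: str, block_sizes: Set[int]) -> Block:
    """Split by the largest size first, then split each block by the next size."""
    root = Block(content=text, block_size=0, level=0)
    current_level = [root]
    for level, size in enumerate(sorted(block_sizes, reverse=True), start=1):
        next_level = []
        for parent in current_level:
            for chunk in split_words(parent.content, size):
                child = Block(chunk, size, level, parent)
                parent.children.append(child)
                next_level.append(child)
        current_level = next_level
    return root


def leaves(node: Block) -> List[Block]:
    """Collect the smallest blocks (nodes without children), left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


root = build_tree("one two three four five six seven eight", {4, 2})
for child in root.children:
    # Each 4-word block lists the 2-word blocks that were split from it.
    print(child.block_size, repr(child.content), [c.content for c in child.children])
```

In this sketch the root holds the full text, its two children hold four words each, and the four leaves hold two words each, which mirrors the parent-child relationship between larger and smaller blocks.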

Usage Example

components:
  HierarchicalDocumentSplitter:
    type: haystack.components.preprocessors.hierarchical_document_splitter.HierarchicalDocumentSplitter
    init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
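block_sizes	Set[int]		Set of block sizes to split the document into. The sizes are applied in descending order: the document is split by the largest size first, and each resulting block is then split by the next smaller size.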
split_overlap	int	0	The number of overlapping units for each split.
split_by	Literal['word', 'sentence', 'page', 'passage']	word	The unit for splitting your documents.
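To make the interaction between a block size and `split_overlap` concrete, here is a rough sketch of a sliding window over word units. The function `split_with_overlap` is a hypothetical helper written for this illustration, and the window arithmetic (each block starts `block_size - split_overlap` units after the previous one) is an assumption; the component may handle edge cases differently.

```python
from typing import List


def split_with_overlap(units: List[str], block_size: int, split_overlap: int = 0) -> List[List[str]]:
    """Group units into blocks of `block_size`, sharing `split_overlap` units
    between neighbouring blocks (hypothetical helper, not the component's code)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 0 <= split_overlap < block_size:
        raise ValueError("split_overlap must be >= 0 and smaller than block_size")

    step = block_size - split_overlap
    blocks = []
    for start in range(0, len(units), step):
        blocks.append(units[start:start + block_size])
        # Stop once the current block reaches the end of the input.
        if start + block_size >= len(units):
            break
    return blocks


words = "a b c d e f g".split()  # corresponds to split_by="word"
print(split_with_overlap(words, block_size=3))
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
print(split_with_overlap(words, block_size=3, split_overlap=1))
# [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']]
```

With an overlap of 1, the last unit of each block is repeated as the first unit of the next one, so neighbouring blocks share context at the cost of producing more blocks.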

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
documents	List[Document]		List of Documents to split into hierarchical blocks.
