XLSXToDocument

Turn XLSX worksheets or rows into documents. This component uses pandas and openpyxl to read spreadsheets.

Deprecation Notice

This component is deprecated. It continues to work in your existing pipelines for now. You can replace it with the Haystack XLSXToDocument component.

Key Features

  • Creates one document per worksheet or one document per row, controlled by the document_per parameter.
  • In row mode, uses the column specified by content_column as the document content and moves other columns into metadata.
  • Preserves metadata from ByteStream inputs and records the sheet name and row index for traceability.
  • Limits conversion to specific sheets using the sheet_name parameter.
  • Forwards additional arguments to pandas.read_excel for fine-grained control.
  • Integrates with FileTypeRouter and DocumentJoiner in multi-format indexing pipelines.
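The row mode described above can be pictured with a small sketch. This is not the component's implementation: it uses plain Python dictionaries in place of rows read by pandas, and the function name `rows_to_documents` and the metadata keys `sheet_name` and `row_index` are assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Optional


def rows_to_documents(
    rows: List[Dict[str, Any]],
    content_column: str,
    sheet_name: str,
    source_meta: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn each row into a {content, meta} record."""
    documents = []
    for row_index, row in enumerate(rows):
        # Start from the metadata that came with the source (for example a ByteStream).
        meta = dict(source_meta or {})
        # Every column except the content column becomes metadata.
        meta.update({key: value for key, value in row.items() if key != content_column})
        # Assumed key names for traceability; the real component may use others.
        meta["sheet_name"] = sheet_name
        meta["row_index"] = row_index
        documents.append({"content": str(row[content_column]), "meta": meta})
    return documents


rows = [
    {"summary": "Login fails on Safari", "ticket": 101, "owner": "ana"},
    {"summary": "Export times out", "ticket": 102, "owner": "li"},
]
docs = rows_to_documents(rows, "summary", "Tickets", {"file_name": "tickets.xlsx"})
```

Each resulting record keeps the `summary` text as its content, while `ticket`, `owner`, and the source metadata travel along in `meta`.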

Configuration

  1. Drag the XLSXToDocument component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. Configure the parameters as needed.

Connections

XLSXToDocument accepts a list of file paths or ByteStream objects (sources) as input, along with optional metadata (meta). It outputs a list of converted documents (documents).
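The meta input accepts either one dictionary or a list of dictionaries. A minimal sketch of how such an input could be aligned with the sources follows. The helper name `align_meta` and the error message are hypothetical, and the component may handle edge cases differently.

```python
from typing import Any, Dict, List, Optional, Union

MetaInput = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def align_meta(meta: MetaInput, sources_count: int) -> List[Dict[str, Any]]:
    """Hypothetical helper: return one metadata dictionary per source."""
    if meta is None:
        # No metadata given: every source gets an empty dictionary.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dictionary is copied to every source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        raise ValueError("The meta list must have one entry per source.")
    # A list is matched to the sources by position.
    return [dict(entry) for entry in meta]


shared = align_meta({"department": "finance"}, sources_count=2)
per_source = align_meta([{"year": 2023}, {"year": 2024}], sources_count=2)
```

Copying each dictionary keeps later per-document changes from leaking between sources.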

Typically, XLSXToDocument receives XLSX files routed from FileTypeRouter and sends its output to DocumentJoiner, which combines documents from multiple converters before passing them downstream for indexing.
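To illustrate the routing idea, here is a rough sketch that groups file paths by MIME type before they reach a converter. It is a simplified stand-in rather than FileTypeRouter itself: the suffix table, the function name `route_by_mime_type`, and the `unclassified` bucket are assumptions made for this example.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Assumed suffix-to-MIME mapping covering only the two types used on this page.
SUFFIX_TO_MIME = {
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".csv": "text/csv",
}


def route_by_mime_type(paths: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: group paths by MIME type, keeping input order."""
    routed: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        mime_type = SUFFIX_TO_MIME.get(Path(path).suffix.lower(), "unclassified")
        routed[mime_type].append(path)
    return dict(routed)


routed = route_by_mime_type(["sales.xlsx", "contacts.csv", "notes.txt", "Q2.XLSX"])
xlsx_sources = routed["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]
```

In this sketch, `xlsx_sources` is the list that would be handed to the XLSX converter as sources, and the CSV group would go to a separate converter.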

Usage Example

Using the Component in an Index

components:
  file_router:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
        - text/csv
  xlsx_converter:
    type: deepset_cloud_custom_nodes.converters.xlsx.XLSXToDocument
    init_parameters:
      document_per: row
      content_column: summary
  csv_converter:
    type: deepset_cloud_custom_nodes.converters.csv_rows_to_documents.DeepsetCSVRowsToDocumentsConverter
    init_parameters: {}
  joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      policy: NONE
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:

connections:
  - sender: file_router.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    receiver: xlsx_converter.sources
  - sender: file_router.text/csv
    receiver: csv_converter.sources
  - sender: xlsx_converter.documents
    receiver: joiner.documents
  - sender: csv_converter.documents
    receiver: joiner.documents

  - sender: joiner.documents
    receiver: DocumentWriter.documents

inputs:
  files:
    - file_router.sources

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | List[Union[str, Path, ByteStream]] | | Paths or ByteStreams that point to XLSX files. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Metadata forwarded to every document or aligned per source. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | Documents that contain the worksheet as CSV content (sheet mode) or the selected row content (row mode), plus merged metadata. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_per | Literal["sheet", "row"] | "sheet" | Create a document per worksheet or per row. |
| content_column | str | "content" | Column that holds the content when document_per is set to row. |
| sheet_name | Union[str, int, List[Union[str, int]], None] | None | Limit conversion to one sheet, several sheets, or leave None to read all sheets. |
| kwargs | Dict[str, Any] | | Arguments forwarded to pandas.read_excel, such as engine, skiprows, or nrows. |
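The effect of sheet_name and the forwarded arguments can be tried directly with pandas. The sketch below assumes pandas and openpyxl are installed. The wrapper `read_sheets` is a hypothetical helper, not part of the component, and the workbook is built in memory only so that the example is self-contained.

```python
import io

import pandas as pd


def read_sheets(source, sheet_name=None, **read_excel_kwargs):
    """Hypothetical helper: always return a {sheet: DataFrame} mapping."""
    result = pd.read_excel(source, sheet_name=sheet_name, **read_excel_kwargs)
    # pandas returns a dict for None or a list, and a single DataFrame otherwise.
    if isinstance(result, dict):
        return result
    return {sheet_name: result}


# Build a small two-sheet workbook in memory.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"summary": ["a", "b", "c"]}).to_excel(writer, sheet_name="Q1", index=False)
    pd.DataFrame({"summary": ["d"]}).to_excel(writer, sheet_name="Q2", index=False)
workbook_bytes = buffer.getvalue()

# sheet_name=None reads every sheet.
all_sheets = read_sheets(io.BytesIO(workbook_bytes))
# A sheet name limits the read; nrows is passed through to pandas.read_excel.
only_q1 = read_sheets(io.BytesIO(workbook_bytes), sheet_name="Q1", nrows=2)
```

Here `all_sheets` holds one DataFrame per worksheet, while `only_q1` holds just the first two data rows of the Q1 sheet.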

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | List[Union[str, Path, ByteStream]] | | XLSX file paths or ByteStreams. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Metadata applied to every generated document or aligned per source entry. |