XLSXToDocument
Turn XLSX worksheets or rows into documents. This component uses pandas and openpyxl to read spreadsheets.
Key Features
- Two conversion modes: one document per worksheet (default) or one document per row.
- Configurable content column for row mode — remaining columns become document metadata.
- Sheet selection to limit conversion to specific worksheets.
- Passes additional arguments to `pandas.read_excel` for advanced control.
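The sketch below shows the two conversion modes with no dependencies. It is not the component's implementation. It assumes the worksheets are already parsed into lists of row dictionaries (the component uses pandas and openpyxl for that step), and it represents documents as plain dictionaries instead of Haystack `Document` objects. The `convert` and `sheet_to_csv` helpers are hypothetical.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical stand-in for a Document: content plus a metadata dict.
Doc = Dict[str, Any]


def sheet_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize one worksheet (a list of row dicts) as CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def convert(
    sheets: Dict[str, List[Dict[str, Any]]],
    document_per: str = "sheet",
    content_column: str = "content",
) -> List[Doc]:
    """Hypothetical sketch of sheet mode versus row mode."""
    docs: List[Doc] = []
    for rows in sheets.values():
        if document_per == "sheet":
            # One document per worksheet: the whole sheet becomes CSV content.
            docs.append({"content": sheet_to_csv(rows), "meta": {}})
        elif document_per == "row":
            for row in rows:
                # One document per row: the content column becomes the content,
                # and every other column becomes metadata.
                meta = {k: v for k, v in row.items() if k != content_column}
                docs.append({"content": str(row[content_column]), "meta": meta})
        else:
            raise ValueError(f"document_per must be 'sheet' or 'row', got {document_per!r}")
    return docs
```

For a sheet with two rows, sheet mode yields a single CSV document, while row mode yields two documents whose metadata holds the non-content columns.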
Configuration
- Drag the `XLSXToDocument` component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- Configure the component settings:
  - Set Document Per to `sheet` (default) to create one document per worksheet, or `row` to create one document per row.
  - If you use row mode, set Content Column to the column that becomes the document content. The default is `content`.
  - Optionally, set Sheet Name to limit conversion to specific sheets by name or index. If it's not set, all sheets are converted.
  - Set kwargs to pass additional arguments to `pandas.read_excel`, such as `engine`, `skiprows`, or `nrows`.
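The Sheet Name setting accepts a name, an index, a list of either, or nothing. The sketch below resolves that setting against a workbook's worksheet names. `select_sheets` is a hypothetical helper, not part of the component. Treating integers as zero-based positions follows `pandas.read_excel`, and the sketch assumes the component behaves the same way.

```python
from typing import List, Sequence, Union

SheetRef = Union[str, int]


def select_sheets(
    available: Sequence[str],
    sheet_name: Union[SheetRef, List[SheetRef], None] = None,
) -> List[str]:
    """Hypothetical: resolve a Sheet Name setting into the worksheet names to convert."""
    if sheet_name is None:
        # Not set: convert every worksheet.
        return list(available)
    refs = sheet_name if isinstance(sheet_name, list) else [sheet_name]
    selected: List[str] = []
    for ref in refs:
        if isinstance(ref, int):
            # Integers are zero-based positions, as in pandas.read_excel.
            selected.append(available[ref])
        elif ref in available:
            selected.append(ref)
        else:
            raise ValueError(f"Worksheet {ref!r} not found")
    return selected
```

Mixing names and indexes in one list works because each entry is resolved independently.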
Connections
XLSXToDocument accepts a list of file paths or ByteStream objects through its sources input. It outputs a list of Document objects.
It typically connects with:
- FileTypeRouter: sends XLSX files, routed by MIME type, to XLSXToDocument.
- DocumentJoiner: receives the converted documents and joins them with output from other converters before further processing.
- DocumentSplitter or embedding components: receive documents for further processing.
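Routing is what lets XLSX and CSV files reach different converters in one index. FileTypeRouter routes by MIME type, and files that match no configured type go to its `unclassified` output. The sketch below approximates that behavior with a hypothetical extension-to-MIME table and a hypothetical `route_by_mime` function. FileTypeRouter performs its own MIME detection.

```python
from pathlib import Path
from typing import Dict, List

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
CSV_MIME = "text/csv"

# Hypothetical lookup table; the real router detects MIME types itself.
EXTENSION_TO_MIME = {".xlsx": XLSX_MIME, ".csv": CSV_MIME}


def route_by_mime(paths: List[str]) -> Dict[str, List[str]]:
    """Hypothetical: group file paths by MIME type, collecting unknown types separately."""
    routes: Dict[str, List[str]] = {XLSX_MIME: [], CSV_MIME: [], "unclassified": []}
    for path in paths:
        mime = EXTENSION_TO_MIME.get(Path(path).suffix.lower(), "unclassified")
        routes[mime].append(path)
    return routes
```

Each bucket then feeds its own converter. In the index example, the XLSX bucket goes to `xlsx_converter.sources` and the CSV bucket goes to `csv_converter.sources`.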
Usage Examples
Basic Configuration
```yaml
xlsx_converter:
  type: deepset_cloud_custom_nodes.converters.xlsx.XLSXToDocument
  init_parameters:
    document_per: row
    content_column: summary
```
Using the Component in an Index
```yaml
# haystack-pipeline
components:
  file_router:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
      - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
      - text/csv
  xlsx_converter:
    type: haystack.components.converters.xlsx.XLSXToDocument
    init_parameters:
      table_format: csv
      sheet_name:
  csv_converter:
    type: deepset_cloud_custom_nodes.converters.csv_rows_to_documents.DeepsetCSVRowsToDocumentsConverter
    init_parameters: {}
  joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      policy: NONE
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
connections:
- sender: file_router.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  receiver: xlsx_converter.sources
- sender: file_router.text/csv
  receiver: csv_converter.sources
- sender: xlsx_converter.documents
  receiver: joiner.documents
- sender: csv_converter.documents
  receiver: joiner.documents
- sender: joiner.documents
  receiver: DocumentWriter.documents
inputs:
  files:
  - file_router.sources
max_runs_per_component: 100
metadata: {}
```
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | Paths or ByteStreams that point to XLSX files. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Metadata applied to every document, or aligned per source when given as a list. |
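The `meta` input accepts either one dictionary or a list. The sketch below expands it into one metadata dictionary per source. `normalize_meta` is a hypothetical helper. The rule that a list must have one entry per source is an assumption that follows from "aligned per source".

```python
from typing import Any, Dict, List, Optional, Union

Meta = Dict[str, Any]


def normalize_meta(meta: Optional[Union[Meta, List[Meta]]], n_sources: int) -> List[Meta]:
    """Hypothetical: expand the meta input into one metadata dict per source."""
    if meta is None:
        return [{} for _ in range(n_sources)]
    if isinstance(meta, dict):
        # A single dict is copied to every source.
        return [dict(meta) for _ in range(n_sources)]
    if len(meta) != n_sources:
        # Assumed: a list must line up one-to-one with the sources.
        raise ValueError("A meta list must have one entry per source")
    # A list is aligned with the sources by position.
    return [dict(m) for m in meta]
```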
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | Documents that contain each worksheet as CSV (sheet mode) or the content column's value (row mode), plus merged metadata. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_per | Literal["sheet", "row"] | "sheet" | Create a document per worksheet or per row. |
| content_column | str | "content" | Column that holds the content when document_per is set to row. |
| sheet_name | Union[str, int, List[Union[str, int]], None] | None | Limit conversion to one sheet or several sheets, or leave as None to read all sheets. |
| kwargs | Dict[str, Any] | | Arguments forwarded to pandas.read_excel, such as engine, skiprows, or nrows. |
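The init parameters table can be restated as a typed configuration object. `XLSXConverterConfig` and its `read_excel_arguments` method are hypothetical. The empty-dict default for `kwargs` and the forwarding of `sheet_name` together with `kwargs` to `pandas.read_excel` are assumptions, since the table does not state them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Literal, Union

SheetRef = Union[str, int]


@dataclass
class XLSXConverterConfig:
    """Hypothetical mirror of the init parameters table."""

    document_per: Literal["sheet", "row"] = "sheet"
    content_column: str = "content"
    sheet_name: Union[SheetRef, List[SheetRef], None] = None
    # Assumed default: no extra pandas.read_excel arguments.
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so check the mode explicitly.
        if self.document_per not in ("sheet", "row"):
            raise ValueError(f"document_per must be 'sheet' or 'row', got {self.document_per!r}")

    def read_excel_arguments(self) -> Dict[str, Any]:
        """Assumed: the arguments a converter might forward to pandas.read_excel."""
        return {"sheet_name": self.sheet_name, **self.kwargs}
```

For example, `XLSXConverterConfig(document_per="row", content_column="summary", kwargs={"skiprows": 1})` corresponds to the Basic Configuration example plus one extra `pandas.read_excel` argument.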
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | XLSX file paths or ByteStreams. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Metadata applied to every generated document, or aligned per source entry when given as a list. |