Convert Group
These components help you turn external inputs into content your pipeline can process. Use them to ingest sources such as files or web pages and convert them into a consistent representation that you can clean, split, embed, and store.
This group is a good fit for the start of indexing pipelines, where the main goal is to bring data into your system reliably. It also helps you standardize inputs early, so the rest of your pipeline can focus on processing and retrieval rather than on handling format differences.
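To illustrate what "a consistent representation" can mean in practice, here is a minimal, dependency-free sketch. It does not use the components listed on this page or their actual API. `SimpleDocument`, `convert_text`, `convert_csv`, `CONVERTERS`, and `convert` are hypothetical names introduced only for this example. The idea it shows is that each input format gets its own converter, and every converter returns the same document shape, so later steps do not need to know where the content came from.

```python
import csv
import io
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a pipeline document: text plus metadata."""
    content: str
    meta: dict = field(default_factory=dict)


def convert_text(name: str, raw: bytes) -> list[SimpleDocument]:
    # A plain-text file becomes a single document.
    return [SimpleDocument(raw.decode("utf-8"), {"source": name, "format": "txt"})]


def convert_csv(name: str, raw: bytes) -> list[SimpleDocument]:
    # Each CSV row becomes its own document, rendered as "column: value" pairs.
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    docs = []
    for index, row in enumerate(reader):
        text = "; ".join(f"{key}: {value}" for key, value in row.items())
        docs.append(SimpleDocument(text, {"source": name, "format": "csv", "row": index}))
    return docs


# Assumed routing table for this sketch: file extension -> converter function.
CONVERTERS = {".txt": convert_text, ".csv": convert_csv}


def convert(name: str, raw: bytes) -> list[SimpleDocument]:
    # Pick a converter by file extension and fail loudly on unknown formats.
    suffix = Path(name).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"no converter registered for {suffix!r}")
    return CONVERTERS[suffix](name, raw)


docs = convert("notes.txt", b"hello world")
docs += convert("people.csv", b"name,role\nAda,engineer\n")
for doc in docs:
    print(doc.meta["format"], "|", doc.content)
```

Because both inputs end up as the same `SimpleDocument` type, a cleaning or splitting step written against that type would work for either source. Raising an error for unregistered extensions is one possible design choice for this sketch; a real pipeline might instead skip or log unsupported files.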
Available Components
- CSVToDocument
- DOCXToDocument
- DeepsetCSVRowsToDocumentsConverter
- DocumentToImageContent
- FileToFileContent
- HTMLToDocument
- ImageFileToDocument
- ImageFileToImageContent
- JSONConverter
- JsonParser
- LinkContentFetcher
- MSGToDocument
- MarkdownToDocument
- MultiFileConverter
- PDFMinerToDocument
- PDFToImageContent
- PPTXToDocument
- PyPDFToDocument
- RemoteWhisperTranscriber
- TextFileToDocument
- XLSXToDocument