TextConverter

deepset Cloud pipelines search through Documents stored in the DocumentStore. Documents are passages of plain text. Use TextConverter to convert your files to Document objects that pipelines can use for search.

TextConverter preprocesses files and returns Documents.

A typical use of TextConverter is in an indexing pipeline, where it converts your files to plain-text Document objects. If you add a converter to your indexing pipeline, the conversion happens only once, when you deploy the pipeline. Your files are not converted again each time you run a search.
After the files are converted, the resulting Documents are stored in the DocumentStore.

Basic Information

  • Pipeline type: Used in indexing pipelines.
  • Position in a pipeline: Either at the very beginning or after a FileTypeClassifier.
  • Nodes that can precede it in a pipeline: FileTypeClassifier, or none when TextConverter is the first node and takes [File] as input
  • Nodes that can follow it in a pipeline: PreProcessor
  • Node input: File
  • Node output: Documents
  • Available node classes: TextConverter

Usage Examples

You can use TextConverter as the first node of your indexing pipeline:

...
components:
  - name: TextFileConverter
    type: TextConverter
...
pipelines:
  - name: indexing
    nodes:
      - name: TextFileConverter
        inputs: [File]
      - name: Preprocessor
        inputs: [TextFileConverter]
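
The other position mentioned above, after a FileTypeClassifier, could look roughly like the fragment below. This is a minimal sketch, not a complete pipeline file: the component name FileClassifier is a hypothetical choice, and the edge FileClassifier.output_1 assumes that the classifier is left at its default settings, where plain-text files are routed to the first output. Check the FileTypeClassifier documentation for the output that matches your configuration.

```yaml
# Fragment only: a full pipeline file also needs its other components
# (for example, the DocumentStore) and settings.
components:
  - name: FileClassifier          # hypothetical name, pick your own
    type: FileTypeClassifier
  - name: TextFileConverter
    type: TextConverter
  - name: Preprocessor
    type: PreProcessor
pipelines:
  - name: indexing
    nodes:
      - name: FileClassifier
        inputs: [File]
      - name: TextFileConverter
        # Assumption: with default settings, text files leave on output_1
        inputs: [FileClassifier.output_1]
      - name: Preprocessor
        inputs: [TextFileConverter]
```

With this layout, only the files that the classifier recognizes as text reach TextConverter. Other file types would need their own converters connected to the classifier's other outputs.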