TextConverter

deepset Cloud pipelines search through Documents stored in the Document Store. Documents are passages of plain text. Before you can search your text files, use TextConverter to convert them into Document objects that pipelines can search.

TextConverter reads files and returns Documents: it takes File as input and produces Document as output.

You'd typically use TextConverter in an indexing pipeline to convert your files into plain-text Document objects. If you add a converter to your indexing pipeline, the conversion happens only once, when you deploy the pipeline, not every time you run a search. After the files are converted, the resulting Documents are stored in the Document Store.

Usage

Use TextConverter as the first node in your indexing pipeline. To do this, first define it in the components section of the pipeline definition file:

components:
  - name: TextFileConverter
    type: TextConverter
    params: {}
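With params: {} the converter runs with its default settings. If you need different behavior, you can set parameters under params. The following sketch assumes that deepset Cloud exposes the parameters of the Haystack 1.x TextConverter, such as remove_numeric_tables and valid_languages; check the component reference for the parameters your deepset Cloud version actually supports.

```yaml
components:
  - name: TextFileConverter
    type: TextConverter
    params:
      # Assumption: these parameters mirror the Haystack 1.x TextConverter
      remove_numeric_tables: false  # keep lines that look like rows of numeric tables
      valid_languages: [en]         # ISO 639-1 codes; text not detected as one of these is flagged as possibly garbled
```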

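Then add it to the pipelines section. In this example, TextFileConverter receives File as input and passes its output to a node named Preprocessor, which must also be defined in the components section: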

pipelines:
  - name: indexing
    nodes:
      - name: TextFileConverter
        inputs: [File]
      - name: Preprocessor
        inputs: [TextFileConverter]
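For a fuller picture, here is a sketch of a complete pipeline definition that also defines the Preprocessor component, writes the results to a document store, and adds a query pipeline. The component types DeepsetCloudDocumentStore, PreProcessor, and BM25Retriever, and all parameter values, are assumptions based on Haystack 1.x conventions rather than details from this page. Note that TextFileConverter appears only in the indexing pipeline: the query pipeline starts from Query and searches the Documents that were already converted at deployment.

```yaml
components:
  # Assumed document store type; the indexing pipeline writes Documents here
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: TextFileConverter
    type: TextConverter
    params: {}
  # Assumed splitting settings: break each Document into 250-word passages
  - name: Preprocessor
    type: PreProcessor
    params:
      split_by: word
      split_length: 250
  # Assumed keyword retriever that reads from the document store at query time
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10

pipelines:
  # Runs on every search; no converter here, so files aren't converted again
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
  # Runs once when you deploy: File -> TextFileConverter -> Preprocessor -> DocumentStore
  - name: indexing
    nodes:
      - name: TextFileConverter
        inputs: [File]
      - name: Preprocessor
        inputs: [TextFileConverter]
      - name: DocumentStore
        inputs: [Preprocessor]
```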