TextConverter
deepset Cloud pipelines search through Documents stored in the Document Store. Documents are passages of plain text. Before you run a search on your files, use TextConverter to convert them to Document objects that pipelines can use for search.
TextConverter preprocesses files and returns Documents. It takes File as input and produces Document as output.
TextConverter is typically used in an indexing pipeline, where it converts your files to plain text Document objects. If you add a converter to your indexing pipeline, the conversion happens only once, when you deploy the pipeline. Your files are not converted every time you run a search.
After conversion, the resulting Documents are stored in the Document Store.
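To make the convert-once behavior concrete, here is a minimal conceptual sketch in plain Python. It does not use the deepset Cloud or Haystack API: the Document class, convert_text_file, index_files, and the list standing in for the Document Store are all hypothetical names invented for illustration.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    # Hypothetical stand-in for a Document: plain text plus metadata.
    content: str
    meta: dict = field(default_factory=dict)


def convert_text_file(path: Path) -> Document:
    # Read one plain-text file and wrap its contents in a Document.
    text = path.read_text(encoding="utf-8")
    return Document(content=text, meta={"name": path.name})


def index_files(paths, store):
    # Convert each file once and keep the result. Searches would read
    # `store` later and never touch the original files again.
    for path in paths:
        store.append(convert_text_file(path))


# Demo: write a sample file, index it once, then inspect the stored Document.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("Plain text to search.", encoding="utf-8")
    store = []  # hypothetical in-memory substitute for the Document Store
    index_files([sample], store)

print(store[0].content, store[0].meta)
```

In this sketch the stored Document remains available after the source file is gone, which mirrors the idea that a search works on the stored Documents and not on the original files.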
Usage
Use TextConverter as the first node in your indexing pipeline. Start by defining it in the components section of the pipeline definition file:
components:
  - name: TextFileConverter
    type: TextConverter
    params: {}
And then add it to the pipelines section, for example:
pipelines:
  - name: indexing
    nodes:
      - name: TextFileConverter
        inputs: [File]
      - name: Preprocessor
        inputs: [TextFileConverter]