Use Azure Document Intelligence
Convert files to documents using Azure's Document Intelligence service.
About this Task
Azure Document Intelligence extracts text from files in the following formats:
- JPEG
- PNG
- BMP
- TIFF
- DOCX
- XLSX
- PPTX
- HTML
For more details on the service's capabilities, see the Azure Document Intelligence website. For a list of models you can use to process your files, see the model overview in the Document Intelligence documentation.
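As a rough pre-check before uploading files, the format list above can be turned into a small extension filter. This is a minimal sketch in Python, not part of deepset Cloud or the Azure SDK. The function name `is_supported` and the mapping from formats to file extensions (for example, treating `.jpg` and `.tif` as JPEG and TIFF) are assumptions made for illustration. `.pdf` is included only because the usage example on this page routes PDF files to the converter.

```python
from pathlib import Path

# Extensions assumed to correspond to the formats listed on this page.
# ".pdf" is an assumption based on the usage example, which routes PDF
# files to the Azure converter node.
SUPPORTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".png", ".bmp", ".tiff", ".tif",
    ".docx", ".xlsx", ".pptx", ".html", ".pdf",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension matches one of the assumed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive and looks only at the extension, so it cannot detect a file whose content does not match its name.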
Prerequisites
You need an API key from an Azure account that has a Document Intelligence resource. For details, see Get started with Document Intelligence in the Azure documentation.
Use Azure Document Intelligence
First, connect deepset Cloud to Azure Document Intelligence through the Connections page:
- Click your name in the top right corner and select Connections.
- Click Connect next to Azure Document Intelligence.
- Enter your API key and submit it.
Then, add the CNAzureConverter node to your indexing pipeline.
Usage Example
...
components:
  - name: AzureConverter
    type: CNAzureConverter
    params:
      endpoint: <Form Recognizer or Cognitive Services endpoint>
      credential_key: "" # Leave this field as an empty string
      model_id: prebuilt-read
...
pipelines:
  # here comes the query pipeline which we skipped in this example
  - name: indexing
    nodes:
      - name: FileTypeClassifier
        inputs: [File]
      - name: AzureConverter
        inputs: [FileTypeClassifier.output_2] # output_2 is where PDF files are routed
      - name: Preprocessor
        inputs: [AzureConverter]
...
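A wiring mistake in the `inputs` fields is an easy error to make when adding the converter node. The sketch below is one way to sanity-check the indexing pipeline from the usage example before deploying it. It is an illustration only: the pipeline is written out as a plain Python list rather than loaded from YAML, and the helper name `find_wiring_errors` is hypothetical, not a deepset Cloud or Haystack function.

```python
# Indexing pipeline from the usage example, written as a plain Python structure.
INDEXING_NODES = [
    {"name": "FileTypeClassifier", "inputs": ["File"]},
    {"name": "AzureConverter", "inputs": ["FileTypeClassifier.output_2"]},
    {"name": "Preprocessor", "inputs": ["AzureConverter"]},
]


def find_wiring_errors(nodes, root="File"):
    """Return messages for inputs that point at neither the root nor an earlier node."""
    errors = []
    seen = {root}
    for node in nodes:
        for source in node["inputs"]:
            # "FileTypeClassifier.output_2" refers to the node "FileTypeClassifier".
            source_name = source.split(".")[0]
            if source_name not in seen:
                errors.append(f"{node['name']}: unknown input '{source}'")
        seen.add(node["name"])
    return errors


print(find_wiring_errors(INDEXING_NODES))
```

For the pipeline above, the function returns an empty list. It checks only that each input names the root or a node defined earlier; it does not verify that an output edge such as `output_2` exists on the classifier.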