CNAzureConverter Parameters

Check the init and runtime parameters you can use to customize the CNAzureConverter node.

YAML Init Parameters

These are the parameters you can pass to this node in the pipeline YAML configuration:

| Parameter | Type | Possible Values | Description |
|---|---|---|---|
| endpoint | String | | Your Form Recognizer or Cognitive Services resource's endpoint. Mandatory. |
| credential_key | String | | Your Form Recognizer or Cognitive Services resource's subscription key. Mandatory. |
| model_id | String | Default: prebuilt-read | The identifier of the model you want to use to extract information from your file. For a list of available models, see Azure Documentation. Mandatory. |
| save_json | Boolean | True, False. Default: False | Saves the output as a JSON file. Mandatory. |
| preceding_context_len | Integer | Default: 3 | Specifies the number of lines before a table to extract as preceding context. It's returned as metadata. Mandatory. |
| following_context_len | Integer | Default: 3 | Specifies the number of lines after a table to extract as subsequent context. It's returned as metadata. Mandatory. |
| merge_multiple_column_headers | Boolean | True, False. Default: True | If a table has more than one row as a column header, this parameter lets you merge these rows into a single row. Mandatory. |
| id_hash_keys | List of strings | Default: None | Generates the document ID from a custom list of strings that refer to the document's attributes. To avoid duplicate documents in your document store when document texts are the same, you can modify the metadata of a document and then pass ["content", "metadata"] to this field to generate IDs based on the document content and the defined metadata. Optional. |
| page_layout | Literal | natural, single_column. Default: natural | The type of reading order to follow. Possible options: natural uses the natural reading order determined by Azure; single_column groups all lines on the page with the same height together based on the threshold specified in threshold_y. Mandatory. |
| threshold_y | Float | Default: 0.05 | The threshold to determine if two elements in a PDF should be grouped into a single line. This is especially relevant for section headers or numbers which may be spatially separated on the horizontal axis from the remaining text. The threshold is specified in inches. This is only relevant if page_layout=single_column. Optional. |
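
As an illustration of how these parameters might appear in a pipeline YAML, here is a minimal sketch. It assumes a components list where each entry has a name, a type, and params. The component name AzureConverter, the endpoint URL, and the key are placeholders and not values from this page. The remaining values are the defaults listed above, except page_layout, which is set to single_column so that threshold_y takes effect.

```yaml
# Hypothetical fragment: the component name and placeholder values are assumptions.
components:
  - name: AzureConverter            # any name you choose for the node
    type: CNAzureConverter
    params:
      endpoint: "https://<your-resource-name>.cognitiveservices.azure.com/"  # placeholder
      credential_key: "<your-subscription-key>"                              # placeholder
      model_id: prebuilt-read       # default model
      save_json: false              # don't save the output as a JSON file
      preceding_context_len: 3      # lines before a table kept as metadata
      following_context_len: 3      # lines after a table kept as metadata
      merge_multiple_column_headers: true
      page_layout: single_column    # group lines by height instead of Azure's natural order
      threshold_y: 0.05             # in inches; only used with single_column
```

If you keep page_layout at its default of natural, you can leave out threshold_y, because it only applies to the single_column layout.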

REST API Runtime Parameters

There are no runtime parameters you can pass to this node when making a request to the Search REST API endpoint.