Check the init and runtime parameters you can use to customize the CNAzureConverter node.
YAML Init Parameters
These are the parameters you can pass to this node in the pipeline YAML configuration:
Parameter | Type | Possible Values | Description |
---|---|---|---|
endpoint | String | | Your Document Intelligence or Cognitive Services resource's endpoint. Mandatory. |
credential_key | String | | Your Document Intelligence or Cognitive Services resource's subscription key. Mandatory. |
model_id | String | Default: prebuilt-read | The identifier of the model you want to use to extract information from your file. For a list of available models, see the Azure documentation. Mandatory. |
save_json | Boolean | True, False; Default: False | Saves the output as a JSON file. Mandatory. |
preceding_context_len | Integer | Default: 3 | Specifies the number of lines before a table to extract as preceding context. The extracted context is returned as metadata. Mandatory. |
following_context_len | Integer | Default: 3 | Specifies the number of lines after a table to extract as subsequent context. The extracted context is returned as metadata. Mandatory. |
merge_multiple_column_headers | Boolean | True, False; Default: True | If a table's column header spans more than one row, merges these rows into a single header row. Mandatory. |
id_hash_keys | List of strings | Default: None | Generates the document ID from a custom list of strings that refer to the document's attributes. This helps you avoid duplicate documents in your document store when document texts are identical: modify the metadata of a document and then pass ["content", "metadata"] to this field to generate IDs based on both the document content and the defined metadata. Optional. |
page_layout | Literal | natural, single_column; Default: natural | The type of reading order to follow. With natural, the node uses the natural reading order determined by Azure. With single_column, it groups all lines on the page with the same height together, based on the threshold specified in threshold_y. Mandatory. |
threshold_y | Float | Default: 0.05 | The threshold, in inches, that determines whether two elements in a PDF should be grouped into a single line. This is especially relevant for section headers or numbers that may be spatially separated on the horizontal axis from the remaining text. Only relevant if page_layout is set to single_column. Optional. |
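
As a rough illustration of how the parameters above fit together, here is a minimal sketch of a component entry in the pipeline YAML. It assumes the usual `components` / `name` / `type` / `params` layout of the pipeline configuration. The component name `AzureConverter` and the placeholder values in angle brackets are hypothetical, so replace them with your own.

```yaml
components:
  - name: AzureConverter            # hypothetical name, choose your own
    type: CNAzureConverter
    params:
      endpoint: <your_resource_endpoint>        # placeholder, mandatory
      credential_key: <your_subscription_key>   # placeholder, mandatory
      model_id: prebuilt-read                   # default model
      save_json: False
      preceding_context_len: 3
      following_context_len: 3
      merge_multiple_column_headers: True
      page_layout: single_column                # group lines by height
      threshold_y: 0.05                         # inches, used only with single_column
```

In this sketch, `threshold_y` has an effect only because `page_layout` is set to `single_column`. With the default `natural` layout, you could leave it out.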
REST API Runtime Parameters
There are no runtime parameters you can pass to this node when making a request to the Search REST API endpoint.