AzureOCRDocumentConverter Parameters

Check the init and runtime parameters you can pass for this component.

YAML Init Parameters

You can specify the following parameters for AzureOCRDocumentConverter in the pipeline YAML:

| Parameter | Type | Possible values | Description |
| --- | --- | --- | --- |
| endpoint | String | | The endpoint of your Azure resource. Required. |
| api_key | String | Uses the AZURE_AI_API_KEY environment variable by default. | The API key to connect to your Azure resource. Required. |
| model_id | String | Default: prebuilt-read | The ID of the model you want to use to convert files to documents. For a list of supported models, see Microsoft documentation. Required. |
| preceding_context_len | Integer | Default: 3 | The number of lines before a table to extract as its preceding context. Required. |
| following_context_len | Integer | Default: 3 | The number of lines after a table to extract as its subsequent context. Required. |
| merge_multiple_column_headers | Boolean | True, False. Default: True | If a table contains more than one header row, specifies whether to merge the header rows into a single row. Required. |
| page_layout | Literal | natural, single_column. Default: natural | Specifies the reading order to follow. natural: follows a natural reading order determined by Azure. single_column: groups all lines at the same height on the page into one line, based on the threshold set in threshold_y. Required. |
| threshold_y | Float | Default: 0.05 | The threshold, in inches, that determines whether two recognized elements in a PDF are grouped into a single line. This is especially relevant for section headers or numbers, which may be horizontally separated from the rest of the text. Only relevant if page_layout=single_column. Optional. |
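
To make the effect of threshold_y more concrete, here is a minimal sketch of the kind of grouping that page_layout=single_column describes. It is an illustration under stated assumptions, not the component's actual implementation: the Line class, its top and left fields (positions in inches), and the group_rows function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical model of a recognized text element (not an Azure or Haystack type)."""
    text: str
    top: float   # vertical position on the page, in inches (assumed)
    left: float  # horizontal position on the page, in inches (assumed)


def group_rows(lines: list[Line], threshold_y: float = 0.05) -> list[str]:
    """Group elements whose vertical positions differ by at most threshold_y."""
    rows: list[list[Line]] = []
    for line in sorted(lines, key=lambda item: item.top):
        # Compare against the first element of the current row so the row does not drift.
        if rows and abs(line.top - rows[-1][0].top) <= threshold_y:
            rows[-1].append(line)
        else:
            rows.append([line])
    # Within each row, read elements from left to right.
    return [
        " ".join(item.text for item in sorted(row, key=lambda item: item.left))
        for row in rows
    ]


lines = [
    Line("Introduction", top=1.02, left=1.5),
    Line("1.", top=1.00, left=0.5),
    Line("Body text starts here.", top=1.40, left=0.5),
]
rows = group_rows(lines)
print(rows)
```

In this sketch, the section number and its heading sit 0.02 inches apart vertically, so the default threshold of 0.05 joins them into one line, while a stricter value such as 0.01 would keep them separate.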

REST API Runtime Parameters

There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.