YAML Init Parameters

You can specify the following parameters for AzureOCRDocumentConverter in the pipeline YAML:

Parameter	Type	Possible values	Description
`endpoint`	String		The endpoint of your Azure resource. Required.
`api_key`	String	Uses the `AZURE_AI_API_KEY` environment variable by default.	The API key to connect to your Azure resource. Required.
`model_id`	String	Default: `prebuilt-read`	The ID of the model you want to use to convert files to documents. For a list of supported models, see the Microsoft documentation. Required.
`preceding_context_len`	Integer	Default: `3`	The number of lines before a table to extract as its preceding context. Required.
`following_context_len`	Integer	Default: `3`	The number of lines after a table to extract as its subsequent context. Required.
`merge_multiple_column_headers`	Boolean	`True` `False` Default: `True`	If a table contains more than one header row, this parameter specifies whether to merge the header rows into a single row. Required.
`page_layout`	Literal	`natural` `single_column` Default: `natural`	Specifies the reading order to follow. Possible values are: `natural`: Follows the natural reading order determined by Azure. `single_column`: Groups all lines at the same height on the page, based on the threshold set in `threshold_y`. Required.
`threshold_y`	Float	Default: `0.05`	The threshold for deciding whether two recognized elements in a PDF should be grouped into a single line. This is especially relevant for section headers or numbers, which may be separated horizontally from the remaining text. The threshold is specified in inches. This setting is only relevant if `page_layout=single_column`. Optional.
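
As an illustration of how the parameters in the table fit together, here is a minimal sketch of a component entry in a pipeline YAML. The component name `converter`, the endpoint URL, and the `type` path are assumptions made for the example and are not taken from this page; check them against your own pipeline and resource. The `api_key` parameter is left out on purpose so that the `AZURE_AI_API_KEY` environment variable is used.

```yaml
# Sketch only: "converter" is a hypothetical component name, the endpoint is a
# placeholder, and the type path is an assumption that may differ in your setup.
components:
  converter:
    type: haystack.components.converters.azure.AzureOCRDocumentConverter
    init_parameters:
      endpoint: "https://<your-resource-name>.cognitiveservices.azure.com/"
      # api_key omitted: falls back to the AZURE_AI_API_KEY environment variable
      model_id: prebuilt-read
      preceding_context_len: 3
      following_context_len: 3
      merge_multiple_column_headers: true
      page_layout: single_column
      # Only takes effect because page_layout is single_column; value is in inches
      threshold_y: 0.05
```

In this sketch, `page_layout` is set to `single_column` so that `threshold_y` applies. With the default `natural` layout, you can drop `threshold_y` entirely.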

REST API Runtime Parameters

There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.