Check the init and runtime parameters you can pass for this component.
YAML Init Parameters
You can specify the following parameters for AzureOCRDocumentConverter in the pipeline YAML:
Parameter | Type | Possible values | Description |
---|---|---|---|
endpoint | String | | The endpoint of your Azure resource. Required. |
api_key | String | Uses the AZURE_AI_API_KEY environment variable by default. | The API key to connect to your Azure resource. Required. |
model_id | String | Default: prebuilt-read | The ID of the model you want to use to convert files to documents. For a list of supported models, see Microsoft documentation. Required. |
preceding_context_len | Integer | Default: 3 | The number of lines before a table to extract as its preceding context. Required. |
following_context_len | Integer | Default: 3 | The number of lines after a table to extract as its subsequent context. Required. |
merge_multiple_column_headers | Boolean | True, False. Default: True | If a table uses more than one row as a header, this parameter specifies whether to merge the header rows into a single row. Required. |
page_layout | Literal | natural, single_column. Default: natural | Specifies the type of reading order to follow. Possible values are: natural, which follows a natural reading order determined by Azure; single_column, which groups all lines at the same height on the page based on the threshold set in threshold_y. Required. |
threshold_y | Float | Default: 0.05 | The threshold that determines whether two recognized elements in a PDF are grouped into a single line. This is especially relevant for section headers or numbers, which may be horizontally separated from the remaining text. The threshold is specified in inches. This setting is only relevant if page_layout is set to single_column. Optional. |
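
As an illustration of how the parameters above might fit together, here is a minimal sketch of a component entry in the pipeline YAML. The component name `converter`, the `type` import path, the endpoint placeholder, and the way `api_key` is written as an environment-variable secret are assumptions that are not stated on this page, so check them against your own pipeline before relying on them. The remaining values are the defaults listed in the table, except `page_layout`, which is set to `single_column` so that `threshold_y` takes effect.

```yaml
components:
  converter:  # hypothetical component name
    # Assumed import path for the component; verify it for your version.
    type: haystack.components.converters.azure.AzureOCRDocumentConverter
    init_parameters:
      # Placeholder: replace with the endpoint of your Azure resource.
      endpoint: https://<your-resource-name>.cognitiveservices.azure.com/
      # Assumed secret syntax: reads the key from AZURE_AI_API_KEY.
      api_key:
        type: env_var
        env_vars:
          - AZURE_AI_API_KEY
        strict: false
      model_id: prebuilt-read
      preceding_context_len: 3
      following_context_len: 3
      merge_multiple_column_headers: true
      page_layout: single_column
      threshold_y: 0.05  # in inches; only used with single_column
```

With `page_layout` left at `natural`, the `threshold_y` line can be dropped, since the table describes it as relevant only for the `single_column` layout.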
REST API Runtime Parameters
There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.