Check the parameters you can specify to customize the HTMLToDocument component in the pipeline YAML.
YAML Init Parameters
These are the parameters you can specify for HTMLToDocument in the pipeline YAML:
Parameter | Type | Possible values | Description |
---|---|---|---|
extractor_type | Literal | | Ignored. This parameter exists only for compatibility and will be removed soon. To customize the extraction, use the extraction_kwargs parameter. |
try_others | Boolean | | Ignored. This parameter exists only for compatibility and will be removed soon. To customize the extraction, use the extraction_kwargs parameter. |
extraction_kwargs | Dictionary of string and any | Default: None | A dictionary containing keyword arguments to customize the extraction process. These are passed to the underlying Trafilatura extract function. For the full list of available arguments, see the Trafilatura documentation. |
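As a minimal sketch of how extraction_kwargs might look in the pipeline YAML: the component name `html_converter` is hypothetical, the `type` path assumes the Haystack 2.x module layout, and the keys shown (`include_tables`, `include_links`, `favor_precision`) are a few arguments of Trafilatura's extract function chosen for illustration. Check the Trafilatura documentation for the arguments supported by your installed version.

```yaml
components:
  html_converter:  # hypothetical component name
    # Assumed import path for the component (Haystack 2.x layout)
    type: haystack.components.converters.html.HTMLToDocument
    init_parameters:
      # Passed as keyword arguments to Trafilatura's extract function
      extraction_kwargs:
        include_tables: true     # keep text found in HTML tables
        include_links: false     # drop link targets from the output
        favor_precision: true    # prefer less but cleaner text
```

Omitting extraction_kwargs, or leaving it at its default of None, runs the extraction with Trafilatura's default settings.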
REST API Runtime Parameters
There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.