HTMLToDocument Parameters

Check the parameters you can specify to customize the HTMLToDocument component in the pipeline YAML.

YAML Init Parameters

These are the parameters you can specify for HTMLToDocument in the pipeline YAML:

| Parameter | Type | Possible values | Description |
|---|---|---|---|
| extractor_type | Literal | | Ignored. This parameter is kept only for compatibility and will be removed soon. To customize the extraction, use the extraction_kwargs parameter. |
| try_others | Boolean | | Ignored. This parameter is kept only for compatibility and will be removed soon. To customize the extraction, use the extraction_kwargs parameter. |
| extraction_kwargs | Dictionary of string and any | Default: None | A dictionary of keyword arguments that customize the extraction process. They are passed to the underlying Trafilatura extract function. For the full list of available arguments, see the Trafilatura documentation. |
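A minimal sketch of setting extraction_kwargs in the pipeline YAML, assuming the usual `components` / `type` / `init_parameters` layout. The component name `html_converter` is hypothetical, and the module path in `type` is an assumption based on where Haystack defines HTMLToDocument. The keys under extraction_kwargs are arguments of Trafilatura's extract function; check the Trafilatura documentation for the arguments your installed version supports.

```yaml
components:
  html_converter:  # hypothetical component name
    type: haystack.components.converters.html.HTMLToDocument  # assumed module path
    init_parameters:
      # extractor_type and try_others are ignored, so they are omitted here.
      extraction_kwargs:
        include_tables: true     # keep text from HTML tables
        include_comments: false  # drop comment sections
        favor_precision: true    # prefer less text, but less noise
```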

REST API Runtime Parameters

There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.