OpenAIDocumentEmbedder

Computes document embeddings using OpenAI models.

Basic Information

Type: haystack.components.embedders.openai_document_embedder.OpenAIDocumentEmbedder

Inputs

Parameter	Type	Default	Description
documents	List[Document]		A list of documents to embed.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		A list of documents with embeddings.
meta	Dict[str, Any]		Information about the usage of the model, including model name and token usage.

Overview

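Computes document embeddings using OpenAI models. For each document, the component embeds its text, optionally combined with the metadata fields listed in `meta_fields_to_embed`, and stores the resulting vector in the document's `embedding` field.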
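As a sketch, the configuration below asks the component to embed each document's title together with its content. The `title` field is a hypothetical example; use a metadata field your documents actually have. This assumes the component follows Haystack's usual behavior: it joins the metadata values and the document content with `embedding_separator`, then adds `prefix` and `suffix`. Under that assumption, a document with `title: "Refund policy"` in its metadata and the content "Refunds are issued within 14 days." would be embedded as the text `Refund policy | Refunds are issued within 14 days.`

```yaml
components:
  OpenAIDocumentEmbedder:
    type: haystack.components.embedders.openai_document_embedder.OpenAIDocumentEmbedder
    init_parameters:
      model: text-embedding-ada-002
      # Hypothetical metadata field; replace it with a field your documents have
      meta_fields_to_embed:
        - title
      # Placed between each metadata value and the document content
      embedding_separator: " | "
```

Documents that lack the listed field are expected to be embedded from their content alone, because Haystack skips missing metadata values. Verify this against the component version you deploy.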

Usage Example

components:
  OpenAIDocumentEmbedder:
    type: haystack.components.embedders.openai_document_embedder.OpenAIDocumentEmbedder
    init_parameters:
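The example above shows only the component itself. The hedged sketch below shows how the component typically fits into an indexing pipeline. A `DocumentSplitter` feeds its chunks to the embedder, and the embedder's `documents` output goes to a `DocumentWriter`. The component names, the splitter settings, and the OpenSearch document store configuration are illustrative assumptions; adapt them to your own pipeline and document store.

```yaml
components:
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
  embedder:
    type: haystack.components.embedders.openai_document_embedder.OpenAIDocumentEmbedder
    init_parameters:
      model: text-embedding-ada-002
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      policy: OVERWRITE
      # Assumed document store; replace it with the one your workspace uses
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          # Must match the embedder's output size (1536 for text-embedding-ada-002)
          embedding_dim: 1536

connections:
  # Split documents go to the embedder
  - sender: splitter.documents
    receiver: embedder.documents
  # Embedded documents go to the writer
  - sender: embedder.documents
    receiver: writer.documents
```

Whatever document store you use, its embedding dimension must match the vectors the embedder produces. If you set `dimensions`, use that value instead of the model's default size.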

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
api_key	Secret	Secret.from_env_var('OPENAI_API_KEY')	The OpenAI API key. You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter during initialization.
model	str	text-embedding-ada-002	The name of the model to use for calculating embeddings. The default model is `text-embedding-ada-002`.
dimensions	Optional[int]	None	The number of dimensions of the resulting embeddings. Only `text-embedding-3` and later models support this parameter.
api_base_url	Optional[str]	None	Overrides the default base URL for all HTTP requests.
organization	Optional[str]	None	Your OpenAI organization ID. For more information, see "Setting up your organization" in the OpenAI documentation.
prefix	str	""	A string to add at the beginning of each document's text before embedding.
suffix	str	""	A string to add at the end of each document's text before embedding.
batch_size	int	32	Number of documents to send to the OpenAI API in a single request.
progress_bar	bool	True	If `True`, shows a progress bar when running.
meta_fields_to_embed	Optional[List[str]]	None	List of metadata fields to embed along with the document text.
embedding_separator	str	\n	Separator used to concatenate the metadata fields to the document text.
timeout	Optional[float]	None	Timeout for OpenAI client calls. If not set, it defaults to either the `OPENAI_TIMEOUT` environment variable, or 30 seconds.
max_retries	Optional[int]	None	Maximum number of retries to contact OpenAI after an internal error. If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5 retries.
http_client_kwargs	Optional[Dict[str, Any]]	None	A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`. For more information, see the HTTPX documentation.
raise_on_failure	bool	False	Whether to raise an exception if the embedding request fails. If `False`, the component will log the error and continue processing the remaining documents. If `True`, it will raise an exception on failure.
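The sketch below configures several of these parameters at once. The `api_key` block assumes Haystack's standard serialized form of `Secret.from_env_var('OPENAI_API_KEY')`. The other values are illustrative choices, not recommendations. `dimensions: 512` assumes a `text-embedding-3` model, because older models don't support that parameter.

```yaml
components:
  OpenAIDocumentEmbedder:
    type: haystack.components.embedders.openai_document_embedder.OpenAIDocumentEmbedder
    init_parameters:
      # Read the key from the OPENAI_API_KEY environment variable
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: true
      model: text-embedding-3-small
      # Supported only by text-embedding-3 and later models
      dimensions: 512
      # Documents sent to the API per request
      batch_size: 64
      progress_bar: false
      # Seconds per client call, and retries after an internal error
      timeout: 60
      max_retries: 3
      # Stop on a failed request instead of logging it and continuing
      raise_on_failure: true
```

With `raise_on_failure: true`, a failed request stops the run instead of leaving some documents without embeddings. If you reduce `dimensions`, configure your document store with the same embedding size (512 here).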

Run Method Parameters

These are the parameters you can configure for the component's `run()` method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
documents	List[Document]		A list of documents to embed.
