SentenceTransformersTextEmbedder

Embeds strings using Sentence Transformers models.

Basic Information

Type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder

Inputs

Parameter	Type	Default	Description
text	str		Text to embed.

Outputs

Parameter	Type	Default	Description
embedding	List[float]		The embedding of the input text.

Overview

Embeds strings using Sentence Transformers models.

You can use it to embed a user query and send the resulting embedding to an embedding retriever.

Usage example:

from haystack.components.embedders import SentenceTransformersTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
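
To make the retriever hand-off more concrete, here is a minimal sketch of what an embedding retriever conceptually does with the query embedding: it compares it against stored document embeddings and ranks them by similarity. The three-dimensional vectors, the document names, and the `cosine_similarity` helper are illustrative assumptions, not part of the component's API; real embeddings have hundreds of dimensions and real retrievers delegate this work to a document store.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
query_embedding = [0.1, 0.9, 0.2]
document_embeddings = {
    "doc_pizza": [0.2, 0.8, 0.1],
    "doc_weather": [0.9, 0.1, 0.3],
}

# Rank document names from most to least similar to the query.
ranked = sorted(
    document_embeddings,
    key=lambda name: cosine_similarity(query_embedding, document_embeddings[name]),
    reverse=True,
)

print(ranked)
```

In this toy setup, the document whose vector points in nearly the same direction as the query vector is ranked first.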

Usage Example

components:
  SentenceTransformersTextEmbedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
model	str	sentence-transformers/all-mpnet-base-v2	The model to use for calculating embeddings. Specify the path to a local model or the ID of the model on Hugging Face.
device	Optional[ComponentDevice]	None	Overrides the default device used to load the model.
token	Optional[Secret]	Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False)	An API token to use private models from Hugging Face.
prefix	str		A string to add at the beginning of each text to be embedded. You can use it to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
suffix	str		A string to add at the end of each text to embed.
batch_size	int	32	Number of texts to embed at once.
progress_bar	bool	True	If `True`, shows a progress bar for calculating embeddings. If `False`, disables the progress bar.
normalize_embeddings	bool	False	If `True`, the embeddings are normalized using L2 normalization, so that the embeddings have a norm of 1.
trust_remote_code	bool	False	If `False`, permits only Hugging Face verified model architectures. If `True`, permits custom models and scripts.
local_files_only	bool	False	If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
truncate_dim	Optional[int]	None	The dimension to truncate sentence embeddings to. `None` does no truncation. If the model has not been trained with Matryoshka Representation Learning, truncation of embeddings can significantly affect performance.
model_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained` when loading the model. Refer to specific model documentation for available kwargs.
tokenizer_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer. Refer to specific model documentation for available kwargs.
config_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
precision	Literal['float32', 'int8', 'uint8', 'binary', 'ubinary']	float32	The precision to use for the embeddings. All non-float32 precisions are quantized embeddings. Quantized embeddings are smaller in size and faster to compute, but may have a lower accuracy. They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
encode_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for `SentenceTransformer.encode` when embedding texts. This parameter is provided for fine customization. Be careful not to clash with already set parameters and avoid passing parameters that change the output type.
backend	Literal['torch', 'onnx', 'openvino']	torch	The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino". Refer to the Sentence Transformers documentation for more information on acceleration and quantization options.
revision	Optional[str]	None	The specific model version to use. It can be a branch name, a tag name, or a commit ID for a stored model on Hugging Face. This enables pinning to a particular model version for reproducibility and stability.
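
A few of these parameters are easier to picture with a small example. The sketch below imitates, in plain Python, what `prefix`/`suffix`, `truncate_dim`, and `normalize_embeddings` do conceptually. The helper names (`apply_affixes`, `truncate`, `l2_normalize`) and the sample values are hypothetical, and applying truncation before normalization is an assumption made for illustration; the component itself delegates the embedding work to the Sentence Transformers library.

```python
import math


def apply_affixes(text, prefix="", suffix=""):
    """Add an instruction prefix and/or a suffix to the text before embedding."""
    return prefix + text + suffix


def truncate(vector, dim=None):
    """Keep only the first `dim` values; `None` leaves the vector unchanged."""
    return list(vector) if dim is None else list(vector[:dim])


def l2_normalize(vector):
    """Scale the vector so that its L2 norm is 1 (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


# An E5-style instruction prefix; the exact string depends on the model you use.
text_to_embed = apply_affixes("I love pizza!", prefix="query: ")

# A made-up 4-dimensional embedding, truncated to 2 dimensions and then normalized.
raw_embedding = [3.0, 4.0, 12.0, 0.5]
processed = l2_normalize(truncate(raw_embedding, dim=2))

print(text_to_embed)
print(processed)
```

After normalization, the dot product of two embeddings equals their cosine similarity, which is why some retrieval setups expect normalized vectors.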

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
text	str		Text to embed.
