
InstructorTextEmbedder

A component for embedding strings using INSTRUCTOR embedding models.

Basic Information

  • Type: haystack_integrations.instructor_embedders.src.haystack_integrations.components.embedders.instructor_embedders.instructor_text_embedder.InstructorTextEmbedder

Inputs

Parameter | Type | Default | Description
text | str | required | The string to embed.

Outputs

Parameter | Type | Default | Description
embedding | List[float] | | The embedding of the input text.
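The embedding output is a plain list of floats, so you can compare it directly with other embeddings, for example to check how close a query is to a document. The sketch below is illustrative only. The toy three-dimensional vectors stand in for real model outputs, and the helpers dot, l2_normalize, and cosine_similarity are hypothetical, not part of the component's API. It also shows why the normalize_embeddings init parameter is convenient: once vectors have length 1, their dot product equals their cosine similarity.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to length 1 (an L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical toy vectors standing in for two "embedding" outputs.
query_embedding = [3.0, 4.0, 0.0]
doc_embedding = [4.0, 3.0, 0.0]

print(round(cosine_similarity(query_embedding, doc_embedding), 4))  # 0.96
# For unit-length vectors, a plain dot product gives the same score.
print(round(dot(l2_normalize(query_embedding), l2_normalize(doc_embedding)), 4))  # 0.96
```

With normalize_embeddings enabled, downstream code can therefore use the cheaper dot product wherever it would otherwise compute cosine similarity.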

Overview


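InstructorTextEmbedder embeds a single string with an INSTRUCTOR model. INSTRUCTOR models encode each text together with a natural-language instruction that describes the domain, text type, and task, so one model can produce embeddings tailored to different use cases. You set this instruction with the instruction init parameter.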

Usage example:

# To use this component, install the "instructor-embedders-haystack" package.
# pip install instructor-embedders-haystack

from haystack.utils.device import ComponentDevice
from haystack_integrations.components.embedders.instructor_embedders import InstructorTextEmbedder

text = (
    "It clearly says online this will work on a Mac OS system. "
    "The disk comes and it does not, only Windows. "
    "Do Not order this if you have a Mac!!"
)
instruction = (
    "Represent the Amazon comment for classifying the sentence as positive or negative"
)

text_embedder = InstructorTextEmbedder(
    model="hkunlp/instructor-base",
    instruction=instruction,
    device=ComponentDevice.from_str("cpu"),
)
# Load the model before the first run() call.
text_embedder.warm_up()

# run() returns a dictionary; the vector is stored under the "embedding" key.
result = text_embedder.run(text=text)
embedding = result["embedding"]

Usage Example

components:
  InstructorTextEmbedder:
    type: instructor_embedders.src.haystack_integrations.components.embedders.instructor_embedders.instructor_text_embedder.InstructorTextEmbedder
    init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter | Type | Default | Description
model | str | hkunlp/instructor-base | Local path or name of the model on the Hugging Face model hub, such as 'hkunlp/instructor-base'.
device | Optional[ComponentDevice] | None | The device on which the model is loaded. If None, the default device is selected automatically.
token | Optional[Secret] | Secret.from_env_var('HF_API_TOKEN', strict=False) | The API token used to download private models from Hugging Face.
instruction | str | Represent the sentence | The instruction used to compute domain-specific embeddings. It follows the unified template "Represent the 'domain' 'text_type' for 'task_objective'", where "domain" (optional) is the domain of the text, such as science, finance, or medicine; "text_type" (required) is the encoding unit, such as sentence, document, or paragraph; and "task_objective" (optional) is the purpose of the embedding, such as retrieving a document or classifying the sentence. Example instructions are available in the INSTRUCTOR project documentation.
batch_size | int | 32 | Number of strings to encode at once.
progress_bar | bool | True | If True, displays a progress bar during embedding.
normalize_embeddings | bool | False | If True, the returned vectors are scaled to unit length (an L2 norm of 1).
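The unified instruction template described for the instruction parameter can be assembled programmatically. This is a minimal sketch: build_instruction is a hypothetical helper, not part of the integration, and it assumes the optional parts are simply left out when not given, which matches both the default value "Represent the sentence" and the instruction used in the usage example.

```python
from typing import Optional


def build_instruction(
    text_type: str,
    domain: Optional[str] = None,
    task_objective: Optional[str] = None,
) -> str:
    """Fill the template "Represent the 'domain' 'text_type' for 'task_objective'".

    Hypothetical helper: text_type is required; domain and task_objective
    are optional and omitted from the result when not provided.
    """
    if not text_type:
        raise ValueError("text_type is required")
    parts = ["Represent the"]
    if domain:
        parts.append(domain)
    parts.append(text_type)
    if task_objective:
        parts.append(f"for {task_objective}")
    return " ".join(parts)


print(build_instruction("sentence"))
# Represent the sentence
print(build_instruction("document", domain="Medicine", task_objective="retrieval"))
# Represent the Medicine document for retrieval
```

The resulting string would then be passed as instruction= when creating InstructorTextEmbedder. For instance, build_instruction("comment", domain="Amazon", task_objective="classifying the sentence as positive or negative") reproduces the instruction from the usage example.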

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter | Type | Default | Description
text | str | required | The string to embed.