
VoyageTextEmbedder

Embed text, such as a user query, using Voyage AI embedding models.

Basic Information

  • Type: haystack_integrations.components.embedders.voyage_embedders.voyage_text_embedder.VoyageTextEmbedder
  • Components it can connect with:
    • OpenSearchEmbeddingRetriever: VoyageTextEmbedder sends the embedding to a retriever.
    • Input: VoyageTextEmbedder receives the query to embed from the Input component.

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | | The text to embed. |

Outputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding | List[float] | | The embedding of the input text. |
| meta | Dict[str, Any] | | Metadata related to the embedding operation. |

Overview

Use VoyageTextEmbedder to create embeddings for text, such as a user query, using Voyage AI models. This component is typically used to embed queries in retrieval pipelines, where the query embedding is compared against document embeddings to find relevant content.
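To make the comparison step concrete, here is a minimal sketch in plain Python of how a query embedding could be scored against document embeddings. It assumes cosine similarity as the metric and uses toy three-dimensional vectors. In a real pipeline the retriever and document store perform this comparison, and the metric depends on how the index is configured. The names `cosine_similarity` and `rank_documents` are hypothetical helpers, not part of the component.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Embeddings must have the same dimension.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_embedding: Sequence[float],
    document_embeddings: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every document against the query and sort best-first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in document_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: illustrative data only).
query_embedding = [1.0, 0.0, 0.0]
document_embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
}
ranking = rank_documents(query_embedding, document_embeddings)
```

Here `doc_a` ranks first because its vector points in nearly the same direction as the query vector, while `doc_b` is orthogonal to it.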

Voyage AI provides high-quality embedding models optimized for various use cases including retrieval, semantic similarity, and classification.

Authorization

You need a Voyage AI API key to use this component. Connect deepset to your Voyage AI account on the Integrations page. For detailed instructions, see Use Voyage AI Models.

Usage Example

This is an example RAG pipeline with VoyageTextEmbedder for query embedding:

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 1024
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20
      fuzziness: 0

  query_embedder:
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_text_embedder.VoyageTextEmbedder
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - VOYAGE_API_KEY
        strict: false
      model: voyage-3
      input_type: query
      truncate: true
      prefix:
      suffix:
      output_dimension:
      output_dtype: float
      timeout:
      max_retries:

  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 1024
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: haystack_integrations.components.rankers.voyage.ranker.VoyageRanker
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - VOYAGE_API_KEY
        strict: false
      model: rerank-2
      top_k: 8

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: "You are a helpful assistant answering the user's questions based on the provided documents.\nDo not use your own knowledge.\n\nProvided documents:\n{% for document in documents %}\nDocument [{{ loop.index }}]:\n{{ document.content }}\n{% endfor %}\n\nQuestion: {{ query }}\nAnswer:"

  generator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: false
      model: gpt-4o
      generation_kwargs:
        max_tokens: 1000
        temperature: 0.7

connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: ranker.documents
    receiver: PromptBuilder.documents
  - sender: PromptBuilder.prompt
    receiver: generator.messages
  - sender: generator.replies
    receiver: answer_builder.replies

inputs:
  query:
    - "bm25_retriever.query"
    - "query_embedder.text"
    - "ranker.query"
    - "answer_builder.query"
    - "PromptBuilder.query"
  filters:
    - "bm25_retriever.filters"
    - "embedding_retriever.filters"

outputs:
  documents: "ranker.documents"
  answers: "answer_builder.answers"

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | Secret | Secret.from_env_var('VOYAGE_API_KEY') | The Voyage AI API key. It can be explicitly provided or automatically read from the environment variable VOYAGE_API_KEY. |
| model | str | voyage-3 | The name of the Voyage model to use. See the Voyage Embeddings documentation for available models. |
| input_type | Optional[str] | None | Type of the input text. Set to "query" for search queries or "document" for documents. When set, prepends an appropriate prompt to the text. |
| truncate | bool | True | Whether to truncate the input text to fit within the context length. If False, an error is raised when the text exceeds the context length. |
| prefix | str | "" | A string to add to the beginning of the text. |
| suffix | str | "" | A string to add to the end of the text. |
| output_dimension | Optional[int] | None | The dimension of the output embedding. Only supported by voyage-3-large and voyage-code-3 models. |
| output_dtype | str | float | The data type for the embeddings. Options: "float", "int8", "uint8", "binary", "ubinary". |
| timeout | Optional[int] | None | Timeout for Voyage AI client calls. If not set, it is inferred from the VOYAGE_TIMEOUT environment variable or set to 30. |
| max_retries | Optional[int] | None | Maximum retries if Voyage AI returns an internal error. If not set, it is inferred from the VOYAGE_MAX_RETRIES environment variable or set to five. |
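The fallback order described for timeout and max_retries (explicit value, then environment variable, then built-in default) and the effect of prefix and suffix can be sketched in plain Python. This is only an illustration of the documented behavior, not the component's actual source code. `resolve_setting` and `build_text` are hypothetical helpers, and the simple `prefix + text + suffix` concatenation is an assumption based on the parameter descriptions.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[int],
    env_var: str,
    fallback: int,
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Hypothetical helper: explicit value, else environment variable, else fallback."""
    env = os.environ if environ is None else environ
    if explicit is not None:
        return explicit
    raw = env.get(env_var)
    if raw is not None:
        return int(raw)
    return fallback


def build_text(text: str, prefix: str = "", suffix: str = "") -> str:
    """Hypothetical helper: wrap the input text with the configured prefix and suffix."""
    return prefix + text + suffix


# Nothing set anywhere: the documented defaults apply.
default_timeout = resolve_setting(None, "VOYAGE_TIMEOUT", 30, environ={})
default_retries = resolve_setting(None, "VOYAGE_MAX_RETRIES", 5, environ={})

# The environment variable is used when the init parameter is left empty.
env_timeout = resolve_setting(None, "VOYAGE_TIMEOUT", 30, environ={"VOYAGE_TIMEOUT": "10"})

# An explicit init parameter takes precedence over the environment variable.
explicit_timeout = resolve_setting(60, "VOYAGE_TIMEOUT", 30, environ={"VOYAGE_TIMEOUT": "10"})

wrapped = build_text("What is Haystack?", prefix="Question: ", suffix=" (short answer)")
```

Leaving timeout and max_retries empty in the pipeline YAML therefore lets you control both values per environment through VOYAGE_TIMEOUT and VOYAGE_MAX_RETRIES without editing the pipeline.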

Run Method Parameters

These are the parameters you can configure for the component's run() method. You can pass these parameters at query time through the API, in Playground, or when running a job.

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | | The text to embed. |