VoyageTextEmbedder
Embeds text strings, such as user queries, using Voyage AI models and returns the embedding for use in retrieval pipelines.
Key Features
- Supports Voyage AI embedding models optimized for retrieval and semantic similarity.
- `input_type` parameter for query-specific embedding optimization.
- Configurable output dimensions for `voyage-3-large` and `voyage-code-3` models.
- Adds optional prefix and suffix strings to the text before embedding.
- Configurable timeout and retry settings.
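The prefix, suffix, and `input_type` options all shape the text before the model embeds it. The helper below is an illustrative sketch of that client-side wrapping, not the integration's actual code (Voyage applies the `input_type` prompt on the server side):

```python
def prepare_text(text: str, prefix: str = "", suffix: str = "") -> str:
    """Mimic how the embedder wraps the input before sending it to the API."""
    return f"{prefix}{text}{suffix}"

# With a prefix configured, the query is wrapped before embedding:
wrapped = prepare_text("what is semantic search?", prefix="Represent the query: ")
print(wrapped)  # Represent the query: what is semantic search?
```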
Configuration
To use this component, first connect Haystack Platform with Voyage AI. For detailed instructions, see Use Voyage AI Models.
- Drag the `VoyageTextEmbedder` component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Enter the name of the Voyage AI embedding model to use. See the Voyage Embeddings documentation for available models.
- Go to the Advanced tab to configure the API key, `input_type`, truncation, and other options.
Connections
VoyageTextEmbedder accepts a text string as input and outputs a floating-point embedding vector.
Use this component in query pipelines for semantic search. Connect its embedding output to an embedding retriever like OpenSearchEmbeddingRetriever.
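In a query pipeline, the embedding output feeds a retriever that scores documents by vector similarity. A self-contained sketch of that hand-off, with a stand-in `query_embedding` in place of a real Voyage API call and cosine similarity as the scoring function:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Stand-in for the embedder's output: VoyageTextEmbedder.run(text=...)["embedding"]
query_embedding = [0.1, 0.9, 0.2]

# Documents with precomputed embeddings, as a document store would hold them
documents = {
    "doc_a": [0.1, 0.8, 0.3],
    "doc_b": [0.9, 0.1, 0.0],
}

# The embedding retriever ranks documents by similarity to the query embedding
ranked = sorted(
    documents,
    key=lambda d: cosine_similarity(query_embedding, documents[d]),
    reverse=True,
)
print(ranked)  # ['doc_a', 'doc_b']
```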
Usage Example
This is an example RAG pipeline with VoyageTextEmbedder for query embedding:
```yaml
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 1024
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20
      fuzziness: 0
  query_embedder:
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_text_embedder.VoyageTextEmbedder
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - VOYAGE_API_KEY
        strict: false
      model: voyage-3
      input_type: query
      truncate: true
      prefix:
      suffix:
      output_dimension:
      output_dtype: float
      timeout:
      max_retries:
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 1024
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  ranker:
    type: haystack_integrations.components.rankers.voyage.ranker.VoyageRanker
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - VOYAGE_API_KEY
        strict: false
      model: rerank-2
      top_k: 8
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: "You are a helpful assistant answering the user's questions based on the provided documents.\nDo not use your own knowledge.\n\nProvided documents:\n{% for document in documents %}\nDocument [{{ loop.index }}]:\n{{ document.content }}\n{% endfor %}\n\nQuestion: {{ query }}\nAnswer:"
  generator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: false
      model: gpt-4o
      generation_kwargs:
        max_tokens: 1000
        temperature: 0.7
connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: ranker.documents
    receiver: PromptBuilder.documents
  - sender: PromptBuilder.prompt
    receiver: generator.messages
  - sender: generator.replies
    receiver: answer_builder.replies
inputs:
  query:
    - "bm25_retriever.query"
    - "query_embedder.text"
    - "ranker.query"
    - "answer_builder.query"
    - "PromptBuilder.query"
  filters:
    - "bm25_retriever.filters"
    - "embedding_retriever.filters"
outputs:
  documents: "ranker.documents"
  answers: "answer_builder.answers"
max_runs_per_component: 100
metadata: {}
```
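The document_joiner in this pipeline merges the BM25 and embedding retriever results before reranking. A rough sketch of concatenate-style joining, deduplicating by document ID and keeping the first occurrence (this mirrors the behavior described for Haystack's DocumentJoiner, not its actual implementation):

```python
def join_concatenate(*result_lists):
    """Merge document lists, keeping the first occurrence of each ID."""
    seen, joined = set(), []
    for results in result_lists:
        for doc in results:
            if doc["id"] not in seen:
                seen.add(doc["id"])
                joined.append(doc)
    return joined

bm25_docs = [{"id": "1", "score": 12.3}, {"id": "2", "score": 9.1}]
embedding_docs = [{"id": "2", "score": 0.87}, {"id": "3", "score": 0.71}]

merged = join_concatenate(bm25_docs, embedding_docs)
print([d["id"] for d in merged])  # ['1', '2', '3']
```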
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | | The text to embed. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding | List[float] | | The embedding of the input text. |
| meta | Dict[str, Any] | | Metadata related to the embedding operation. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | Secret | Secret.from_env_var('VOYAGE_API_KEY') | The Voyage AI API key. It can be explicitly provided or automatically read from the environment variable VOYAGE_API_KEY. |
| model | str | voyage-3 | The name of the Voyage model to use. See the Voyage Embeddings documentation for available models. |
| input_type | Optional[str] | None | Type of the input text. Set to "query" for search queries or "document" for documents. When set, prepends an appropriate prompt to the text. |
| truncate | bool | True | Whether to truncate the input text to fit within the context length. If False, an error is raised when the text exceeds the context length. |
| prefix | str | "" | A string to add to the beginning of the text. |
| suffix | str | "" | A string to add to the end of the text. |
| output_dimension | Optional[int] | None | The dimension of the output embedding. Only supported by voyage-3-large and voyage-code-3 models. |
| output_dtype | str | float | The data type for the embeddings. Options: "float", "int8", "uint8", "binary", "ubinary". |
| timeout | Optional[int] | None | Timeout for Voyage AI client calls. If not set, it is inferred from the VOYAGE_TIMEOUT environment variable or set to 30. |
| max_retries | Optional[int] | None | Maximum number of retries if Voyage AI returns an internal error. If not set, it is inferred from the VOYAGE_MAX_RETRIES environment variable or set to 5. |
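The timeout and max_retries fallbacks described above follow a common resolution order: explicit value, then environment variable, then built-in default. A sketch of that logic (illustrative, not the integration's source):

```python
import os

def resolve_setting(explicit, env_var, default):
    """Explicit value wins, then the environment variable, then the default."""
    if explicit is not None:
        return explicit
    if env_var in os.environ:
        return int(os.environ[env_var])
    return default

# With neither an explicit value nor an env var set, the defaults apply:
timeout = resolve_setting(None, "VOYAGE_TIMEOUT", 30)
max_retries = resolve_setting(None, "VOYAGE_MAX_RETRIES", 5)
```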
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | | The text to embed. |