VoyageDocumentEmbedder

Compute document embeddings using Voyage AI embedding models. Use this component in indexes to embed documents before storing them in a vector database.

Key Features

  • Computes dense vector embeddings for documents using Voyage AI models.
  • Supports configurable input types (document or query) for optimized embeddings.
  • Embeds documents in configurable batch sizes for efficient processing.
  • Supports configurable output dimension and data type.
  • Optionally embeds metadata fields along with document content.
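
The batching behavior listed above can be pictured with a short sketch. It is an illustration only, not the component's actual implementation: the helper name `iter_batches` is hypothetical, and plain strings stand in for Haystack Document objects. It shows how a batch size of 32 would split 70 documents into three requests.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long.

    Hypothetical helper: it only illustrates the idea behind `batch_size`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Plain strings stand in for documents here.
texts = [f"chunk {i}" for i in range(70)]
batch_lengths = [len(batch) for batch in iter_batches(texts, batch_size=32)]
print(batch_lengths)  # [32, 32, 6]
```

A larger batch size means fewer API calls but larger requests, so the right value depends on document length and on the limits of your Voyage AI account.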

Configuration

  1. Drag the VoyageDocumentEmbedder component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Connect Haystack Platform to your Voyage AI account on the Integrations page. For detailed instructions, see Use Voyage AI Models.
    • Select the embedding model to use.
    • Set input_type to "document" for indexing documents.
  4. Go to the Advanced tab to configure timeout, max_retries, output_dimension, output_dtype, prefix, suffix, and metadata embedding options.
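
The timeout and max_retries options on the Advanced tab fall back to environment variables when left empty. Below is a minimal sketch of that fallback order, assuming the explicit value wins, then the environment variable, then the documented default (30 for timeout, 5 for max_retries). `resolve_setting` is a hypothetical helper, not part of the integration.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[int],
    env_var: str,
    default: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the explicit value, else the env var as an int, else the default."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    raw = env.get(env_var)
    if raw is not None and raw.strip():
        return int(raw)
    return default


# A fake environment keeps the example independent of the real one.
fake_env = {"VOYAGE_TIMEOUT": "60"}
timeout = resolve_setting(None, "VOYAGE_TIMEOUT", 30, env=fake_env)
max_retries = resolve_setting(None, "VOYAGE_MAX_RETRIES", 5, env=fake_env)
print(timeout, max_retries)  # 60 5
```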

Connections

VoyageDocumentEmbedder receives the documents to embed from preprocessors such as DocumentSplitter. It sends the documents, now carrying embeddings, to DocumentWriter, which writes them into a document store.

Usage Examples

Basic Configuration

document_embedder:
  type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
  init_parameters:
    api_key:
      type: env_var
      env_vars:
        - VOYAGE_API_KEY
      strict: false
    model: voyage-3
    input_type: document
    truncate: true
    output_dtype: float
    batch_size: 32
    embedding_separator: "\n"
    progress_bar: true
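
To make the prefix, suffix, metadata_fields_to_embed, and embedding_separator settings concrete, here is a minimal sketch of how the text for one document might be assembled before it is sent to Voyage AI. It assumes the convention common to Haystack document embedders (selected metadata values first, then the content, joined by the separator). The `Doc` class and `build_text` function are hypothetical stand-ins, not the integration's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def build_text(
    doc: Doc,
    metadata_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    """Join selected metadata values and the content, then wrap with prefix/suffix."""
    fields = metadata_fields_to_embed or []
    # Skip metadata keys that are missing or None (an assumption of this sketch).
    parts = [str(doc.meta[key]) for key in fields if doc.meta.get(key) is not None]
    parts.append(doc.content)
    return prefix + embedding_separator.join(parts) + suffix


doc = Doc(content="Voyage AI provides embedding models.", meta={"title": "Overview", "page": 3})
text = build_text(doc, metadata_fields_to_embed=["title"])
print(repr(text))  # 'Overview\nVoyage AI provides embedding models.'
```

Embedding a field such as a title alongside the content can help retrieval when chunks are short and lose context on their own.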

This is an example index with VoyageDocumentEmbedder for document embedding:

components:
  converter:
    type: haystack.components.converters.multi_file_converter.MultiFileConverter
    init_parameters:
      encoding: utf-8

  cleaner:
    type: haystack.components.preprocessors.document_cleaner.DocumentCleaner
    init_parameters:
      remove_empty_lines: true
      remove_extra_whitespaces: true
      remove_repeated_substrings: false
      keep_id: false

  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: sentence
      split_length: 5
      split_overlap: 1
      split_threshold: 0

  document_embedder:
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - VOYAGE_API_KEY
        strict: false
      model: voyage-3
      input_type: document
      truncate: true
      prefix:
      suffix:
      output_dimension:
      output_dtype: float
      batch_size: 32
      metadata_fields_to_embed:
      embedding_separator: "\n"
      progress_bar: true
      timeout:
      max_retries:

  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 1024
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE

connections:
  - sender: converter.documents
    receiver: cleaner.documents
  - sender: cleaner.documents
    receiver: splitter.documents
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents

max_runs_per_component: 100

metadata: {}
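
One detail in the index above is easy to miss: the document store's embedding_dim (1024) has to match the length of the vectors the embedder produces. Below is a minimal sketch of a consistency check, written against plain dictionaries rather than the platform's own configuration loader. The `DEFAULT_DIMS` table and the `expected_dim` and `check_dims` helpers are hypothetical, and the listed dimensions are assumptions to verify against the Voyage AI model documentation.

```python
from typing import Any, Dict, Optional

# Assumed default output sizes; confirm against the Voyage AI model documentation.
DEFAULT_DIMS: Dict[str, int] = {"voyage-3": 1024, "voyage-3-lite": 512}


def expected_dim(embedder_params: Dict[str, Any]) -> Optional[int]:
    """Return output_dimension if set, else the assumed default for the model."""
    explicit = embedder_params.get("output_dimension")
    if explicit is not None:
        return explicit
    return DEFAULT_DIMS.get(embedder_params.get("model", ""))


def check_dims(embedder_params: Dict[str, Any], store_params: Dict[str, Any]) -> bool:
    """True when the store's embedding_dim matches the embedder's expected size."""
    dim = expected_dim(embedder_params)
    return dim is not None and dim == store_params.get("embedding_dim")


# Values copied from the example index.
embedder = {"model": "voyage-3", "output_dimension": None}
store = {"index": "default", "embedding_dim": 1024}
dims_match = check_dims(embedder, store)
print(dims_match)  # True
```

If you set output_dimension on the embedder, update embedding_dim in the document store to the same value before indexing.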

Parameters

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | A list of documents to embed. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | Documents with embeddings stored in the embedding field. |
| meta | Dict[str, Any] | Metadata about the embedding operation. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | Secret | Secret.from_env_var('VOYAGE_API_KEY') | The Voyage AI API key. It can be explicitly provided or automatically read from the environment variable VOYAGE_API_KEY. |
| model | str | voyage-3 | The name of the Voyage model to use. See the Voyage Embeddings documentation for available models. |
| input_type | Optional[str] | None | Type of the input text. Set to "document" for indexing documents or "query" for search queries. When set, prepends an appropriate prompt to the text. |
| truncate | bool | True | Whether to truncate the input text to fit within the context length. If False, an error is raised when the text exceeds the context length. |
| prefix | str | "" | A string to add to the beginning of each text. |
| suffix | str | "" | A string to add to the end of each text. |
| output_dimension | Optional[int] | None | The dimension of the output embedding. Only supported by voyage-3-large and voyage-code-3 models. |
| output_dtype | str | float | The data type for the embeddings. Options: "float", "int8", "uint8", "binary", "ubinary". |
| batch_size | int | 32 | Number of documents to encode at once. |
| metadata_fields_to_embed | Optional[List[str]] | None | List of metadata fields to embed along with the document content. |
| embedding_separator | str | "\n" | Separator used to concatenate metadata fields to the document content. |
| progress_bar | bool | True | Whether to show a progress bar during processing. |
| timeout | Optional[int] | None | Timeout for Voyage AI client calls. If not set, it is inferred from the VOYAGE_TIMEOUT environment variable or set to 30. |
| max_retries | Optional[int] | None | Maximum retries if Voyage AI returns an internal error. If not set, it is inferred from the VOYAGE_MAX_RETRIES environment variable or set to five. |
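
The output_dtype choice mostly affects how much space each embedding takes. Below is a rough sketch of the arithmetic, assuming 32 bits per value for float, 8 bits per value for int8 and uint8, and bit-packing (8 dimensions per byte) for binary and ubinary, as Voyage AI describes its quantized outputs. `bytes_per_embedding` is a hypothetical helper, and real storage overhead depends on the document store.

```python
# Assumed payload size per embedding value, in bits.
BITS_PER_VALUE = {"float": 32, "int8": 8, "uint8": 8, "binary": 1, "ubinary": 1}


def bytes_per_embedding(dimension: int, output_dtype: str = "float") -> int:
    """Estimate the raw size of one embedding in bytes."""
    if output_dtype not in BITS_PER_VALUE:
        raise ValueError(f"unsupported output_dtype: {output_dtype!r}")
    total_bits = dimension * BITS_PER_VALUE[output_dtype]
    return (total_bits + 7) // 8  # round up to whole bytes


sizes = {dtype: bytes_per_embedding(1024, dtype) for dtype in BITS_PER_VALUE}
print(sizes)
# {'float': 4096, 'int8': 1024, 'uint8': 1024, 'binary': 128, 'ubinary': 128}
```

Smaller data types reduce storage and memory at some cost in retrieval quality, so it is worth evaluating them on your own data before switching.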

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents to embed. |