VoyageDocumentEmbedder

Compute document embeddings using Voyage AI models. The resulting embeddings are stored in the embedding field of each document.

Basic Information


Third-Party Integration

Voyage AI is a third-party integration developed by an external provider and is not maintained by deepset. We encourage you to explore it, but review it carefully to make sure it meets your needs.


  • Pipeline type: Indexing
  • Type: haystack_integrations.components.embedders.voyage_embedders.VoyageDocumentEmbedder
  • Components it typically connects with:
    • DocumentWriter: VoyageDocumentEmbedder can send the embedded documents to DocumentWriter, which writes them into a document store.
    • PreProcessors: VoyageDocumentEmbedder can receive documents from a PreProcessor, such as DocumentSplitter.

Inputs

| Name | Type | Description |
|---|---|---|
| documents | List of Document objects | The documents to embed. |

Outputs

| Name | Type | Description |
|---|---|---|
| documents | List of Document objects | Documents with the calculated embeddings stored in their embedding field. |
| meta | Dictionary | Information about the model usage. |

Overview

VoyageDocumentEmbedder calculates vector representations (embeddings) for documents. In indexing pipelines, it comes after PreProcessors and before DocumentWriter. VoyageDocumentEmbedder is an external integration, which means it was added and is maintained by a member of the Haystack community. For the most recent information about this component, check its GitHub repository.

If you use VoyageDocumentEmbedder in your indexing pipeline, you must use VoyageTextEmbedder with the same model to embed the query in your query pipeline.


Embedding Models in Query and Indexing Pipelines

The embedding model you use to embed documents in your indexing pipeline must be the same as the embedding model you use to embed the query in your query pipeline.

This means the embedders for your indexing and query pipelines must match. For example, if you use CohereDocumentEmbedder to embed your documents, you should use CohereTextEmbedder with the same model to embed your queries.
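As a sketch of the matching query side, the fragment below pairs VoyageTextEmbedder with an OpenSearch embedding retriever and uses the same voyage-2 model as the indexing example on this page. The component names (query_embedder, retriever), the top_k value, and the assumption that your documents live in OpenSearch are illustrative; adapt them to your own pipeline. The embedding_dim of 1024 reflects the output size of voyage-2 and must match the value used in your indexing pipeline.

```yaml
components:
  query_embedder: # hypothetical name
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_text_embedder.VoyageTextEmbedder
    init_parameters:
      model: "voyage-2" # must be the same model used to embed the documents

  retriever: # hypothetical name; assumes an OpenSearch document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 1024 # voyage-2 returns 1024-dimensional vectors
          similarity: cosine # keep identical to the indexing pipeline's store
      top_k: 10 # illustrative value

connections:
  - sender: query_embedder.embedding # the query embedding produced by VoyageTextEmbedder
    receiver: retriever.query_embedding

inputs: # assumption: the query is routed to the text embedder's "text" input
  query:
    - "query_embedder.text"
```

Because both embedders call the same Voyage model, the query vector and the stored document vectors share one embedding space, which is what makes the cosine similarity between them meaningful.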

Authorization

For details, check Use Voyage AI Models.

Usage Example

In this example, VoyageDocumentEmbedder receives documents from DocumentSplitter and then sends the output documents to DocumentWriter.

Figure: VoyageDocumentEmbedder connected to DocumentSplitter and DocumentWriter

Here is the YAML configuration:

components:
  # ...
    splitter:
      type: haystack.components.preprocessors.document_splitter.DocumentSplitter
      init_parameters:
        split_by: word
        split_length: 250
        split_overlap: 30

    document_embedder:
      type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
      init_parameters:
        model: "voyage-2" # the model to use; use the same model in VoyageTextEmbedder at query time

    writer:
      type: haystack.components.writers.document_writer.DocumentWriter
      init_parameters:
        document_store:
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
          init_parameters:
            embedding_dim: 1024 # must match the model's output size; voyage-2 returns 1024-dimensional vectors
            similarity: cosine
        policy: OVERWRITE

connections:  # Defines how the components are connected
  # ...
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents

Init Parameters

For the initialization parameters, check the __init__() method of the component in its GitHub repository.
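As a sketch of how extra init parameters are set in YAML, the fragment below configures the embedder with two parameters beyond model. The names input_type and batch_size reflect our understanding of the integration's __init__() method and may differ between versions, so treat them as assumptions and verify them against the source before use.

```yaml
document_embedder:
  type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
  init_parameters:
    model: "voyage-2"
    input_type: "document" # assumption: marks the texts as documents rather than queries
    batch_size: 32 # assumption: number of documents sent to Voyage per API request
```

Any parameter you leave out keeps its default value from __init__().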