VoyageDocumentEmbedder

Compute document embeddings using Voyage AI models. The resulting embeddings are stored in the `embedding` field of each document.


Basic Information

ℹ️
Third Party Integration
Voyage AI is a third party integration developed by an external provider and is not maintained by deepset. While we encourage you to explore it, we recommend reviewing it carefully to ensure it meets your needs.

Pipeline type: Indexing
Type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
Components it typically connects with:
- DocumentWriter: VoyageDocumentEmbedder can send the embedded documents to DocumentWriter, which writes them into a document store.
- PreProcessors: VoyageDocumentEmbedder can receive documents from a PreProcessor, such as DocumentSplitter.

Inputs

Name	Type	Description
`documents`	List of `Document` objects	The documents to embed.

Outputs

Name	Type	Description
`documents`	List of `Document` objects	Documents with the calculated embeddings stored in their `embedding` field.
`meta`	Dictionary	Information about the usage of the model.

Overview

VoyageDocumentEmbedder calculates vector representations for documents. It's used in indexing pipelines after PreProcessors and before DocumentWriter. VoyageDocumentEmbedder is an external integration, which means it was added by a Haystack community member and is maintained by them. For the most recent information about this component, check the GitHub repository.

If you use VoyageDocumentEmbedder in your indexing pipeline, you must use VoyageTextEmbedder to embed the query in your query pipeline, with the same model.

ℹ️
Embedding Models in Query Pipelines and Indexes
The embedding model you use to embed documents in your index must be the same as the embedding model you use to embed the query in your pipeline.
This means the embedders for your indexes and pipelines must match. For example, if you use CohereDocumentEmbedder to embed your documents, you should use CohereTextEmbedder with the same model to embed your queries.
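
As an illustration of the matching rule, a query pipeline could embed the query with VoyageTextEmbedder configured with the same model as the document embedder in the index. This is a minimal sketch, not a complete pipeline: the component names `query_embedder` and `retriever` are hypothetical, the `retriever` is assumed to be an embedding retriever defined elsewhere in the same pipeline that accepts a `query_embedding` input, and the type path assumes the text embedder lives in a `voyage_text_embedder` module alongside the document embedder.

```yaml
components:
  query_embedder: # hypothetical component name
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_text_embedder.VoyageTextEmbedder
    init_parameters:
      model: "voyage-2" # must be the same model used to embed the documents in the index

connections:
  # "retriever" is a hypothetical embedding retriever defined elsewhere in the pipeline
  - sender: query_embedder.embedding
    receiver: retriever.query_embedding
```

If you change the model in the indexing pipeline, change it here as well and re-index your documents, because embeddings produced by different models are not comparable.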

Authorization

For details, check Use Voyage AI Models.

Usage Example

In this example, VoyageDocumentEmbedder receives documents from DocumentSplitter and then sends the embedded documents to DocumentWriter.

Here is the YAML configuration:

components:
  ...
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30

  document_embedder:
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
    init_parameters:
      model: "voyage-2" # the model to use

  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 1024 # must match the model's output dimension (1024 for voyage-2)
          similarity: cosine
      policy: OVERWRITE
        
connections:  # Defines how the components are connected
  ...
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents

Init Parameters

For the initialization parameters, check the `__init__()` method of the component on GitHub.

Updated 9 months ago
