VoyageDocumentEmbedder
Compute document embeddings using models by Voyage. The resulting embeddings are stored in the embedding metadata field of each document.
Basic Information
Third Party Integration
Voyage AI is a third-party integration developed by an external provider and is not maintained by deepset. While we encourage you to explore it, we recommend reviewing it carefully to ensure it meets your needs.
- Pipeline type: Indexing
- Type: haystack_integrations.components.embedders.voyage_embedders.VoyageDocumentEmbedder
- Components it typically connects with:
  - DocumentWriter: VoyageDocumentEmbedder can send the embedded documents to DocumentWriter, which writes them into a document store.
  - PreProcessors: VoyageDocumentEmbedder can receive documents from a PreProcessor, such as DocumentSplitter.
Inputs
| Name | Type | Description |
|---|---|---|
| documents | List of Document objects | The documents to embed. |
Outputs
| Name | Type | Description |
|---|---|---|
| documents | List of Document objects | Documents with the calculated embeddings stored in their embedding metadata field. |
| meta | Dictionary | Information about the usage of the model. |
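
To make the inputs and outputs above concrete, here is a minimal Python sketch of calling the component directly. It assumes the Voyage integration package for Haystack is installed and that the API key is available in the environment (assumed here to be the VOYAGE_API_KEY variable). The helper name embed_texts is hypothetical and exists only for this illustration.

```python
from typing import Any, Dict, List


def embed_texts(texts: List[str], model: str = "voyage-2") -> Dict[str, Any]:
    """Embed plain strings with VoyageDocumentEmbedder and return its raw output.

    Hypothetical helper for illustration. Calling it sends a request to the
    Voyage AI API, so it needs network access and a valid API key.
    """
    # Imported inside the function so this module can be loaded even where
    # the Haystack packages are not installed.
    from haystack import Document
    from haystack_integrations.components.embedders.voyage_embedders import (
        VoyageDocumentEmbedder,
    )

    # Input: a list of Document objects.
    documents = [Document(content=text) for text in texts]

    # Assumption: the API key is read from the environment by default.
    embedder = VoyageDocumentEmbedder(model=model)

    # Output: a dictionary with the keys "documents" and "meta".
    return embedder.run(documents=documents)
```

With a valid key, `embed_texts(["I love pizza!"])["documents"][0].embedding` would hold the vector for the first document, and the `"meta"` entry would hold the model usage information.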
Overview
VoyageDocumentEmbedder calculates vector representations for documents. It's used in indexing pipelines after PreProcessors and before DocumentWriter. VoyageDocumentEmbedder is an external integration, which means it was added by a Haystack community member and is maintained by them. For the most recent information about this component, check the GitHub repository.
If you use VoyageDocumentEmbedder in your indexing pipeline, you must use VoyageTextEmbedder to embed the query in your query pipeline, with the same model.
Embedding Models in Query Pipelines and Indexes
The embedding model you use to embed documents in your index must be the same as the embedding model you use to embed the query in your pipeline.
This means the embedders for your indexes and pipelines must match. For example, if you use CohereDocumentEmbedder to embed your documents, you should use CohereTextEmbedder with the same model to embed your queries.
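
The matching rule can be checked mechanically before deploying. The sketch below is an illustration only: it assumes pipeline configurations are available as Python dictionaries shaped like the YAML on this page, and the helpers embedder_models and models_match are hypothetical names, not part of Haystack or deepset. The VoyageTextEmbedder type path is also an assumption that mirrors the document embedder's path.

```python
from typing import Any, Dict, List, Optional

# Component class names that end with one of these are treated as embedders.
EMBEDDER_SUFFIXES = ("DocumentEmbedder", "TextEmbedder")


def embedder_models(config: Dict[str, Any]) -> List[Optional[str]]:
    """Collect the `model` init parameter of every embedder in a pipeline config."""
    models = []
    for component in config.get("components", {}).values():
        if component.get("type", "").endswith(EMBEDDER_SUFFIXES):
            models.append(component.get("init_parameters", {}).get("model"))
    return models


def models_match(index_config: Dict[str, Any], query_config: Dict[str, Any]) -> bool:
    """Return True if both configs use exactly one embedding model and it is the same."""
    index_models = set(embedder_models(index_config))
    query_models = set(embedder_models(query_config))
    return len(index_models) == 1 and index_models == query_models


index_config = {
    "components": {
        "document_embedder": {
            "type": "haystack_integrations.components.embedders.voyage_embedders."
                    "voyage_document_embedder.VoyageDocumentEmbedder",
            "init_parameters": {"model": "voyage-2"},
        }
    }
}

query_config = {
    "components": {
        "text_embedder": {
            # Assumed path, mirroring the document embedder's module layout.
            "type": "haystack_integrations.components.embedders.voyage_embedders."
                    "voyage_text_embedder.VoyageTextEmbedder",
            "init_parameters": {"model": "voyage-2"},
        }
    }
}

print(models_match(index_config, query_config))
```

This check compares model names only. It does not verify that both embedders come from the same provider, so treat it as a first sanity check rather than a guarantee.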
Authorization
For details, check Use Voyage AI Models.
Usage Example
In this example, VoyageDocumentEmbedder receives documents from DocumentSplitter and then sends the output documents to DocumentWriter.
And here is the YAML configuration:
components:
  ...
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
  document_embedder:
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
    init_parameters:
      model: "voyage-2" # the model to use
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 1024 # must match the embedding model's output dimension (1024 for voyage-2)
          similarity: cosine
      policy: OVERWRITE

connections: # Defines how the components are connected
  ...
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents

Init Parameters
For the initialization parameters, check the __init__() method of the component in GitHub.