VoyageDocumentEmbedder
Compute document embeddings using Voyage AI models. The resulting embeddings are stored in the embedding
metadata field of each document.
Basic Information
Third Party Integration
Voyage AI is a third party integration developed by an external provider and is not maintained by deepset. While we encourage you to explore it, we recommend reviewing it carefully to ensure it meets your needs.
- Pipeline type: Indexing
- Type: haystack_integrations.components.embedders.voyage_embedders.VoyageDocumentEmbedder
- Components it typically connects with:
  - DocumentWriter: VoyageDocumentEmbedder can send the embedded documents to DocumentWriter, which writes them into a document store.
  - PreProcessors: VoyageDocumentEmbedder can receive documents from a PreProcessor, such as DocumentSplitter.
Inputs
Name | Type | Description |
---|---|---|
documents | List of Document objects | The documents to embed. |
Outputs
Name | Type | Description |
---|---|---|
documents | List of Document objects | Documents with the calculated embeddings stored in their embedding metadata field. |
meta | Dictionary | Information about the usage of the model. |
Overview
VoyageDocumentEmbedder calculates vector representations for documents. It's used in indexing pipelines after PreProcessors and before DocumentWriter. VoyageDocumentEmbedder is an external integration, which means it was added by a Haystack community member and is maintained by them. For the most recent information about this component, check the GitHub repository.
If you use VoyageDocumentEmbedder in your indexing pipeline, you must use VoyageTextEmbedder with the same model to embed the query in your query pipeline.
Embedding Models in Query and Indexing Pipelines
The embedding model you use to embed documents in your indexing pipeline must be the same as the embedding model you use to embed the query in your query pipeline.
This means the embedders for your indexing and query pipelines must match. For example, if you use CohereDocumentEmbedder to embed your documents, you should use CohereTextEmbedder with the same model to embed your queries.
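To illustrate the matching rule for this component, here is a rough sketch of the query-side counterpart. It is not a complete pipeline: the component names (query_embedder, retriever), the choice of OpenSearchEmbeddingRetriever, and the top_k value are assumptions made for illustration. The point it shows is that VoyageTextEmbedder is given the same model as the VoyageDocumentEmbedder in the indexing pipeline.

```yaml
components:
  query_embedder: # hypothetical component name
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_text_embedder.VoyageTextEmbedder
    init_parameters:
      model: "voyage-2" # must be the same model the indexing pipeline's VoyageDocumentEmbedder uses
  retriever: # hypothetical retriever choice, shown only for illustration
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          similarity: cosine # assumed to mirror the indexing pipeline's document store settings
      top_k: 10 # illustrative value

connections:
  # The query embedding produced by the text embedder feeds the retriever
  - sender: query_embedder.embedding
    receiver: retriever.query_embedding
```

If the two embedders used different models, the query vector and the stored document vectors would live in different embedding spaces, and similarity scores between them would not be meaningful.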
Authorization
For details, check Use Voyage AI Models.
Usage Example
In this example, VoyageDocumentEmbedder receives documents from DocumentSplitter and then sends the embedded documents to DocumentWriter.
Here is the YAML configuration:
components:
  ...
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
  document_embedder:
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
    init_parameters:
      model: "voyage-2" # the model to use
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 1024 # must match the model's output dimension (voyage-2 produces 1024-dimensional vectors)
          similarity: cosine
      policy: OVERWRITE
connections: # Defines how the components are connected
  ...
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents
Init Parameters
For the initialization parameters, check the __init__() method of the component in GitHub.