Use Voyage AI Models
Create embeddings using top-performing embedding models by Voyage AI.
Third-Party Integration
Voyage AI is a third-party integration developed by an external provider and is not maintained by deepset. While we encourage you to explore it, we recommend reviewing it carefully to ensure it meets your needs.
Use Voyage AI models to calculate embeddings for documents and queries in your pipelines. For available models, see the Voyage AI documentation.
Prerequisites
You need an API key from Voyage AI. For details, see the Voyage website.
Use Voyage Models
First, connect deepset Cloud to Voyage AI through the Connections page:
- Click your initials in the top right corner and select Connections.
- Click Connect next to the provider.
- Enter your Voyage AI API key and submit it.
Then, add a component that uses a Voyage model. There are two components available:
- VoyageTextEmbedder: Embeds text strings. You can use it in a query pipeline to embed the query and pass it to an embedding retriever.
- VoyageDocumentEmbedder: Embeds documents. You can use it in indexing pipelines to calculate embeddings for documents.
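To illustrate how the two components work together, here is a minimal Python sketch of what an embedding retriever does with the vectors they produce. The three-dimensional vectors are made-up stand-ins for real Voyage embeddings, and `cosine_similarity` and `retrieve` are hypothetical helpers written for this illustration, not part of Haystack or the Voyage integration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, doc_embeddings, top_k=2):
    """Rank document IDs by similarity to the query embedding (hypothetical helper)."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]


# Toy vectors standing in for the output of a document embedder.
doc_embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
# Toy vector standing in for the output of a text embedder.
query_embedding = [1.0, 0.0, 0.0]

top_docs = retrieve(query_embedding, doc_embeddings, top_k=2)
print(top_docs)
```

The comparison is only meaningful when the query and document vectors come from the same model, because vectors from different models do not share a vector space and may not even have the same length.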
Embedding Models in Query and Indexing Pipelines
The embedding model you use to embed documents in your indexing pipeline must be the same as the embedding model you use to embed the query in your query pipeline.
This means the embedders for your indexing and query pipelines must match. For example, if you use CohereDocumentEmbedder to embed your documents, you should use CohereTextEmbedder with the same model to embed your queries.
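One way to catch a mismatch early is to compare the `model` values in both pipeline configurations before deploying. The sketch below assumes the configurations have already been loaded into Python dictionaries shaped like the pipeline YAML. `embedder_models` and `embedders_match` are hypothetical helpers, not a deepset Cloud feature, and they only compare model names, not providers.

```python
def embedder_models(pipeline_config, type_suffix):
    """Collect the `model` of every component whose type ends with `type_suffix`."""
    models = []
    for component in pipeline_config.get("components", {}).values():
        if component.get("type", "").endswith(type_suffix):
            models.append(component.get("init_parameters", {}).get("model"))
    return models


def embedders_match(indexing_config, query_config):
    """Return True if both pipelines embed with the same, non-empty set of models."""
    doc_models = set(embedder_models(indexing_config, "DocumentEmbedder"))
    text_models = set(embedder_models(query_config, "TextEmbedder"))
    return bool(doc_models) and doc_models == text_models


# Dictionaries mirroring the relevant parts of an indexing and a query pipeline.
indexing_config = {
    "components": {
        "document_embedder": {
            "type": "haystack_integrations.components.embedders.voyage_embedders."
                    "voyage_document_embedder.VoyageDocumentEmbedder",
            "init_parameters": {"model": "voyage-2"},
        }
    }
}
query_config = {
    "components": {
        "query_embedder": {
            "type": "haystack_integrations.components.embedders.voyage_embedders."
                    "voyage_text_embedder.VoyageTextEmbedder",
            "init_parameters": {"model": "voyage-2"},
        }
    }
}

models_agree = embedders_match(indexing_config, query_config)
print(models_agree)
```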
Usage Examples
These examples show an indexing pipeline and a query pipeline that use Voyage models. The indexing pipeline, shown first, embeds documents. The query pipeline, shown second, embeds the query text:
# Indexing pipeline
components:
  ...
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
  document_embedder:
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_document_embedder.VoyageDocumentEmbedder
    init_parameters:
      model: "voyage-2" # the model to use
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 1024 # must match the embedding size of the model (1024 for voyage-2)
          similarity: cosine
      policy: OVERWRITE
connections: # Defines how the components are connected
  ...
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents
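The `splitter` settings in the indexing pipeline control how much text each embedding covers. As a rough illustration of what `split_by: word` with a `split_length` and a `split_overlap` means, here is a minimal Python sketch. `split_words` is a hypothetical function written for this page. The real DocumentSplitter may differ in details such as whitespace handling and how it treats the final chunk.

```python
def split_words(text, split_length=250, split_overlap=30):
    """Split text into chunks of `split_length` words that share `split_overlap` words."""
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be smaller than split_length")
    words = text.split()
    step = split_length - split_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # the window already reached the end of the text
    return chunks


# Small numbers make the overlap easy to see: each chunk repeats one word.
sample = " ".join(f"w{i}" for i in range(10))
chunks = split_words(sample, split_length=4, split_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each resulting chunk becomes one document that the document embedder turns into one vector, so the overlap helps keep sentences that straddle a chunk boundary retrievable from either side.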
# Query pipeline
components:
  ...
  query_embedder:
    type: haystack_integrations.components.embedders.voyage_embedders.voyage_text_embedder.VoyageTextEmbedder
    init_parameters:
      model: "voyage-2" # the model to use, same as in the indexing pipeline
  retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        For each document check whether it is related to the question.
        Only use documents that are related to the question to answer it.
        Ignore documents that are not related to the question.
        If the answer exists in several documents, summarize them.
        Only answer based on the documents provided. Don't make things up.
        If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
        These are the documents:
        {% for document in documents %}
        Document[{{ loop.index }}]:
        {{ document.content }}
        {% endfor %}
        Question: {{question}}
        Answer:
  generator:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      model: "gpt-3.5-turbo" # the model to use
      generation_kwargs: # additional parameters for the model
        max_tokens: 400
        temperature: 0.0
        seed: 0
  answer_builder:
    init_parameters: {}
    type: haystack.components.builders.answer_builder.AnswerBuilder
  ...
connections: # Defines how the components are connected
  ...
  - sender: query_embedder.embedding # VoyageTextEmbedder sends the embedded query to the retriever
    receiver: retriever.query_embedding
  - sender: retriever.documents
    receiver: prompt_builder.documents
  - sender: prompt_builder.prompt
    receiver: generator.prompt
  - sender: generator.replies
    receiver: answer_builder.replies
  ...
inputs:
  query:
    ...
    - "query_embedder.text" # TextEmbedder needs query as input and it's not getting it
    - "retriever.query" # from any component it's connected to, so it needs to receive it from the pipeline.
    - "prompt_builder.question"
    - "answer_builder.query"
    ...
...
To connect the components in Pipeline Builder: in the indexing pipeline, VoyageDocumentEmbedder receives documents from DocumentSplitter, embeds them, and sends them to DocumentWriter, which writes them into the document store where the query pipeline can access them.
In the query pipeline, VoyageTextEmbedder embeds the query using the same model as VoyageDocumentEmbedder in the indexing pipeline and sends the embedded query to the retriever.
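A common wiring mistake is a connection that points at a component that is not defined. The following Python sketch shows one way to check for this, assuming the pipeline configuration has been loaded into a dictionary shaped like the YAML. `undefined_endpoints` is a hypothetical helper for illustration, not part of deepset Cloud or Haystack.

```python
def undefined_endpoints(config):
    """Return connection endpoints whose component is missing from `components`."""
    defined = set(config.get("components", {}))
    missing = []
    for connection in config.get("connections", []):
        for role in ("sender", "receiver"):
            # Endpoints are written as "<component_name>.<socket_name>".
            component_name = connection[role].split(".", 1)[0]
            if component_name not in defined:
                missing.append(connection[role])
    return missing


# A deliberately incomplete configuration: prompt_builder is not defined.
config = {
    "components": {"query_embedder": {}, "retriever": {}},
    "connections": [
        {"sender": "query_embedder.embedding", "receiver": "retriever.query_embedding"},
        {"sender": "retriever.documents", "receiver": "prompt_builder.documents"},
    ],
}

problems = undefined_endpoints(config)
print(problems)
```

The check only covers component names. It does not verify that the socket names, such as `embedding` or `query_embedding`, exist on the components.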