Use OpenAI Models

Use OpenAI models in your pipelines.


About This Task

You can use OpenAI's embedding models and LLMs in your pipelines and indexes.

Prerequisites

You need an API key from an active OpenAI account. For details on obtaining one, see Secret keys in the OpenAI documentation.

Use OpenAI Models

First, connect deepset AI Platform to OpenAI through the Integrations page. You can set up the connection for a single workspace or for the whole organization:

Add Workspace-Level Integration

  1. Click your profile icon and choose Settings.
  2. Go to Workspace > Integrations.
  3. Find the provider you want to connect and click Connect next to it.
  4. Enter the API key and any other required details.
  5. Click Connect. You can use this integration in pipelines and indexes in the current workspace.

Add Organization-Level Integration

  1. Click your profile icon and choose Settings.
  2. Go to Organization > Integrations.
  3. Find the provider you want to connect and click Connect next to it.
  4. Enter the API key and any other required details.
  5. Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.

Then, add a component that uses an OpenAI model to your pipeline. Here are the components by the model type they use:

  • Embedding models:

    • OpenAITextEmbedder: Calculates embeddings for text, such as a query. Often used in query pipelines to embed the query and pass the embedding to an embedding retriever.
    • OpenAIDocumentEmbedder: Calculates embeddings for documents. Often used in indexes to embed documents and pass them to DocumentWriter.

      Embedding Models in Query Pipelines and Indexes

      The embedding model you use to embed documents in your indexing pipeline must be the same as the embedding model you use to embed the query in your query pipeline.

      This means the embedders for your indexing and query pipelines must match. For example, if you use CohereDocumentEmbedder to embed your documents, you should use CohereTextEmbedder with the same model to embed your queries.

  • LLMs:

    • OpenAIGenerator: Generates text using OpenAI models, often used in RAG pipelines.
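As a sketch, the matching pair of embedders together with the generator could be declared like this in pipeline YAML. The model names here are illustrative placeholders; the key requirement is that the document embedder (index) and text embedder (query pipeline) use the same model:

```yaml
# Index: embeds documents (model must match the query-side text embedder)
document_embedder:
  type: haystack.components.embedders.openai_document_embedder.OpenAIDocumentEmbedder
  init_parameters:
    model: text-embedding-3-small

# Query pipeline: embeds the query with the same model as above
text_embedder:
  type: haystack.components.embedders.openai_text_embedder.OpenAITextEmbedder
  init_parameters:
    model: text-embedding-3-small

# Query pipeline: generates the answer from the retrieved context
generator:
  type: haystack.components.generators.openai.OpenAIGenerator
  init_parameters:
    model: gpt-4o
    generation_kwargs:
      temperature: 0.0
```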

Usage Examples

This is an example of how to use OpenAI's embedding models and an LLM in an index and a query pipeline. The index configuration looks like this:

components:
  ...
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30

  document_embedder:
    type: haystack.components.embedders.openai_document_embedder.OpenAIDocumentEmbedder
    init_parameters:
      model: text-embedding-ada-002 # the model to use

  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 1536 # must match the embedding model's output dimension
          similarity: cosine
      policy: OVERWRITE

connections: # Defines how the components are connected
  ...
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents

Here is how to connect the components in Pipeline Builder. In the index, OpenAIDocumentEmbedder receives documents from DocumentSplitter and then passes the embedded documents to DocumentWriter, which writes them into the Document Store:


In a query pipeline, OpenAITextEmbedder embeds the query using the same model as the OpenAIDocumentEmbedder in the index. Then, it sends the embedded query to the retriever, which fetches matching documents and sends them to PromptBuilder. OpenAIGenerator then receives the rendered prompt from the PromptBuilder and sends the generated replies to AnswerBuilder to build a proper GeneratedAnswer object.

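Sketched in pipeline YAML, the query-side connections described above could look like this. The component names (text_embedder, retriever, prompt_builder, generator, answer_builder) are illustrative; the socket names follow Haystack's standard component interfaces:

```yaml
connections:
  - sender: text_embedder.embedding
    receiver: retriever.query_embedding
  - sender: retriever.documents
    receiver: prompt_builder.documents
  - sender: prompt_builder.prompt
    receiver: generator.prompt
  - sender: generator.replies
    receiver: answer_builder.replies
```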