Models in Your Pipelines

You use models through pipeline components. Several components rely on models. The following table gives an overview of model applications and the components that use them:

Model Type or Application	Component That Uses It	Description
Large language models	Generators, including ChatGenerators	Use LLMs for various NLP tasks, like generative QA, through Generators. There's a dedicated Generator for each supported model provider. You can use models from providers such as OpenAI, Cohere, and Azure, or models hosted on AWS SageMaker and Amazon Bedrock. ChatGenerators are a type of Generator designed for multi-turn interactions between the LLM and the user. They support tool calling and help the LLM keep a consistent tone and context.
Embedding models	Embedders	Embedders use embedding models to calculate vector representations of text and documents. There are two types of embedders: document embedders, used in indexes, and text embedders, used in query pipelines to embed the query. Both types should use the same embedding model so that queries and documents are represented in the same vector space.
Question answering models	ExtractiveReader	The Reader is used in extractive question answering. It uses transformer-based models to locate the answer in a document and highlight it.
Ranking models	Rankers	Rankers prioritize documents based on the criteria you specify, for example, a particular value in a document's metadata field. Model-based rankers use models to embed the documents and the query and thus build a strong semantic representation of the text.

To use a model, pass its name to the component, usually through the model parameter. If you're using a proprietary model, first connect deepset AI Platform to the model provider. deepset takes care of loading the models.
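
As a minimal sketch of passing a model name to a component, the snippet below configures a text embedder with one of the embedding models recommended on this page. The component name query_embedder is hypothetical, and the type path assumes the open-source Haystack SentenceTransformersTextEmbedder is available in your workspace. Check the documentation of the component you actually use for its exact parameters.

```yaml
components:
  query_embedder:  # hypothetical component name
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      # The model is referenced by name only; loading is handled for you.
      model: "intfloat/e5-base-v2"
```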

You can run models locally or remotely. Smaller models, like question answering or embedding models, are fast when run locally. Large models, like GPT-4, are faster to run remotely because they need optimized hardware. To run a model remotely, you may need to pass additional parameters in model kwargs. For detailed instructions, see the Pipeline Components documentation for the component that uses the model.

When using LLMs with Generators, you can specify all additional model settings, like temperature, in the kwargs parameter, for example:

components:
  generator:
    type: haystack_integrations.components.generators.amazon_bedrock.generator.AmazonBedrockGenerator
    init_parameters:
      # Credentials are read from environment variables, not stored in the pipeline.
      aws_access_key_id: {"type": "env_var", "env_vars": ["AWS_ACCESS_KEY_ID"], "strict": false}
      aws_secret_access_key: {"type": "env_var", "env_vars": ["AWS_SECRET_ACCESS_KEY"], "strict": false}
      aws_region_name: {"type": "env_var", "env_vars": ["AWS_DEFAULT_REGION"], "strict": false}
      model: "anthropic.claude-v2"
      # Additional model settings go here.
      kwargs:
        temperature: 0.0

Recommended Models

Larger models are generally more accurate at the cost of speed.

If you don't know which model to start with, you can use one of the models we recommend.

Large Language Models for RAG

This table lists the models that we recommend for retrieval-augmented generation (RAG) QA. You can use them with Generators in your pipelines.

Model	Description	Type
Claude models by Anthropic	Transformer-based LLMs that can serve as an alternative to the GPT models. They can generate natural language and assist with code and translation.	Proprietary
GPT-4 models by OpenAI	Large multimodal models, suitable for most tasks.	Proprietary

For an overview of the models, see Large Language Models Overview.
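
For comparison with the Amazon Bedrock example earlier on this page, here is a sketch of a Generator that uses an OpenAI model. It assumes the open-source Haystack OpenAIGenerator, where additional model settings go under generation_kwargs rather than kwargs. The component name, the model name, and the setting values are illustrative only.

```yaml
components:
  llm:  # hypothetical component name
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      # The API key is read from an environment variable.
      api_key: {"type": "env_var", "env_vars": ["OPENAI_API_KEY"], "strict": false}
      model: "gpt-4"
      generation_kwargs:
        temperature: 0.0   # low temperature for more repeatable answers
        max_tokens: 512    # illustrative cap on answer length
```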

Hosted LLMs

Currently, on the Connections page you can set up connections with the following model providers:

Amazon Bedrock and SageMaker
Azure OpenAI
Cohere
Google AI
Hugging Face (only needed for private models hosted there)
MongoDB
NVIDIA
OpenAI
SearchAPI
Together AI
Voyage AI
Weights & Biases

Supported connections to data processing services and data sources are:

Azure AI Document Intelligence
DeepL
Snowflake
Unstructured

Reader Models for Question Answering

This table describes the models that we recommend for the question answering task. You can use them with your Readers.

Model	Description	Language
deepset/roberta-base-squad2-distilled	A distilled model, relatively fast and with good performance.	English
deepset/roberta-large-squad2	A large model with good performance. Slower than the distilled one.	English
deepset/xlm-roberta-base-squad2	A base model with good speed and performance.	Multilingual
deepset/tinyroberta-squad2	A very fast model.	English

You can also view state-of-the-art question answering models on the Hugging Face leaderboard.
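
A minimal sketch of a Reader configured with one of the models from the table above. It assumes the open-source Haystack ExtractiveReader; the component name and the values of top_k and no_answer are illustrative, not recommendations.

```yaml
components:
  reader:  # hypothetical component name
    type: haystack.components.readers.extractive.ExtractiveReader
    init_parameters:
      model: "deepset/roberta-base-squad2-distilled"
      top_k: 5          # return at most five candidate answers
      no_answer: false  # do not add a "no answer" candidate to the results
```

To trade speed for accuracy, swap the model name for another entry in the table, for example deepset/roberta-large-squad2.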

Embedding Models

This table describes the embedding models we recommend for calculating vector representations of text. You can use them with dedicated Embedders.

Model Provider	Model Name	Description	Language
Cohere	embed-english-v2.0, embed-english-light-v2.0	See Cohere documentation.	English
Cohere	embed-multilingual-v2.0	See Cohere documentation.	Multilingual
OpenAI	text-embedding-ada-002	See OpenAI documentation.	English
Sentence Transformers	sentence-transformers/multi-qa-mpnet-base-dot-v1	Vector dimension*: 768	English
Sentence Transformers	intfloat/e5-base-v2	Vector dimension*: 768	English
Sentence Transformers	intfloat/e5-large-v2	Vector dimension*: 1024. Slower than e5-base-v2 but performs better.	English
Sentence Transformers	intfloat/multilingual-e5-base	Vector dimension*: 768	Multilingual
Sentence Transformers	intfloat/multilingual-e5-large	Vector dimension*: 1024	Multilingual

*Vector dimension is the number of elements in the vector that a model produces for a piece of text. Vectors with more dimensions can capture more detail about meaning and usage, but they also take more storage and compute to handle. It's best to try out different models and see what works best for your data.
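
The sketch below shows a document embedder and a text embedder that share one model, so documents and queries end up in the same vector space. It is shown as a single snippet for brevity; in practice the document embedder belongs in the index and the text embedder in the query pipeline. The component names are hypothetical, and the type paths assume the open-source Haystack Sentence Transformers embedders. The prefix values follow the convention described for the e5 model family and are an assumption here; drop them for models that don't expect prefixes.

```yaml
components:
  document_embedder:  # used in the index
    type: haystack.components.embedders.sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder
    init_parameters:
      model: "intfloat/multilingual-e5-base"
      prefix: "passage: "  # assumption: e5-style prefix for documents

  query_embedder:  # used in the query pipeline
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: "intfloat/multilingual-e5-base"  # same model as the document embedder
      prefix: "query: "  # assumption: e5-style prefix for queries
```

If you switch to a model with a different vector dimension, for example from a 768-dimension to a 1024-dimension model, the documents need to be embedded again with the new model.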

Ranker Models

This table lists models to use with Rankers to rank documents.

Model	Description	Language
intfloat/simlm-msmarco-reranker	The best ranker model currently available.	English
cross-encoder/ms-marco-MiniLM-L-12-v2	A slightly bigger and slower model.	English
cross-encoder/ms-marco-MiniLM-L-6-v2	Slightly faster than ms-marco-MiniLM-L-12-v2.	English
svalabs/cross-electra-ms-marco-german-uncased	In our experience, this is the best model for German.	German
jeffwan/mmarco-mMiniLMv2-L12-H384-v1	A fairly fast multilingual model.	Multilingual
rerank-english-v2.0	A Cohere model for English data.	English
rerank-multilingual-v2.0	A Cohere model for multilingual data.	Multilingual
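
A minimal sketch of a model-based Ranker that uses one of the cross-encoder models from the table above. It assumes the open-source Haystack TransformersSimilarityRanker; the component name and the top_k value are illustrative. The Cohere models in the table need a Cohere-specific ranker component and a connection to Cohere instead.

```yaml
components:
  ranker:  # hypothetical component name
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
      top_k: 8  # keep the eight highest-ranked documents
```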