Language Models in deepset Cloud
deepset Cloud is model agnostic and can load models directly from model providers, such as Hugging Face or OpenAI. You can use publicly available models as well as your private ones if you connect deepset Cloud with the model provider.
Models in Your Pipelines
You use models through pipeline nodes. Several pipeline nodes use models. This table gives an overview of model applications and the nodes that use them:
Model Type or Application | Node That Uses It | Description |
---|---|---|
Large language models | PromptNode | Use LLMs for various NLP tasks, like generative QA, through PromptNode. You can use models from providers such as OpenAI, Cohere, Azure, and more, or models hosted on Amazon SageMaker and Amazon Bedrock. |
Information retrieval models | Vector-based retrievers: EmbeddingRetriever and DensePassageRetriever | Retrievers act like filters that go through the documents and fetch the ones most relevant to the query. Vector-based retrievers use models to encode both the documents and the query for the best results. |
Question answering models | Readers | Readers are used in extractive question answering. They use transformer-based models to highlight the answer in the document and pinpoint its exact location. |
Ranking models | Model-based rankers: SentenceTransformersRanker and EmbeddingRanker | Rankers prioritize documents based on the criteria you specify, for example, a particular value in a document's metadata field. Model-based rankers are powerful: they use transformer models to embed both the documents and the query, building a strong semantic representation of the text. |
To use a model, pass its name as a parameter to the node (usually the `model_name_or_path` parameter). If you're using a proprietary model, you can either pass the API key to the node or connect deepset Cloud to the model provider. deepset Cloud takes care of loading the models.
You can run models locally or remotely. Smaller models, like Reader or Retriever models, are fast when run locally. Large models, like GPT-3.5 or GPT-4, are faster to run remotely as they need optimized hardware. To run a model remotely, you may need to pass additional parameters in `model_kwargs`. For detailed instructions, see the documentation for the node that uses the model.
When using LLMs with PromptNode, you can specify all additional model settings, like `temperature`, in the `model_kwargs` parameter, for example:
```yaml
components:
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: google/flan-t5-xl
      model_kwargs:
        temperature: 0.6
```
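If you use a proprietary model and prefer to pass the API key in the pipeline itself rather than connecting deepset Cloud to the provider, a minimal sketch could look like this (assuming an OpenAI model; the `api_key` value is a placeholder):

```yaml
components:
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: gpt-3.5-turbo # a proprietary OpenAI model
      api_key: <YOUR_OPENAI_API_KEY> # placeholder; alternatively, connect deepset Cloud to OpenAI
      model_kwargs:
        temperature: 0
```

Connecting deepset Cloud to the provider is generally preferable, as it keeps the key out of the pipeline definition.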
Recommended Models
Larger models are generally more accurate at the cost of speed.
If you don't know which model to start with, you can use one of the models we recommend.
Large Language Models for RAG
This table lists the models we recommend for retrieval-augmented generation (RAG) QA. You can use them with PromptNode in your pipelines. You can use models hosted on Amazon Bedrock, Amazon SageMaker, and Hugging Face, as well as local models.
Model URL | Description | Type |
---|---|---|
Llama 2 models, specifically Llama2-70b-chat-v1 | Currently, the most performant open source LLMs. | Open source |
Claude models by Anthropic | A transformer-based LLM that can be an alternative to the GPT models. It can generate natural language and assist with code and translations. | Proprietary |
GPT-3.5 models by OpenAI | Faster and cheaper than GPT-4, it can generate and understand natural language and code. | Proprietary |
GPT-4 models by OpenAI | Large multimodal models. More expensive and slower than GPT-3.5. | Proprietary |
For an overview of the models, see Large Language Models Overview.
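To make this concrete, here's a minimal sketch of a RAG QA query pipeline that sends retrieved documents to one of these LLMs through PromptNode. The model, the `deepset/question-answering` prompt template, and the retriever settings are illustrative assumptions; adjust them to your setup:

```yaml
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # illustrative embedding model
      top_k: 5 # the number of documents to pass to the LLM
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: gpt-3.5-turbo # any LLM from the table above
      default_prompt_template: deepset/question-answering # assumed RAG prompt template

pipelines:
  - name: query # query pipeline only; the indexing pipeline is omitted for brevity
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: PromptNode
        inputs: [Retriever]
```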
Hosted LLMs
deepset Cloud can load models hosted by the following providers:
- Hugging Face
- Cohere
- OpenAI
- Amazon SageMaker
- Amazon Bedrock
- Microsoft Azure (hosted OpenAI models)
For instructions on how to set up the connections, see Using Hosted LLMs in Your Pipelines.
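For example, a PromptNode using a hosted Cohere model might be configured like this. This is a sketch, assuming Cohere's command model and a directly passed API key (placeholder); the linked guide covers the full connection setup:

```yaml
components:
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: command # a model hosted by Cohere (assumption)
      api_key: <YOUR_COHERE_API_KEY> # placeholder
```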
Reader Models for Question Answering
This table describes the models that we recommend for the question answering task. You can use them with your Readers.
Model URL | Description | Language |
---|---|---|
deepset/roberta-base-squad2-distilled | A distilled model, relatively fast and with good performance. | English |
deepset/roberta-large-squad2 | A large model with good performance. Slower than the distilled one. | English |
deepset/xlm-roberta-base-squad2 | A base model with good speed and performance. | Multilingual |
deepset/tinyroberta-squad2 | A very fast model. | English |
You can also view state-of-the-art question answering models on the Hugging Face leaderboard.
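As an illustration, here's a minimal sketch of plugging one of these models into a Reader, assuming FARMReader as the reader type:

```yaml
components:
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2-distilled # a model from the table above
      top_k: 5 # the number of answers to return
```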
Retriever Models for Information Retrieval
This table describes the embedding models that we recommend for the information retrieval (document search) task. You can use them with your Retrievers.
Model Provider | Model Name | Description | Language |
---|---|---|---|
Cohere | embed-english-v2.0, embed-english-light-v2.0 | See Cohere documentation. | English |
Cohere | embed-multilingual-v2.0 | See Cohere documentation. | Multilingual |
OpenAI | text-embedding-ada-002 | See OpenAI documentation. | English |
Sentence Transformers | sentence-transformers/multi-qa-mpnet-base-dot-v1 | Vector dimension*: 768 | English |
Sentence Transformers | intfloat/e5-base-v2 | Vector dimension*: 768. With this model, we recommend setting the `similarity` parameter of DocumentStore to `cosine`. | English |
Sentence Transformers | intfloat/e5-large-v2 | Vector dimension*: 1024. Slower than e5-base-v2 but performs better. With this model, we recommend setting the `similarity` parameter of DocumentStore to `cosine`. | English |
Sentence Transformers | intfloat/multilingual-e5-base | Vector dimension*: 768. With this model, we recommend setting the `similarity` parameter of DocumentStore to `cosine`. | Multilingual |
Sentence Transformers | intfloat/multilingual-e5-large | Vector dimension*: 1024. With this model, we recommend setting the `similarity` parameter of DocumentStore to `cosine`. | Multilingual |
*In language models, vector dimension refers to the number of elements in the vector that represents a word, capturing its meaning and usage. The more dimensions, or numbers, in this vector, the more detail the model can capture about the word, though it also makes the model more complex to handle.
For models with the vector dimension provided, set the `embedding_dim` parameter of DocumentStore to the model's vector dimension, like this:
```yaml
...
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
    params:
      embedding_dim: 1024 # Here you specify the vector dimension of the Retriever model
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-large-v2 # This model's vector dimension is 1024, as set in embedding_dim of DocumentStore
      top_k: 20 # The number of results to return
```
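If you use an API-based embedding model instead, the configuration is similar. This sketch assumes Cohere's embed-english-v2.0 model and a directly passed API key (placeholder); verify the model's vector dimension against Cohere's documentation:

```yaml
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
    params:
      embedding_dim: 4096 # embed-english-v2.0's vector dimension (verify against Cohere's docs)
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: embed-english-v2.0 # a Cohere model from the table above
      api_key: <YOUR_COHERE_API_KEY> # placeholder
      top_k: 20
```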
It's best to try out different models and see which one works best for your data.
Ranker Models
This table lists models you can use with SentenceTransformersRanker to rank documents.
Model | Description | Language |
---|---|---|
intfloat/simlm-msmarco-reranker | The best ranker model currently available. | English |
cross-encoder/ms-marco-MiniLM-L-12-v2 | A slightly bigger and slower model. | English |
cross-encoder/ms-marco-MiniLM-L-6-v2 | Slightly faster than ms-marco-MiniLM-L-12-v2. | English |
svalabs/cross-electra-ms-marco-german-uncased | In our experience, this is the best model for German. | German |
jeffwan/mmarco-mMiniLMv2-L12-H384-v1 | A fast, multilingual model. | Multilingual |
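A ranker using one of these models could be configured like this; a minimal sketch, with `top_k` as an illustrative setting:

```yaml
components:
  - name: Ranker
    type: SentenceTransformersRanker
    params:
      model_name_or_path: intfloat/simlm-msmarco-reranker # a model from the table above
      top_k: 10 # the number of documents to keep after ranking
```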
These are the recommended models for CohereRanker:
Model | Language |
---|---|
rerank-english-v2.0 | English |
rerank-multilingual-v2.0 | Multilingual |
For EmbeddingRanker, we recommend the same models as for retrieval.