Language Models in deepset Cloud

deepset Cloud is model agnostic and can load models directly from model providers, such as Hugging Face or OpenAI. You can use publicly available models as well as your private ones if you connect deepset Cloud with the model provider.

Models in Your Pipelines

You use models through pipeline nodes. Several pipeline nodes use models. Have a look at this table for an overview of model applications and the nodes that use them:

| Model Type or Application | Node That Uses It | Description |
| --- | --- | --- |
| Large language models | PromptNode | Use LLMs for various NLP tasks, like generative QA, through PromptNode. You can use models from providers such as OpenAI, Cohere, Azure, and more, or models hosted on AWS SageMaker. |
| Information retrieval models | Vector-based retrievers: EmbeddingRetriever and DensePassageRetriever | Retrievers act like filters that go through the documents and fetch the ones most relevant to the query. Vector-based retrievers use models to encode both the documents and the query for best results. |
| Question answering models | Readers | Readers are used in extractive question answering. They use transformer-based models to highlight the answer in the document and pinpoint it. |
| Ranking models | Model-based rankers: SentenceTransformersRanker and EmbeddingRanker | Rankers prioritize documents based on the criteria you specify, for example, a particular value in a document's metadata field. Model-based rankers are powerful: they use transformer models to embed both the documents and the query, building a strong semantic representation of the text. |

To use a model, you simply provide its location as a parameter to the node. If you're using a proprietary model, you can either pass the API key to the node or connect deepset Cloud to the model provider. deepset Cloud takes care of loading the models.

When using LLMs with PromptNode, you can specify all additional model settings, like temperature, in the model_kwargs parameter, for example:

- name: PromptNode
  type: PromptNode
  params:
    model_name_or_path: google/flan-t5-xl
    model_kwargs:
      temperature: 0.6

Recommended Models

Larger models are generally more accurate at the cost of speed.

If you don't know which model to start with, you can use one of the models we recommend.

Large Language Models for Generative Question Answering

This table lists the models that we recommend for generative QA. You can use them with PromptNode in your pipelines.

| Model URL | Description | Type |
| --- | --- | --- |
| Falcon models | Currently the most performant open-source LLMs. | Open source |
| Claude models by Anthropic | A transformer-based LLM that can be an alternative to the GPT models. It can generate natural language and assist with code and translations. | Proprietary |
| GPT-3.5 models by OpenAI | Faster and cheaper than GPT-4; can generate and understand natural language and code. | Proprietary |
| GPT-4 models by OpenAI | Large multimodal models. More expensive and slower than GPT-3.5. | Proprietary |
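As an illustration, a PromptNode configured with one of these proprietary models might look like this in pipeline YAML. This is a sketch: the gpt-3.5-turbo model name and the placeholder API key are examples, not values specific to your workspace.

```yaml
# Hypothetical PromptNode configuration using an OpenAI model.
# Replace the api_key placeholder with your own key, or connect
# deepset Cloud to OpenAI instead of passing the key here.
- name: PromptNode
  type: PromptNode
  params:
    model_name_or_path: gpt-3.5-turbo
    api_key: <your_openai_api_key>
```

If you've connected deepset Cloud to the model provider, you can omit the api_key parameter.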

Using Models Hosted on AWS SageMaker

PromptNode supports models hosted on AWS SageMaker. Contact your deepset Cloud representative to set up the model for you. Once it's ready, you'll get the model name that you then pass in the model_name_or_path parameter of PromptNode, like this:

- name: PromptNode
  type: PromptNode
  params:
    model_name_or_path: <the_model_name_you_got_from_deepset_Cloud_rep>
    model_kwargs:
      temperature: 0.6 # additional model parameters that you can configure

Reader Models for Question Answering

This table describes the models that we recommend for the Question Answering task. You can use them with your Readers.

| Model URL | Description | Language |
| --- | --- | --- |
| deepset/roberta-base-squad2-distilled | A distilled model, relatively fast with good performance. | English |
| deepset/roberta-large-squad2 | A large model with good performance. Slower than the distilled one. | English |
| deepset/xlm-roberta-base-squad2 | A base model with good speed and performance. | Multilingual |
| deepset/tinyroberta-squad2 | A very fast model. | English |
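For instance, a Reader using the distilled model could be declared like this. This is a sketch assuming the Haystack-style FARMReader node type; adjust the node type and parameters to match your pipeline.

```yaml
# Hypothetical Reader configuration for extractive QA.
- name: Reader
  type: FARMReader
  params:
    model_name_or_path: deepset/roberta-base-squad2-distilled
    top_k: 5 # number of answer candidates to return (example value)
```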

You can also view state-of-the-art question answering models on the Hugging Face leaderboard.

Retriever Models for Information Retrieval

This table describes the models that we recommend for the Information Retrieval task. You can use them with your Retrievers.

| Model Provider | Model Name | Description | Language |
| --- | --- | --- | --- |
| Cohere | embed-english-v2.0 | See Cohere documentation. | English |
| Cohere | embed-multilingual-v2.0 | See Cohere documentation. | Multilingual |
| OpenAI | text-embedding-ada-002 | See OpenAI documentation. | English |
| Sentence Transformers | multi-qa-mpnet-base-dot-v1 | Vector dimension: 768 | English |
| Sentence Transformers | e5-base-v2 | Vector dimension: 768 | English |
| Sentence Transformers | e5-large-v2 | Vector dimension: 1024. Slower than e5-base-v2 but performs better. | English |
| Sentence Transformers | multilingual-e5-base | Vector dimension: 768 | Multilingual |
| Sentence Transformers | multilingual-e5-large | Vector dimension: 1024 | Multilingual |

It's best to try out different models and see what works best for your data.
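As a sketch, an EmbeddingRetriever using one of the Sentence Transformers models above might be configured like this. The DocumentStore name and the top_k value are placeholders for illustration.

```yaml
# Hypothetical EmbeddingRetriever configuration.
# The same embedding model encodes both documents and queries.
- name: Retriever
  type: EmbeddingRetriever
  params:
    document_store: DocumentStore # name of your document store node
    embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1
    top_k: 10 # number of documents to fetch (example value)
```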

Ranker Models

This table lists models you can use with SentenceTransformersRanker to rank documents.

| Model | Description | Language |
| --- | --- | --- |
| simlm-msmarco-reranker | The best ranker model currently available. | English |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | A slightly bigger and slower model. | English |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Slightly faster than ms-marco-MiniLM-L-12-v2. | English |
| svalabs/cross-electra-ms-marco-german-uncased | In our experience, this is the best model for German. | German |
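A SentenceTransformersRanker using one of these models could be declared as follows. This is a sketch: the top_k value is an example, and the ranker is assumed to sit after a retriever in your query pipeline.

```yaml
# Hypothetical SentenceTransformersRanker configuration.
# Re-ranks the documents a retriever returns using a cross-encoder model.
- name: Ranker
  type: SentenceTransformersRanker
  params:
    model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2
    top_k: 5 # number of documents to keep after re-ranking (example value)
```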

These are the recommended models for CohereRanker:


For EmbeddingRanker, we recommend the same models as for retrieval.