Language Models in deepset Cloud
deepset Cloud is model agnostic and can load models directly from model providers, such as Hugging Face or OpenAI. You can use publicly available models, as well as your private ones if you connect deepset Cloud to the model provider.
Models in Your Pipelines
You use models through pipeline nodes. Have a look at this table for an overview of model applications and the nodes that use them:
| Model Type or Application | Node That Uses It | Description |
| --- | --- | --- |
| Large language models | PromptNode | Use LLMs for various NLP tasks, like generative QA, through PromptNode. You can use models from providers such as OpenAI, Cohere, Azure, and more, or models hosted on AWS SageMaker. |
| Information retrieval models | Vector-based retrievers: EmbeddingRetriever and DensePassageRetriever | Retrievers act like filters that go through the documents and fetch the ones most relevant to the query. Vector-based retrievers use models to encode both the documents and the query for best results. |
| Question answering models | Readers | Readers are used in extractive question answering. They use transformer-based models to pinpoint and highlight the answer in the document. |
| Ranking models | Model-based rankers: SentenceTransformersRanker and EmbeddingRanker | Rankers prioritize documents based on the criteria you specify, for example, a particular value in a document's metadata field. Model-based rankers are powerful rankers that use transformer models to embed both the documents and the query, building a strong semantic representation of the text. |
To use a model, you simply provide its location as a parameter to the node. If you're using a proprietary model, you can either pass the API key to the node or connect deepset Cloud to the model provider. deepset Cloud takes care of loading the models.
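For example, here's a minimal sketch of a vector-based retriever node that loads a Hugging Face model; the model choice, document store name, and `top_k` value here are illustrative, not prescriptive:

```yaml
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # the workspace's standard document store
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 # any embedding model from Hugging Face works here
      top_k: 10 # number of documents to fetch; illustrative
```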
When using LLMs with PromptNode, you can specify any additional model settings, like `temperature`, in the `model_kwargs` parameter, for example:
```yaml
components:
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: google/flan-t5-xl
      model_kwargs:
        temperature: 0.6
```
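If you're using a proprietary model and prefer to pass the API key directly rather than connecting deepset Cloud to the provider, you can set it on the node. Here's a minimal sketch assuming an OpenAI model; the key placeholder is yours to fill in:

```yaml
components:
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: gpt-3.5-turbo
      api_key: <your_openai_api_key> # or connect deepset Cloud to OpenAI and omit this
      model_kwargs:
        temperature: 0.6
```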
Recommended Models
Larger models are generally more accurate, at the cost of speed. If you don't know which model to start with, use one of the models we recommend below.
Large Language Models for Generative Question Answering
This table lists the models that we recommend for generative QA. You can use them with PromptNode in your pipelines.
| Model URL | Description | Type |
| --- | --- | --- |
| Falcon models | Currently the most performant open source LLMs. | Open source |
| Claude models by Anthropic | A transformer-based LLM that can be an alternative to the GPT models. It can generate natural language and assist with code and translations. | Proprietary |
| GPT-3.5 models by OpenAI | Faster and cheaper than GPT-4. Can generate and understand natural language and code. | Proprietary |
| GPT-4 models by OpenAI | Large multimodal models. More expensive and slower than GPT-3.5. | Proprietary |
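For example, to try one of the open-source models from this table, point PromptNode's `model_name_or_path` at a Hugging Face checkpoint. A sketch assuming the tiiuae/falcon-7b-instruct variant:

```yaml
components:
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: tiiuae/falcon-7b-instruct # assumed checkpoint; pick the Falcon variant that fits your workload
      model_kwargs:
        temperature: 0.6
```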
Using Models Hosted on AWS SageMaker
PromptNode supports models hosted on AWS SageMaker. Contact your deepset Cloud representative to set up the model for you. Once it's ready, you'll get the model name that you then pass in the `model_name_or_path` parameter of PromptNode, like this:
```yaml
...
components:
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: <the_model_name_you_got_from_deepset_Cloud_rep>
      model_kwargs:
        temperature: 0.6 # additional model parameters that you can configure
```
Reader Models for Question Answering
This table describes the models that we recommend for the Question Answering task. You can use them with your Readers.
| Model URL | Description | Language |
| --- | --- | --- |
| deepset/roberta-base-squad2-distilled | A distilled model, relatively fast and with good performance. | English |
| deepset/roberta-large-squad2 | A large model with good performance. Slower than the distilled one. | English |
| deepset/xlm-roberta-base-squad2 | A base model with good speed and performance. | Multilingual |
| deepset/tinyroberta-squad2 | A very fast model. | English |
You can also view state-of-the-art question answering models on the Hugging Face leaderboard.
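To use one of these models, pass it to your Reader node. A minimal sketch with FARMReader; the `top_k` value is illustrative:

```yaml
components:
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2-distilled
      top_k: 5 # number of answer candidates to return; illustrative
```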
Retriever Models for Information Retrieval
This table describes the models that we recommend for the Information Retrieval task. You can use them with your Retrievers.
| Model Provider | Model Name | Description | Language |
| --- | --- | --- | --- |
| Cohere | embed-english-v2.0, embed-english-light-v2.0 | See Cohere documentation. | English |
| Cohere | embed-multilingual-v2.0 | See Cohere documentation. | Multilingual |
| OpenAI | text-embedding-ada-002 | See OpenAI documentation. | English |
| Sentence Transformers | multi-qa-mpnet-base-dot-v1 | Vector dimension: 768 | English |
| Sentence Transformers | e5-base-v2 | Vector dimension: 768 | English |
| Sentence Transformers | e5-large-v2 | Vector dimension: 1024. Slower than e5-base-v2 but performs better. | English |
| Sentence Transformers | multilingual-e5-base | Vector dimension: 768 | Multilingual |
| Sentence Transformers | multilingual-e5-large | Vector dimension: 1024 | Multilingual |
It's best to try out different models and see what works best for your data.
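One detail to watch when switching retriever models: the document store's embedding dimension must match the model's vector dimension. A sketch assuming e5-large-v2 (1024 dimensions) and the standard DeepsetCloudDocumentStore; the `top_k` value is illustrative:

```yaml
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
    params:
      embedding_dim: 1024 # must match the retriever model's vector dimension
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-large-v2
      model_format: sentence_transformers
      top_k: 10
```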
Ranker Models
This table lists models you can use with SentenceTransformersRanker to rank documents.
| Model | Description | Language |
| --- | --- | --- |
| simlm-msmarco-reranker | The best ranker model currently available. | English |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | A slightly bigger and slower model. | English |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Slightly faster than ms-marco-MiniLM-L-12-v2. | English |
| svalabs/cross-electra-ms-marco-german-uncased | In our experience, this is the best model for German. | German |
| mmarco-mMiniLMv2-L12-H384-v1 | A multilingual MiniLM-based cross-encoder. | Multilingual |
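For example, here's a sketch of a SentenceTransformersRanker configured with one of the cross-encoders above; the `top_k` value is illustrative:

```yaml
components:
  - name: Ranker
    type: SentenceTransformersRanker
    params:
      model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2
      top_k: 5 # number of documents to keep after ranking
```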
These are the recommended models for CohereRanker:
| Model | Language |
| --- | --- |
| rerank-english-v2.0 | English |
| rerank-multilingual-v2.0 | Multilingual |
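And a sketch of a CohereRanker with one of these models, assuming you pass the Cohere API key directly on the node:

```yaml
components:
  - name: Ranker
    type: CohereRanker
    params:
      model_name_or_path: rerank-english-v2.0
      api_key: <your_cohere_api_key> # or connect deepset Cloud to Cohere and omit this
      top_k: 5
```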
For EmbeddingRanker, we recommend the same models as for retrieval.