Language Models in deepset Cloud
deepset Cloud is model-agnostic and can load models directly from model providers, such as Hugging Face or OpenAI. You can use publicly available models as well as your private ones if you connect deepset Cloud to the model provider.
Models in Your Pipelines
You use models through pipeline nodes. Several pipeline components use models. This table gives an overview of model applications and the components that use them:
Model Type or Application | Node That Uses It | Description |
---|---|---|
Large language models | Generators | Use LLMs for various NLP tasks, like generative QA, through Generators. There's a generator dedicated to each supported model provider. You can use models from providers such as OpenAI, Cohere, Azure, and more, or models hosted on AWS SageMaker and Amazon Bedrock. |
Embedding models | Embedders | Embedders use embedding models to calculate vector representations for text and documents. There are two types of embedders: document embedders, used in indexing pipelines, and text embedders, used in query pipelines to embed the query. Both types of embedders should use the same embedding model so that queries and documents are represented in the same vector space. |
Question answering models | ExtractiveReader | The reader is used in extractive question answering. It uses transformer-based models to locate the answer in the document and highlight it. |
Ranking models | Rankers | Rankers prioritize documents based on the criteria you specify, for example, a particular value in a document's metadata field. Model-based rankers use models to embed the documents and the query and thus build a strong semantic representation of the text. |
To use a model, pass its name as a parameter to the node (usually the model parameter). If you're using a proprietary model, you can either pass the API key to the node or connect deepset Cloud to the model provider. deepset Cloud takes care of loading the models.
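As a minimal sketch of passing a model name and an API key to a node, the fragment below configures a generator for an OpenAI model. It assumes the Haystack 2.x OpenAIGenerator component and an environment variable named OPENAI_API_KEY; neither appears in the text above, so check the documentation of the component you actually use for its exact type path and parameters.

```yaml
components:
  generator:
    # Assumed component: the OpenAI generator from Haystack 2.x.
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      # The API key is read from an environment variable (assumed name) instead of being hard-coded.
      api_key: {"type": "env_var", "env_vars": ["OPENAI_API_KEY"], "strict": False}
      # The model is referenced by name only; loading is handled for you.
      model: "gpt-4"
```

If deepset Cloud is already connected to the model provider, the key does not need to be passed to the node.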
You can run models locally or remotely. Smaller models, like question answering or embedding models, are fast when run locally. Large models, like GPT-3.5 or GPT-4, are faster to run remotely as they need optimized hardware. To run a model remotely, you may need to pass additional parameters in the model kwargs. For detailed instructions, see the Pipeline Components documentation for the component that uses the model.
When using LLMs with Generators, you can specify all additional model settings, like temperature, in the kwargs parameter, for example:
components:
  generator:
    type: haystack_integrations.components.generators.amazon_bedrock.generator.AmazonBedrockGenerator
    init_parameters:
      aws_access_key_id: {"type": "env_var", "env_vars": ["AWS_ACCESS_KEY_ID"], "strict": False}
      aws_secret_access_key: {"type": "env_var", "env_vars": ["AWS_SECRET_ACCESS_KEY"], "strict": False}
      aws_region_name: {"type": "env_var", "env_vars": ["AWS_DEFAULT_REGION"], "strict": False}
      model: "anthropic.claude-v2"
      # Additional model settings go in kwargs.
      kwargs:
        temperature: 0.0
Recommended Models
Larger models are generally more accurate at the cost of speed.
If you don't know which model to start with, you can use one of the models we recommend.
Large Language Models for RAG
This table lists the models that we recommend for retrieval-augmented generation (RAG) QA. You can use them with Generators in your pipelines.
Model | Description | Type |
---|---|---|
Claude models by Anthropic | A transformer-based LLM that can be an alternative to the GPT models. It can generate natural language and assist with code and translations. | Proprietary |
GPT-3.5 models by OpenAI | Faster and cheaper than GPT-4, it can generate and understand natural language and code. | Proprietary |
GPT-4 models by OpenAI | Large multimodal models. More expensive and slower than GPT-3.5. | Proprietary |
For an overview of the models, see Large Language Models Overview.
Hosted LLMs
Currently, on the Connections page you can set up connections with the following model providers:
- Amazon Bedrock and SageMaker
- Azure OpenAI
- Cohere
- Google AI
- Hugging Face (only needed for private models hosted there)
- Nvidia
- OpenAI
- SearchAPI
- Voyage AI
Supported connections to data processing services and data sources are:
- Azure Document Intelligence
- DeepL
- Snowflake
- unstructured.io
Reader Models for Question Answering
This table describes the models that we recommend for the question answering task. You can use them with your Readers.
Model | Description | Language |
---|---|---|
deepset/roberta-base-squad2-distilled | A distilled model, relatively fast and with good performance. | English |
deepset/roberta-large-squad2 | A large model with good performance. Slower than the distilled one. | English |
deepset/xlm-roberta-base-squad2 | A base model with good speed and performance. | Multilingual |
deepset/tinyroberta-squad2 | A very fast model. | English |
You can also view state-of-the-art question answering models on the Hugging Face leaderboard.
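A recommended reader model can be plugged into a pipeline roughly as follows. This is a sketch that assumes the Haystack 2.x ExtractiveReader type path and its top_k parameter; the component name reader and the value of top_k are illustrative only.

```yaml
components:
  reader:
    # Assumed type path for the ExtractiveReader in Haystack 2.x.
    type: haystack.components.readers.extractive.ExtractiveReader
    init_parameters:
      # One of the recommended English models from the table above.
      model: "deepset/roberta-base-squad2-distilled"
      # Illustrative value: the number of answers to return per query.
      top_k: 5
```

To trade speed for accuracy, swap the model name for deepset/roberta-large-squad2, or use deepset/xlm-roberta-base-squad2 for multilingual data.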
Embedding Models
This table describes the embedding models we recommend for calculating vector representations of text. You can use them with dedicated Embedders.
Model Provider | Model Name | Description | Language |
---|---|---|---|
Cohere | embed-english-v2.0, embed-english-light-v2.0 | See Cohere documentation. | English |
Cohere | embed-multilingual-v2.0 | See Cohere documentation. | Multilingual |
OpenAI | text-embedding-ada-002 | See OpenAI documentation. | English |
Sentence Transformers | sentence-transformers/multi-qa-mpnet-base-dot-v1 | Vector dimension*: 768 | English |
Sentence Transformers | intfloat/e5-base-v2 | Vector dimension*: 768 | English |
Sentence Transformers | intfloat/e5-large-v2 | Vector dimension*: 1024. Slower than e5-base-v2 but performs better. | English |
Sentence Transformers | intfloat/multilingual-e5-base | Vector dimension*: 768 | Multilingual |
Sentence Transformers | intfloat/multilingual-e5-large | Vector dimension*: 1024 | Multilingual |
*In language models, vector dimension refers to the number of elements a vector representing a word contains, capturing its meaning and usage. The more dimensions, or numbers, in this vector, the more detail the model can understand about the word, though it also makes the model more complex to handle. It's best to try out different models and see what works best for your data.
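Because document embedders and text embedders should use the same model, the indexing pipeline and the query pipeline typically reference the same model name. The sketch below shows the relevant fragment of each pipeline as a separate YAML document; it assumes the Haystack 2.x Sentence Transformers embedder type paths, and the component names are illustrative.

```yaml
# Indexing pipeline fragment: embeds documents before they are written to the document store.
components:
  document_embedder:
    # Assumed type path for the document embedder in Haystack 2.x.
    type: haystack.components.embedders.sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"
---
# Query pipeline fragment: embeds the query with the same model.
components:
  text_embedder:
    # Assumed type path for the text embedder in Haystack 2.x.
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"
```

If you switch to a model with a different vector dimension, for example from a 768-dimension model to a 1024-dimension one, documents embedded with the old model generally need to be indexed again so that both sides match.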
Ranker Models
This table lists models to use with Rankers to rank documents.
Model | Description | Language |
---|---|---|
intfloat/simlm-msmarco-reranker | The best ranker model currently available. | English |
cross-encoder/ms-marco-MiniLM-L-12-v2 | A slightly bigger and slower model. | English |
cross-encoder/ms-marco-MiniLM-L-6-v2 | Slightly faster than ms-marco-MiniLM-L-12-v2. | English |
svalabs/cross-electra-ms-marco-german-uncased | In our experience, this is the best model for German. | German |
jeffwan/mmarco-mMiniLMv2-L12-H384-v1 | A quite fast, multilingual model. | Multilingual |
rerank-english-v2.0 | A Cohere model for English data. | English |
rerank-multilingual-v2.0 | A Cohere model for multilingual data. | Multilingual |
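A model-based ranker can be configured along the same lines. This sketch assumes the Haystack 2.x TransformersSimilarityRanker type path and its top_k parameter; the Cohere rerank models in the table are served through a different, provider-specific ranker component that is not shown here.

```yaml
components:
  ranker:
    # Assumed type path for a cross-encoder ranker in Haystack 2.x.
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      # One of the recommended English models from the table above.
      model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
      # Illustrative value: the number of top-ranked documents to keep.
      top_k: 10
```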