Large Language Models Overview
Choosing the right LLM for your task is a challenge. This overview summarizes all currently avaialable LLMs and their capabilities.
There are numerous LLMs on the market that can process and generate human-like content. They range from paid versions to open source alternatives. While all these models were trained to perform similar core tasks, their proficiency varies, making some of them better at particular tasks than others. Identifying the optimal LLM for your needs involves thorough testing and comparative analysis based on your use case. This overview is meant to give you an idea of what LLMs are available and help you choose the ones to explore further.
Tasks
All LLMs currently available on the market can perform the following tasks:
- text generation (this includes both creative and to-the-point text)
- code generation
- translation
- acting as AI assistants and chatbots
Some LLMs are better at some of these tasks than others. The differences may lie in the level of nuance the LLM can understand, the speed, or the languages it can operate in.
Additionally, the GPT-4V models can analyze image input, but this option is only available to selected users. Note, however, that deepset Cloud only runs on text. Images aren’t currently supported.
Models Within a Family
LLMs come in families. For example, GPT is a family of several models. Within a family, you can usually find a couple of models that differ in size, context window (the number of tokens a model can process), or the type of instructions it’s designed to handle (instruct and chat completion models). Larger models are usually slower but perform better. Models with larger context windows are more expensive but can process larger amounts of text.
The table below focuses on the characteristics of the whole families of models.
Instruct and Chat Completion Models
There are two types of LLMs: instruction following (usually have the word “instruct” in their name) and chat completion models. A lot of models come in both versions. Here’s a comparison:
Instruction Following Models | Chat Completion Models | |
---|---|---|
Task They’re Designed For | Task-oriented, specific commands and instructions. Example: "summarize this text". | Open-ended conversations, content generation. Example: "tell me what you think about XYZ" |
Output | Precise, specific | Creative, exploratory, more flexible |
Despite this division, we’ve found that chat completion models are also very good at following instructions. A lot of our customers use the gpt-3.5.-turbo model from OpenAI in settings that don’t involve the chat feature at all. It also works well in retrieval-augmented generation (RAG) settings.
Models Overview
*Context window is given in tokens. A range of values means it differs depending on a particular model.
**Speed of the models is relative to the gpt-3.5-turbo model. It's approximate and shouldn't be treated as a definitive indicator. The more pluses, the faster the model.
***The more pluses, the more expensive the model.
Provider | Model family | Description | Context window* | Capabilities | Speed** | Cost*** | Can it be fine-tuned? |
---|---|---|---|---|---|---|---|
OpenAI | GPT-4 | The most advanced models by OpenAI Data cutoff: Sept 2021, but it can access the internet | 8,000-32,000 | Chat completion model Processes and generates human-like content Processes images Handles nuanced instructions Is multilingual | + | ++++ | No |
OpenAI | GPT-3.5 | The most capable models in the GPT-3.5 family Data cutoff: Sept 2021 Less performant but faster and cheaper than GPT-4 | 4,000-16,000 | Chat completion model Processes and generates human-like content Is multilingual Poorer performance than GPT-4 | +++ | ++ | Only the gpt-3.5-turbo model. For details, see the OpenAI blog. |
PALM-2 | Currently only available through Google apps, such as Bard AI or Vertex AI Data cutoff: mid 2021 | 8,000 | Understands nuanced instructions Multilingual, supports more than 100 languages | +++ | ++ | Yes, but fine-tuned models are available only through Google apps, such as Sec-PALM. Currently, you can’t fine-tune PALM models on your own. | |
Anthropic | Claude 2 | Best Anthropic’s models, learns from its dialogues with users Data cutoff: Dec 2022 | 100,000 | Performs well in general, open-ended conversations Particularly well-suited to support creative or literary use cases Multilingual, but poorer performance on low-resource languages | ++ | +++ | Yes, but it’s not offered openly, you need to contact Anthropic to discuss the details. |
Cohere | Command | Cohere’s flagship instruction following model Continuously improving with new releases | 4,096 | Specifically trained for business applications, like: summarization, copywriting, extraction, and question answering | +++ | ++ | Yes |
Technology Innovation Institute (TII) | Falcon | One of the best open source models available. The largest model in the family, Falcon 180B outperforms LlaMA 2 and is on par with PALM 2. | 2.048 | Multilingual Performs best in reasoning and coding tasks. | The speed varies depending on the model variant, the larger the model, the slower it is. | open source (commercial use allowed but restricted) | Yes |
Meta | LLaMA 2 | Includes foundational models and models fine-tuned for chat. Data cutoff: July 2023. | 4,096 | English model Tuned models intended for assistant-like chat score high on the safety and helpfulness of the output. | ++++ | open source (research and commercial use) | Yes |
Updated 4 months ago