Large Language Models Overview
Choosing the right LLM for your task is a challenge. This overview summarizes all currently avaialable LLMs and their capabilities.
There are numerous LLMs on the market that can process and generate human-like content. They range from paid versions to open source alternatives. While all these models were trained to perform similar core tasks, their proficiency varies, making some of them better at particular tasks than others. Identifying the optimal LLM for your needs involves thorough testing and comparative analysis based on your use case. This overview is meant to give you an idea of what LLMs are available and help you choose the ones to explore further.
Tasks
All LLMs currently available on the market can perform the following tasks:
- text generation (this includes both creative and to-the-point text)
- code generation
- translation
- acting as AI assistants and chatbots
Some LLMs are better at some of these tasks than others. The differences may lie in the level of nuance the LLM can understand, the speed, or the languages it can operate in.
Additionally, the GPT-4V models can analyze image input, but this option is only available to selected users. Note, however, that deepset Cloud only runs on text. Images aren’t currently supported.
Models Within a Family
LLMs come in families. For example, GPT is a family of several models. Within a family, you can usually find a couple of models that differ in size, context window (the number of tokens a model can process), or the type of instructions it’s designed to handle (instruct and chat completion models). Larger models are usually slower but perform better. Models with larger context windows are more expensive but can process larger amounts of text.
The table below focuses on the characteristics of the whole families of models.
Instruct and Chat Completion Models
There are two types of LLMs: instruction following (usually have the word “instruct” in their name) and chat completion models. A lot of models come in both versions. Here’s a comparison:
Instruction Following Models | Chat Completion Models | |
---|---|---|
Task They’re Designed For | Task-oriented, specific commands and instructions. Example: "summarize this text". | Open-ended conversations, content generation. Example: "tell me what you think about XYZ" |
Output | Precise, specific | Creative, exploratory, more flexible |
Despite this division, we’ve found that chat completion models are also very good at following instructions. A lot of our customers use the gpt-3.5.-turbo model from OpenAI in settings that don’t involve the chat feature at all. It also works well in retrieval-augmented generation (RAG) settings.
Models Overview
*Context window is given in tokens. A range of values means it differs depending on a particular model.
**Speed of the models is relative to the gpt-3.5-turbo model. It's approximate and shouldn't be treated as a definitive indicator. The more pluses, the faster the model.
***The more pluses, the more expensive the model.
Provider | Model family | Description | Context window* | Capabilities | Speed** | Cost*** | Can it be fine-tuned? |
---|---|---|---|---|---|---|---|
Anthropic | Claude 3.5 | The latest release, currently only the Sonnet model (middle-sized) is available. Data cutoff: April 2024. | 200,000 | Has the same capabilities as Sonnet 3 with improvement in grasping nuance, humor, and generating content with a natural tone. Faster than models in the 3.0 family. Has vision capacities. | - | +++ | No |
Anthropic | Claude 3 | A family of three models: Haiku (the smallest), Sonnet (middle), and Opus (the largest). Data cutoff: August 2023 | 200,000 | Is multilingual Has increased capabilities in analysis and forecasting, nuanced content creation and code generation. Faster than Claude 2 models. Can process images, photos, charts, and graphs. | - | +++ | No |
Anthropic | Claude 2 | Learn from its dialogues with users Data cutoff: Dec 2022 | 100,000 | Performs well in general, open-ended conversations Particularly well-suited to support creative or literary use cases Multilingual, but poorer performance on low-resource languages | ++ | +++ | Yes, but it’s not offered openly, you need to contact Anthropic to discuss the details. |
Cohere | Command R+ | An upgrade to Command R models, optimized for RAG and enterprise-grade workfloads. Currently only available through Microsoft Azure. | 128,000 | Optimized for RAG with citations. Multilinugal. Supports tool use. | +++ | +++ | Yes |
Cohere | Command R | Created with production-scale workloads in mind. | 128,000 | Built to be used in RAG scenarios. Multilingual. | ++ | ++ | Yes |
Cohere | Command | Cohere’s flagship instruction-following model Continuously improving with new releases | 4,096 | Specifically trained for business applications, like: summarization, copywriting, extraction, and question answering | +++ | +++ | Yes |
Gemini 1.5 | New generation of Gemini models in different sizes: Ultra, Pro, Flash, Nano. Available through Google services. Data cutoff: November 2023 | Depending on the model, up to 1,048,576 | Complex reasoning tasks, text and code generation. Multimodal but output only text. Multilingual. | Varies depending on the model, with Flash declared as the fastest | ++ | No | |
Gemini 1.0 | A family of models that include: Ultra (largest), Pro (text only), Nano (fastest, smallest). Available through Google services. Data cutoff: November 2023 | Depending on the model, up to 1,048,576 | Multimodal, understand text, code, images, audio and video. Output text. Multilingual. | Varies depending on the model size | ++ | Yes, but only the Pro model | |
PALM-2 | Currently only available through Google apps, such as Bard AI or Vertex AI Data cutoff: mid 2021 | 8,000 | Understands nuanced instructions Multilingual, supports more than 100 languages | +++ | ++ | Yes, but fine-tuned models are available only through Google apps, such as Sec-PALM. Currently, you can’t fine-tune PALM models on your own. | |
Meta | Llama 3 | The newest family of models in 8 and 70B sizes. Data cutoff: 8B: March 2023 70B: December 2023 | 8,000 | English model. Optimized for dialog use cases, helpfulness, and safety. Accept only text input. Can generate text and code. | ++++ | Custom commercial license | Yes |
Meta | LLaMA 2 | Includes foundational models and models fine-tuned for chat. Data cutoff: July 2023. | 4,096 | English model Tuned models intended for assistant-like chat score high on the safety and helpfulness of the output. | ++++ | open source (research and commercial use) | Yes |
Mistral AI | Mistral | A family of open-weights and commercial models available in different sizes, with Mistral Large being the flagship model. | 8,000-64,000 | Text understanding, transformation, and code generation. Multilingual (English, French, Spanish, German, Italian) | - | ++ some models under open-source license | Yes |
OpenAI | GPT-4o | A model combining text, vision, and audio modalities. Faster and cheaper than GPT-4. Data cutoff: Oct 2023 | 128,000 | Is multimodal (accepts text and images, outputs text) Is multilingual Processes and generates human-like content Handles nuanced instructions | ++ | ++ | In experimental access phase |
OpenAI | GPT-4 | The most advanced models by OpenAI Data cutoff: Sept 2021, but it can access the internet | 8,000-32,000 | Chat completion model Processes and generates human-like content Processes images Handles nuanced instructions Is multilingual | - | ++++ | In experimental access phase |
OpenAI | GPT-3.5 | The most capable models in the GPT-3.5 family Data cutoff: Sept 2021 Less performant but faster and cheaper than GPT-4 It's being deprecated in favour of GPT-4 models | 4,000-16,000 | Chat completion model Processes and generates human-like content Is multilingual Poorer performance than GPT-4 | +++ | ++ | Only the gpt-3.5-turbo model. For details, see the OpenAI blog. |
Technology Innovation Institute (TII) | Falcon | One of the best open source models available. The largest model in the family, Falcon 180B outperforms LlaMA 2 and is on par with PALM 2. | 2,048 | Multilingual Performs best in reasoning and coding tasks. | The speed varies depending on the model variant, the larger the model, the slower it is. | open source (commercial use allowed but restricted) | Yes |
Updated 28 days ago