Large Language Models Overview

Choosing the right LLM for your task is a challenge. This overview summarizes currently available LLMs and their capabilities.

There are numerous LLMs on the market that can process and generate human-like content. They range from paid versions to open source alternatives. While all these models were trained to perform similar core tasks, their proficiency varies, making some of them better at particular tasks than others. Identifying the optimal LLM for your needs involves thorough testing and comparative analysis based on your use case. This overview is meant to give you an idea of what LLMs are available and help you choose the ones to explore further.

Tasks

All LLMs currently available on the market can perform the following tasks:

  • text generation (this includes both creative and to-the-point text)
  • code generation
  • translation
  • acting as AI assistants and chatbots

Some LLMs are better at some of these tasks than others. The differences may lie in the level of nuance the LLM can understand, the speed, or the languages it can operate in.

Additionally, the GPT-4V models can analyze image input, but this option is only available to selected users. Note, however, that deepset Cloud only runs on text. Images aren’t currently supported.

Models Within a Family

LLMs come in families. For example, GPT is a family of several models. Within a family, you can usually find a couple of models that differ in size, context window (the number of tokens the model can process at once), or the type of interaction they're designed for (instruct and chat completion models). Larger models are usually slower but perform better. Models with larger context windows are more expensive but can process larger amounts of text.
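To make the context window idea concrete, here is a minimal sketch of estimating whether a prompt fits within a model's limit. It assumes the tiktoken library and the cl100k_base encoding used by many OpenAI models; other model families use different tokenizers, so treat the count as an approximation, and the limit and reserve values below are illustrative only.

```python
# Rough sketch: estimate whether a prompt fits a model's context window.
# Assumes tiktoken and the cl100k_base encoding (used by many OpenAI models);
# other model families tokenize differently, so the count is approximate.
import tiktoken

CONTEXT_WINDOW = 4_096       # example limit; check your model's actual value
MAX_ANSWER_TOKENS = 500      # tokens reserved for the model's answer

def fits_context(prompt: str) -> bool:
    encoding = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + MAX_ANSWER_TOKENS <= CONTEXT_WINDOW

print(fits_context("Summarize the following meeting notes: ..."))
```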

The table below focuses on the characteristics of the whole families of models.

Instruct and Chat Completion Models

There are two types of LLMs: instruction-following models (they usually have the word “instruct” in their name) and chat completion models. Many models come in both versions. Here’s a comparison:

| | Instruction-Following Models | Chat Completion Models |
| --- | --- | --- |
| Task they're designed for | Task-oriented, specific commands and instructions. Example: "Summarize this text." | Open-ended conversations, content generation. Example: "Tell me what you think about XYZ." |
| Output | Precise, specific | Creative, exploratory, more flexible |

Despite this division, we’ve found that chat completion models are also very good at following instructions. Many of our customers use the gpt-3.5-turbo model from OpenAI in settings that don’t involve the chat feature at all. It also works well in retrieval-augmented generation (RAG) settings.
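Here is a minimal sketch of how the two model types are typically called, using the OpenAI Python SDK as an example; the model names and prompts are illustrative, and other providers expose similar but not identical interfaces.

```python
# Sketch of calling an instruction-following model vs. a chat completion model
# with the OpenAI Python SDK (v1.x). Model names and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Instruction-following model: a single, task-oriented prompt.
instruct_response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Summarize this text: Large language models are ...",
    max_tokens=150,
)
print(instruct_response.choices[0].text)

# Chat completion model: a list of messages, suited to open-ended conversation,
# but it also follows instructions well (for example, in RAG pipelines).
chat_response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me what you think about remote work."},
    ],
)
print(chat_response.choices[0].message.content)
```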

Models Overview


*Context window is given in tokens. A range of values means it differs between models in the family.

**Speed of the models is relative to the gpt-3.5-turbo model. It's approximate and shouldn't be treated as a definitive indicator. The more pluses, the faster the model.

***The more pluses, the more expensive the model.

| Provider | Model family | Description | Context window* | Capabilities | Speed** | Cost*** | Can it be fine-tuned? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Anthropic | Claude 3.5 | The latest release; currently only the Sonnet (middle-sized) model is available. Data cutoff: April 2024. | 200,000 | Has the same capabilities as Claude 3 Sonnet, with improvements in grasping nuance and humor and in generating content with a natural tone. Faster than models in the 3.0 family. Has vision capabilities. | - | +++ | No |
| Anthropic | Claude 3 | A family of three models: Haiku (the smallest), Sonnet (middle), and Opus (the largest). Data cutoff: August 2023. | 200,000 | Multilingual. Increased capabilities in analysis and forecasting, nuanced content creation, and code generation. Faster than Claude 2 models. Can process images, photos, charts, and graphs. | - | +++ | No |
| Anthropic | Claude 2 | Learns from its dialogues with users. Data cutoff: Dec 2022. | 100,000 | Performs well in general, open-ended conversations. Particularly well-suited to creative or literary use cases. Multilingual, but with poorer performance on low-resource languages. | ++ | +++ | Yes, but it's not offered openly; you need to contact Anthropic to discuss the details. |
| Cohere | Command R+ | An upgrade to the Command R models, optimized for RAG and enterprise-grade workloads. Currently only available through Microsoft Azure. | 128,000 | Optimized for RAG with citations. Multilingual. Supports tool use. | +++ | +++ | Yes |
| Cohere | Command R | Created with production-scale workloads in mind. | 128,000 | Built to be used in RAG scenarios. Multilingual. | ++ | ++ | Yes |
| Cohere | Command | Cohere's flagship instruction-following model. Continuously improving with new releases. | 4,096 | Specifically trained for business applications such as summarization, copywriting, extraction, and question answering. | +++ | +++ | Yes |
| Google | Gemini 1.5 | A new generation of Gemini models in different sizes: Ultra, Pro, Flash, Nano. Available through Google services. Data cutoff: November 2023. | Up to 1,048,576 (depending on the model) | Complex reasoning tasks, text and code generation. Multimodal input, but outputs text only. Multilingual. | Varies depending on the model, with Flash declared the fastest | ++ | No |
| Google | Gemini 1.0 | A family of models that includes Ultra (the largest), Pro (text only), and Nano (the fastest and smallest). Available through Google services. Data cutoff: November 2023. | Up to 1,048,576 (depending on the model) | Multimodal: understands text, code, images, audio, and video. Outputs text. Multilingual. | Varies depending on the model size | ++ | Yes, but only the Pro model |
| Google | PALM-2 | Currently only available through Google apps, such as Bard AI or Vertex AI. Data cutoff: mid-2021. | 8,000 | Understands nuanced instructions. Multilingual; supports more than 100 languages. | ++ | +++ | Yes, but fine-tuned models are available only through Google apps, such as Sec-PALM. Currently, you can't fine-tune PALM models on your own. |
| Meta | Llama 3 | The newest family of models, available in 8B and 70B sizes. Data cutoff: 8B: March 2023; 70B: December 2023. | 8,000 | English model. Optimized for dialog use cases, helpfulness, and safety. Accepts text input only. Can generate text and code. | ++++ | Custom commercial license | Yes |
| Meta | Llama 2 | Includes foundational models and models fine-tuned for chat. Data cutoff: July 2023. | 4,096 | English model. The tuned models intended for assistant-like chat score high on the safety and helpfulness of their output. | ++++ | Open source (research and commercial use) | Yes |
| Mistral AI | Mistral | A family of open-weights and commercial models available in different sizes, with Mistral Large being the flagship model. | 8,000-64,000 | Text understanding, transformation, and code generation. Multilingual (English, French, Spanish, German, Italian). | - | ++ (some models under an open-source license) | Yes |
| OpenAI | GPT-4o | A model combining text, vision, and audio modalities. Faster and cheaper than GPT-4. Data cutoff: Oct 2023. | 128,000 | Multimodal (accepts text and images, outputs text). Multilingual. Processes and generates human-like content. Handles nuanced instructions. | + | +++ | In experimental access phase |
| OpenAI | GPT-4 | The most advanced models by OpenAI. Data cutoff: Sept 2021, but it can access the internet. | 8,000-32,000 | Chat completion model. Processes and generates human-like content. Processes images. Handles nuanced instructions. Multilingual. | - | ++++ | In experimental access phase |
| OpenAI | GPT-3.5 | The most capable models in the GPT-3.5 family. Less performant but faster and cheaper than GPT-4. Being deprecated in favour of GPT-4 models. Data cutoff: Sept 2021. | 4,000-16,000 | Chat completion model. Processes and generates human-like content. Multilingual. Poorer performance than GPT-4. | +++ | ++ | Only the gpt-3.5-turbo model. For details, see the OpenAI blog. |
| Technology Innovation Institute (TII) | Falcon | One of the best open source models available. The largest model in the family, Falcon 180B, outperforms Llama 2 and is on par with PALM 2. | 2,048 | Multilingual. Performs best in reasoning and coding tasks. | Varies depending on the model variant; the larger the model, the slower it is | Open source (commercial use allowed but restricted) | Yes |