Large Language Models Overview

Choosing the right LLM for your task is a challenge. This overview summarizes all currently avaialable LLMs and their capabilities.

There are numerous LLMs on the market that can process and generate human-like content. They range from paid versions to open source alternatives. While all these models were trained to perform similar core tasks, their proficiency varies, making some of them better at particular tasks than others. Identifying the optimal LLM for your needs involves thorough testing and comparative analysis based on your use case. This overview is meant to give you an idea of what LLMs are available and help you choose the ones to explore further.

Tasks

All LLMs currently available on the market can perform the following tasks:

  • text generation (this includes both creative and to-the-point text)
  • code generation
  • translation
  • acting as AI assistants and chatbots

Some LLMs are better at some of these tasks than others. The differences may lie in the level of nuance the LLM can understand, the speed, or the languages it can operate in.

Additionally, the GPT-4V models can analyze image input, but this option is only available to selected users. Note, however, that deepset Cloud only runs on text. Images aren’t currently supported.

Models Within a Family

LLMs come in families. For example, GPT is a family of several models. Within a family, you can usually find a couple of models that differ in size, context window (the number of tokens a model can process), or the type of instructions it’s designed to handle (instruct and chat completion models). Larger models are usually slower but perform better. Models with larger context windows are more expensive but can process larger amounts of text.

The table below focuses on the characteristics of the whole families of models.

Instruct and Chat Completion Models

There are two types of LLMs: instruction following (usually have the word “instruct” in their name) and chat completion models. A lot of models come in both versions. Here’s a comparison:

Instruction Following ModelsChat Completion Models
Task They’re Designed ForTask-oriented, specific commands and instructions. Example: "summarize this text".Open-ended conversations, content generation. Example: "tell me what you think about XYZ"
OutputPrecise, specificCreative, exploratory, more flexible

Despite this division, we’ve found that chat completion models are also very good at following instructions. A lot of our customers use the gpt-3.5.-turbo model from OpenAI in settings that don’t involve the chat feature at all. It also works well in retrieval-augmented generation (RAG) settings.

Models Overview

📘

*Context window is given in tokens. A range of values means it differs depending on a particular model.

**Speed of the models is relative to the gpt-3.5-turbo model. It's approximate and shouldn't be treated as a definitive indicator. The more pluses, the faster the model.

***The more pluses, the more expensive the model.

ProviderModel familyDescriptionContext window* CapabilitiesSpeed** Cost***Can it be fine-tuned?
OpenAIGPT-4The most advanced models by OpenAI

Data cutoff: Sept 2021
8,000-32,000Chat completion model
Processes and generates human-like content
Processes images
Handles nuanced instructions
Is multilingual
+++++No
OpenAIGPT-3.5The most capable models in the GPT-3.5 family

Data cutoff: Sept 2021

Less performant but faster and cheaper than GPT-4
4,000-16,000Chat completion model
Processes and generates human-like content
Is multilingual
Poorer performance than GPT-4
+++++Only the gpt-3.5-turbo model.
For details, see the OpenAI blog.
GooglePALM-2Currently only available through Google apps, such as Bard AI or Vertex AI

Data cutoff: mid 2021
8,000Understands nuanced instructions
Multilingual, supports more than 100 languages
+++++Yes, but fine-tuned models are available only through Google apps, such as Sec-PALM.
Currently, you can’t fine-tune PALM models on your own.
AnthropicClaude 2Best Anthropic’s models, learns from its dialogues with users

Data cutoff: Dec 2022
100,000Performs well in general, open-ended conversations

Particularly well-suited to support creative or literary use cases

Multilingual, but poorer performance on low-resource languages
+++++Yes, but it’s not offered openly, you need to contact Anthropic to discuss the details.
CohereCommandCohere’s flagship instruction following model

Continuously improving with new releases
4,096Specifically trained for business applications, like: summarization, copywriting, extraction, and question answering+++++Yes
Technology Innovation Institute (TII)FalconOne of the best open source models available.
The largest model in the family, Falcon 180B outperforms LlaMA 2 and is on par with PALM 2.
2.048Multilingual

Performs best in reasoning and coding tasks.
The speed varies depending on the model variant, the larger the model, the slower it is.open source
(commercial use allowed but restricted)
Yes
MetaLLaMA 2Includes foundational models and models fine-tuned for chat.

Data cutoff: July 2023.
4,096English model

Tuned models intended for assistant-like chat score high on the safety and helpfulness of the output.
++++open source (research and commercial use)Yes