Language models are neural networks trained on large amounts of text to learn how a language works. If you think about the way you use your native tongue, you'll notice that it's intuitive: you don't have to think about grammar rules when speaking; you just know and apply them. A language model is an attempt to enable machines to use language in a way similar to how humans do.
Language models are not magic. A model works by deciding how probable it is that a word is valid in a particular word sequence. By valid, we mean that it resembles the way humans talk, not that it complies with any grammar rules.
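For example, here's a minimal sketch of asking a language model how probable different next words are, assuming the transformers and torch libraries and the publicly available gpt2 model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Ask the model how probable each word in its vocabulary is
# as the continuation of this word sequence.
inputs = tokenizer("The cat sat on the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Turn the scores for the last position into probabilities
# and show the five most probable next words.
probs = torch.softmax(logits[0, -1], dim=-1)
values, indices = torch.topk(probs, 5)
for p, idx in zip(values, indices):
    print(f"{tokenizer.decode(int(idx))!r}: {p.item():.3f}")
```

Words like "floor" or "table" get high probabilities here because they resemble how humans actually continue this sentence, which is exactly what "valid" means in this context.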
The latest advance in NLP is the transformer architecture, which led to top-performing models such as BERT. It differs from previously used architectures in that, instead of processing passages of text word by word, transformer-based models can process whole sentences at once, which makes them much faster both to train and to run inference with. They can also learn the relations between different words in a sentence and how these relations affect the expected model output, a functionality called the attention mechanism.
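To give a rough idea of what the attention mechanism computes, here's a minimal NumPy sketch of scaled dot-product attention, the core operation inside transformer models. In real models, the query, key, and value matrices come from learned projections of token embeddings; here we reuse one random matrix just to show the mechanics:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position attends to every position and mixes their value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax: one weight per pair
    return weights @ V                                # weighted mix of value vectors

# Toy example: 4 token positions, 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8): one context-aware vector per token
```

Because every position is compared with every other position in one matrix operation, the whole sentence is processed at once rather than word by word.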
A fine-tuned model is a language model that was first pre-trained on a large amount of raw text and then trained further on a smaller, task-specific dataset to perform a specific NLP task, such as text classification or question answering. A lot of fine-tuned models are publicly available for free on model hubs such as Hugging Face. This means that you can just grab them from there and use them in your NLP app as a starting point. You no longer need to train your own model, which takes a lot of time, a lot of data, and a lot of compute resources.
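For example, with the Hugging Face transformers library you can grab a fine-tuned model from the hub and use it in a couple of lines. The model below is one publicly available text classification model, picked purely for illustration:

```python
from transformers import pipeline

# Download a model fine-tuned for sentiment classification from the hub
# and run it on a sentence, no training required.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Using pre-trained models saves a lot of time."))
# [{'label': 'POSITIVE', 'score': ...}]
```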
If you're new to models, it's helpful to understand the model naming convention, as the name alone gives you a hint about what the model can do. The first part of the name is the user who trained and uploaded the model. Then come the architecture that the model uses and the size of the model. The last part of the name is the dataset the model was trained on.
Here's a typical model name: deepset/roberta-base-squad2, where:
deepset is the user who trained and uploaded the model,
roberta is the model architecture,
base is the size, and
squad2 is the dataset on which the model was trained.
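Putting the parts together, this is how you'd load that model from the hub for question answering, as a minimal sketch using the transformers pipeline API:

```python
from transformers import pipeline

# The full model name combines all the parts explained above:
# user/architecture-size-dataset.
reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

answer = reader(
    question="What dataset was the model trained on?",
    context="deepset/roberta-base-squad2 is a RoBERTa model "
            "fine-tuned on the SQuAD 2.0 dataset.",
)
print(answer["answer"])
```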
You don't need in-depth knowledge of a model's architecture to use it, but it's helpful to have a rough overview of the most common model types and their key differences. This will help you better understand which models to try out in your own pipeline. Here's a timeline showing when the most common model architectures were developed.
Large models have better accuracy, but this comes at the cost of speed and the resources needed to run them. Large models can be significantly slower than smaller ones, and they use up a lot of your graphics card's memory or may even exceed its limit. Distilled models are a good compromise: they preserve accuracy similar to the larger models while being much smaller and easier to deploy. To learn more about this, see Model Distillation.
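As a rough illustration of the size difference, you can compare the parameter counts of a base model and its distilled counterpart. Both model names below are well-known publicly available checkpoints, used here just as an example:

```python
from transformers import AutoModel

# Compare parameter counts of a base model and its distilled counterpart.
for name in ["bert-base-uncased", "distilbert-base-uncased"]:
    model = AutoModel.from_pretrained(name)
    params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {params / 1e6:.0f}M parameters")
# bert-base-uncased is roughly 110M parameters,
# distilbert-base-uncased roughly 66M.
```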
There are standard datasets used when training a model for a particular task. For example, SQuAD stands for Stanford Question Answering Dataset and is the standard dataset used for training English-language models for question answering.
When searching for a model for your task, you can check what dataset is typically used for it and then search for a model that was trained on this dataset.
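For example, you can inspect SQuAD 2.0 directly with the Hugging Face datasets library; this sketch assumes squad_v2 as the hub identifier for the dataset:

```python
from datasets import load_dataset

# Load the SQuAD 2.0 validation split from the Hugging Face hub
# and look at one question-answering example.
squad = load_dataset("squad_v2", split="validation")
print(squad[0]["question"])
print(squad[0]["context"][:100])
```

Models whose names end in squad2, like deepset/roberta-base-squad2 above, were trained on exactly this kind of data.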