Small Language Models: Same performance but cheaper?

In this video, we explore evidence that small language models can achieve performance comparable to large language models under certain conditions.

Today, I want to talk about an emerging theme in the language space: small language models can achieve performance similar to large language models on specific tasks. This has real consequences for the industry, and it is good news for everybody.

Large vs. Small Language Models

A large language model typically has hundreds of billions of parameters, while a small language model has anywhere from a few million to a few billion parameters.

Challenges with Large Language Models

Large language models are powerful, but they are expensive to run: they require substantial GPU compute and memory, they have high latency, and they often need to be fine-tuned for specific tasks. This makes them prohibitive to deploy in many cases.
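As a rough illustration of the memory cost alone, here is a minimal sketch that estimates the memory needed just to hold a model's weights, assuming a common rule of thumb of 2 bytes per parameter in fp16 (activations, optimizer state, and the KV cache add more on top; the parameter counts are illustrative).

```python
def weight_memory_gb(num_parameters: float, bytes_per_param: int = 2) -> float:
    """Estimate the memory (in GB) needed to hold model weights alone.

    Assumes fp16/bf16 storage (2 bytes per parameter); actual serving
    needs extra memory for activations and the KV cache.
    """
    return num_parameters * bytes_per_param / 1e9

# A 175B-parameter model vs. a 1.3B-parameter model (illustrative sizes):
print(weight_memory_gb(175e9))  # ~350 GB -- multiple data-center GPUs
print(weight_memory_gb(1.3e9))  # ~2.6 GB -- fits on a laptop GPU
```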

Achievements with Smaller Models

Two papers demonstrate the potential of smaller models. The first, Microsoft Research's "Textbooks Are All You Need," trained and fine-tuned a 1.3-billion-parameter model on a high-quality dataset of textbook-style code and coding exercises, achieving performance comparable to much larger models. The second, "TinyStories," used GPT-3.5 and GPT-4 to generate short stories at a toddler's vocabulary level and then trained models with just a few million parameters to do text completion, also achieving impressive results.
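To get a feel for how accessible these small models are, here is a minimal sketch that runs text completion with one of the TinyStories checkpoints using the Hugging Face transformers library. The model ID roneneldan/TinyStories-33M is, to my knowledge, one of the checkpoints the authors released publicly; treat it as an assumption and swap in any small checkpoint you prefer.

```python
# pip install transformers torch
from transformers import pipeline

# Model ID assumed from the authors' public release on Hugging Face.
generator = pipeline("text-generation", model="roneneldan/TinyStories-33M")

prompt = "Once upon a time, a little dog found a red ball."
result = generator(prompt, max_new_tokens=50, do_sample=True)
print(result[0]["generated_text"])
```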

Implications and Future Prospects

The use of smaller models for specific tasks could lead to cheaper, faster, and more accessible language models, potentially running on devices like laptops, phones, or watches. This approach could also involve a "mixture of experts," where a variety of smaller models are orchestrated by a larger model to handle specific domains.
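One way to picture that orchestration: a router (which could itself be a model) classifies each query's domain and dispatches it to a small specialist. The sketch below fakes the routing with keyword matching purely for illustration; the model names and the route_query helper are hypothetical, not a real system.

```python
# Hypothetical dispatcher: every model name and the routing rule
# below are illustrative placeholders.
SPECIALISTS = {
    "code": "small-code-model",      # e.g., a phi-1-style coding model
    "stories": "small-story-model",  # e.g., a TinyStories-style model
}
DEFAULT = "general-small-model"

def route_query(query: str) -> str:
    """Pick a specialist by crude keyword matching; a real router
    would use a classifier or a larger orchestrating model."""
    text = query.lower()
    if any(kw in text for kw in ("def ", "function", "bug")):
        return SPECIALISTS["code"]
    if "story" in text:
        return SPECIALISTS["stories"]
    return DEFAULT

print(route_query("Write a function to reverse a list"))  # small-code-model
print(route_query("Tell me a story about a cat"))         # small-story-model
```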

Importance of High-Quality Data

Both papers emphasize that high-quality training data is what allows smaller models to produce good output, whether that data is curated manually or generated synthetically.
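For example, a TinyStories-style synthetic dataset can be bootstrapped with a few lines against a larger model's API. This is a minimal sketch assuming the OpenAI Python client; the prompt and the vocabulary constraint are invented for illustration, and any capable generator model would do.

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

# Illustrative prompt in the spirit of TinyStories: constrain the
# vocabulary so a tiny model can learn from the generated text.
prompt = (
    "Write a three-sentence story for a toddler, using only words "
    "a 3-year-old would understand."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed generator model; swap as needed
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```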

In conclusion, the potential of smaller language models for specific tasks presents an exciting prospect for the future of language models, offering cost-effective and accessible solutions for various applications.

References

The Airtrain AI YouTube channel

Subscribe now to learn about Large Language Models, stay up to date with AI news, and discover Airtrain AI's product features.
