
Less Is More Than Enough: The Small Language Model Renaissance

"I don't know," might be the most important phrase an AI can learn to say.

THE FIRST TIME I SAW a small language model in action, I thought someone was playing a trick.

There it was, running on a standard laptop. No cloud connection. No carnival of servers humming in the background.

Instead, this little guy just responded to queries with the kind of intelligence we've been told (over and over again) requires billions of parameters and massive computational resources.

"That's impossible," I thought.

But there it was, proving me wrong with every response.

Parameter Bloat

See, we've become conditioned to believe the narrative of AI excess. That more is always better, that intelligence scales with size, that the path forward is paved with ever-larger parameter counts and training datasets that devour the collective written works of humanity, spitting out the shells of Shakespeare and Reddit like sunflower seeds.

Because the Large Language Models we are most familiar with carry so much: billions of parameters, petabytes of training data—the computational equivalent of small nations' power grids.

LLMs carry our questions, our demands, our impatience for answers. (Not to mention the weight of tech companies' obscene market valuations.)

But Small Language Models (SLMs) are more focused on quality than quantity. SLMs are trained for efficiency. Precision. Focus.

And as these models innovate, you have to wonder: Have we been thinking about artificial intelligence all wrong?

Chasing Intelligence

A credit union exec I spoke with recently said her company needed an AI solution for customer service. And so the consultants arrived with presentations about processing power and GPU clusters, waxing not-so-poetically about models so large they seemed to have their own gravitational pull.

Big! Bigger! Biggest!

"But we just needed something that could answer questions about things like mortgage applications," she told me later. "I don't need to build a rocket ship when I just need to cross the street, right?"

Enter a specialized SLM. Trained specifically on their mortgage documentation, banking regulations, and customer interaction history, the compact model outperformed its larger cousins on the specific task at hand.

OK, so it didn't know about 19th century poetry or how to code in Python. But those limitations are precisely the point.

These limitations allowed the model to be deployed across their branches, running locally on standard hardware.

No data leaving their systems. No ongoing API costs. No latency issues.

Just answers…immediately available.

The Elsewhere of Big Models

Tech giants continue their arms race of parameter counts and training corpus sizes.

But something different is happening in the trenches of practical AI implementation.

Look at Apple, which (quietly) released OpenELM, designed to run directly on iPhones. Healthcare providers are implementing domain-specific models that understand medical terminology without sending sensitive patient data to third-party servers.

E-com companies are generating product descriptions with models so streamlined they can run on the same servers that host their websites.

These models carry fewer capabilities, but what they carry matters more.

For organizations looking to implement SLMs, several production-ready options exist beyond the research lab.

The open-source community in particular seems to have embraced these smaller models. Hugging Face's repository hosts hundreds of fine-tuned SLMs for specific domains, from legal document analysis to medical terminology understanding. And many can be deployed with minimal technical overhead.
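To make that concrete, here's a minimal sketch of what local deployment can look like with the `transformers` library. The model ID below is a stand-in assumption, not a recommendation; swap in whichever domain-specific SLM from the Hub fits your use case.

```python
# Sketch: local SLM inference via Hugging Face's `transformers` library.
# "microsoft/phi-2" is a stand-in model ID; any compact text-generation
# model from the Hub works the same way.
from transformers import pipeline

def load_local_slm(model_id: str = "microsoft/phi-2"):
    """Download the model once, then run inference entirely on local hardware."""
    return pipeline("text-generation", model=model_id)

if __name__ == "__main__":
    slm = load_local_slm()
    out = slm("A fixed-rate mortgage is", max_new_tokens=40)
    print(out[0]["generated_text"])
```

Once the weights are cached, no request ever leaves the machine, which is the whole point.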

Or look to Microsoft, whose researchers demonstrated this potential with their Phi-2 model. At 2.7 billion parameters, it is a fraction of the size of the industry giants.

They fed it "textbook-quality" data instead of scraping the entire internet.

The result?

On certain reasoning and language tasks, this modest model outperformed systems many times its size.

As the saying goes: You are what you eat.

Living with Ambiguity

Of course, there are trade-offs. The smaller models don't have the breadth of knowledge or the generalized reasoning abilities of their larger counterparts. They can't engage in open-ended philosophical discussions or, say, create art from text prompts. Sometimes they reach the limits of their knowledge and simply stop.

But there’s something refreshingly…powerful about that. Unlike their larger cousins that confabulate with confidence when uncertain, these smaller models often simply acknowledge their limitations.

"I don't know," might be the most important phrase an AI can learn to say.
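What does that look like in practice? Here's a toy sketch of the idea: if the model's top answer isn't confident enough, return an honest "I don't know" instead of bluffing. The labels and the 0.7 threshold are illustrative assumptions, not recommendations.

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def answer_or_abstain(logits, labels, threshold=0.7):
    """Return the top label only if the model is confident enough;
    otherwise abstain. The 0.7 threshold is an illustrative choice."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "I don't know."
    return labels[best]

# A sharply peaked score distribution passes the threshold...
print(answer_or_abstain([4.0, 0.5, 0.2], ["fixed-rate", "ARM", "balloon"]))   # fixed-rate
# ...while a near-uniform one triggers an honest abstention.
print(answer_or_abstain([1.0, 0.9, 1.1], ["fixed-rate", "ARM", "balloon"]))   # I don't know.
```

In a real deployment you'd calibrate the threshold against held-out data, but the principle is exactly this simple.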

The gap between large and small models shows up in quantifiable ways.

In benchmarks measuring general knowledge and reasoning (like MMLU), top LLMs achieve scores of 85-90%, while leading SLMs typically range from 60-75%.

However, the performance calculus shifts dramatically when measuring practical metrics.

For example, SLMs typically deliver responses in 15-50 milliseconds compared to 500-2000 milliseconds for cloud-based LLMs.

The cost difference is even more stark. Running an SLM locally might cost a few dollars in electricity monthly, while LLM API calls for a medium-sized application can easily exceed $10,000 monthly at standard rates.
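Those numbers are easy to sanity-check with back-of-envelope arithmetic. All the figures below (request volume, token counts, token pricing, electricity rate) are illustrative assumptions, not quotes from any provider:

```python
def monthly_api_cost(requests_per_day, tokens_per_request, price_per_1k_tokens):
    """Back-of-envelope monthly spend for a metered LLM API."""
    return requests_per_day * 30 * tokens_per_request / 1000 * price_per_1k_tokens

def monthly_local_cost(watts, hours_per_day, price_per_kwh):
    """Electricity cost of running an SLM on local hardware."""
    return watts / 1000 * hours_per_day * 30 * price_per_kwh

# Illustrative assumptions: 10,000 requests/day at 1,500 tokens each,
# $0.01 per 1k tokens; a 300W server running 24/7 at $0.15/kWh.
api = monthly_api_cost(10_000, 1_500, 0.01)   # $4,500/month
local = monthly_local_cost(300, 24, 0.15)     # ~$32/month
print(f"API: ${api:,.0f}/mo  vs  local: ${local:,.2f}/mo")
```

Even with conservative assumptions, the per-token meter adds up fast; the local box mostly just pays for electricity.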

Fine-tuning narrows performance gaps considerably. The process involves taking a pre-trained small model and further training it on domain-specific data.

For businesses implementing these technologies, this requires a certain comfort with ambiguity. Understanding exactly what your AI needs to know and accepting what it doesn't.

It means choosing the right tool rather than the most impressive one.

Ah, the novelty!

Local Intelligence

It’s interesting to note what these smaller models ditch. Not just computational requirements, but certain assumptions about artificial intelligence itself.

For instance, the idea that AI must be centralized in the cloud. Or that our data must always travel to distant servers to be processed.

In other words, SLMs abandon the notion that a single model must serve all purposes, understand all domains, speak all languages.

These small models, in their silent efficiency, suggest a different future…a possible one, at least. A future in which AI is distributed, specialized, and personal.

It may not seem like that big of a deal, but in reality it’s a paradigm shift within the larger paradigm that already shifted!

Because what if this “intelligence” could live on our devices rather than being rented by the token from tech giants?

And what if privacy didn’t need to be sacrificed on the altar of capability?

Moving Forward

The next time someone tells you about the latest breakthrough in large language models, how many parameters it has or how many GPUs it requires, ask them a different question: "What could be accomplished if we went smaller instead?"

OK, maybe don’t ask that. You’ll no doubt get an eyeroll or two. Or a hundred.

But the research community continues pushing the boundaries of what small models can do, and so-called knowledge distillation has emerged as a particularly promising approach: training a compact model (the "student") to mimic the behavior of a larger model (the "teacher"). The process works by having the larger model generate predictions on a dataset, then training the smaller model to match those predictions rather than the original labels.

In practice, this is similar to having an expert mentor guide an apprentice. Studies demonstrate that a 1.3 billion parameter model trained via distillation can achieve up to 90% of the performance of its 175 billion parameter teacher on common tasks. Pretty wild, right?
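For the curious, here's a toy sketch of the distillation mechanics, stripped down to a single example: soften the teacher's output with a temperature, then nudge the student's logits toward those soft targets by gradient descent. All numbers (logits, temperature, learning rate) are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; T > 1 softens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Teacher's raw scores for one example; T=2.0 is an illustrative choice.
teacher_logits = [3.0, 1.0, 0.2]
T = 2.0
soft_targets = softmax(teacher_logits, temperature=T)

# The student starts from uninformative logits and follows the gradient
# of the distillation loss (cross-entropy against the soft targets).
student_logits = [0.0, 0.0, 0.0]
lr = 2.0
for _ in range(500):
    probs = softmax(student_logits, temperature=T)
    # d(loss)/d(logit_i) = (p_i - t_i) / T for softmax + cross-entropy
    grads = [(p - t) / T for p, t in zip(probs, soft_targets)]
    student_logits = [z - lr * g for z, g in zip(student_logits, grads)]

# After training, the student reproduces the teacher's soft distribution.
print(softmax(student_logits, temperature=T))
print(soft_targets)
```

A real pipeline does this over millions of examples with two neural networks, but the loss being minimized is the same one this loop computes.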

Companies like Cohere have operationalized this approach, offering distilled models that deliver impressive performance with significantly reduced computational demands.

For organizations looking to implement AI solutions today, this offers a pragmatic path forward. Instead of chasing the ever-receding horizon of ever-larger, insatiable models, consider whether a focused SLM might deliver exactly what you need more efficiently, more privately, and more cost-effectively.

Innovating ethically and effectively may require us to build smarter, more focused, and more distributed systems.

“Bring what you need and leave the rest behind.”

Because sometimes what you leave behind makes all the difference.
