Blog

SLM models at the edge

Written by Nathan Wilkerson, VP of Engineering | Nov 17, 2025 10:22:02 PM

LLMs are great. They can solve problems and automate tasks. The hidden danger behind LLMs is cost. LLM usage is billed on a metered unit, often called tokens. SaaS products with AI built in may give you some tokens for free, but then you hit a limit. And when your token budget runs out, that doesn't mean the work is done. How do you combat this?
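To make the budget problem concrete, here is a minimal sketch of a metered-cost estimate. The per-token prices and request volumes are hypothetical placeholders, not any provider's actual rates; check your vendor's current rate card before relying on numbers like these.

```python
# Rough cost model for metered LLM usage.
# Both prices below are illustrative assumptions, not real rates.
PRICE_PER_1K_INPUT = 0.01   # USD per 1,000 input tokens (assumption)
PRICE_PER_1K_OUTPUT = 0.03  # USD per 1,000 output tokens (assumption)

def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Estimate monthly spend for a steady request volume."""
    cost_per_request = (
        input_tokens / 1000 * PRICE_PER_1K_INPUT
        + output_tokens / 1000 * PRICE_PER_1K_OUTPUT
    )
    return requests_per_day * cost_per_request * days

# A modest workload: 10,000 requests/day, ~500 input and ~200 output tokens each.
cost = monthly_cost(10_000, 500, 200)
print(f"${cost:,.2f}/month")  # prints $3,300.00/month at these assumed rates
```

Even at placeholder rates, a steady automated workload adds up quickly, which is exactly the pressure that pushes inference toward cheaper, local hardware.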

Let’s look at more traditional machine learning. When the first IoT security cameras came out, frames were shipped back to a SaaS provider for analysis. This allowed central control of models and accommodated the limited hardware available in cameras at the time. Over the next four to five years, there was a shift to move machine learning to the edge and away from expensive data centers, driven by increases in processing power, models optimized for their specific tasks, and SaaS providers trying to reduce cost.

It’s been three years since ChatGPT came out. Models have become larger and more powerful. At the same time, small language models (SLMs) are becoming more capable, and more importantly, they can be distilled to better handle specialized workloads. With open-source models, companies can host large models and spiky workloads where they belong: in the cloud. The elasticity can’t be beat.

With distillation you can use a larger model, a teacher, to refine an SLM, a student. This takes a fraction of the time and money that fine-tuning or training a model from scratch does. Once it’s done, you have a highly specialized SLM, small enough to run on a laptop. Currently I am running gpt-oss:20b on my MacBook. It’s fast, responsive, and keeps all of my information private.
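The core of the teacher/student idea can be sketched in a few lines. A common formulation (knowledge distillation in the style of Hinton et al.) trains the student to match the teacher's temperature-softened output distribution via a KL-divergence loss. The logits below are toy illustrative values, not output from any real model:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    A temperature > 1 exposes the teacher's relative preferences among
    wrong-but-plausible tokens, which is the signal the student learns from.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return float(np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student))))

# Toy next-token logits over a 4-token vocabulary (illustrative values only).
teacher = np.array([4.0, 2.0, 1.0, 0.5])
student = np.array([3.5, 2.5, 0.5, 1.0])

loss = distillation_loss(teacher, student)
# The loss is zero when the student matches the teacher exactly, and
# positive otherwise; training drives it toward zero.
```

In a real pipeline this loss would be computed per token over a training corpus and backpropagated through the student, usually blended with the standard cross-entropy loss on the ground-truth labels.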

But larger companies are starting the shift as well. The Pixel 10 was released in August, and features like Magic Cue, Voice Translate, Call Notes with actions, and Personal Journal were all moved to the device. This is made possible by the increases in processing power now available on phones.

This doesn’t mean that cloud LLMs are dead. In fact, it’s the opposite: as the need for more specialized models increases, the need for larger, more complex models, and the hardware to run them, will also grow. Eventually you could have a distilled model helping with everyday tasks at home and at work.

This push to the edge won’t happen overnight. However, this is a trend that will pick up speed in 2026.