Looking to save huge $$$ when hosting your generative AI models?
Check out Liquid Foundation Models.
From the perspective of someone who designs the server infrastructure these models will need to run on cost-efficiently at scale, I love this.
Some model providers, like Liquid Foundation, are niching down their models: not chasing the cutting edge of AGI, but focusing on being far more compute-efficient while staying good enough intelligence-wise.
So what if the latest OpenAI or Gemini models can do advanced mathematics or advanced reasoning? If the use case is using an LLM to classify whether an incoming message is spam, I'd reach for something like LFM so I don't have to throw massive amounts of GPUs at it.
I would like to thank Marius le Roux for bringing this to my attention.