Large Language Models (LLMs) are notoriously computation-heavy and, consequently, power-hungry. Setting aside the wider data-center debate, these realities mean that running frontier AI models in private, bounded environments and on edge devices is out of reach for the current state of the art: on-device LLM usage remains a sub-optimal experience from an end-user perspective.
Compressed models have been around for a while, using techniques such as pruning, quantization, and distillation to preserve the essence of a model while working within the constraints of edge devices.
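To make one of these techniques concrete, here is a minimal sketch of post-training symmetric int8 quantization: each float weight is mapped to an 8-bit integer via a single per-tensor scale, shrinking storage roughly 4x relative to float32 at the cost of a small rounding error. This is a generic illustration, not Refiant's method.

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: one scale maps floats onto [-127, 127].
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-to-nearest bounds the per-weight error by about half the scale.
max_err = float(np.abs(w - w_hat).max())
```

Production schemes (per-channel scales, block formats such as MXFP4, quantization-aware training) refine this basic idea to reduce the fidelity loss further.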
Various companies, such as Multiverse Computing in the EU, are exploring this direction for both industrial and consumer use. Much of the research coming out of China, where GPU constraints are evident, focuses on getting more from less.
Refiant is building foundational AI architectures, including compression and context-management systems, as well as, more broadly, physics-based non-convex optimisation techniques and their applications to machine learning.
In particular, we have validated results on novel transformer methods that reduce the computational complexity of attention from quadratic to log-linear, as well as compression techniques that preserve model fidelity at around 95%-99% across a number of standard AI benchmarks, including MMLU, AIME, and MMMLU.
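The practical impact of moving from quadratic to log-linear attention grows rapidly with context length. The sketch below only compares asymptotic operation counts (the method itself is not described here), but it shows why the gap matters at long contexts:

```python
import math

def cost_quadratic(n):
    # Standard self-attention: every token attends to every other token.
    return n * n

def cost_loglinear(n):
    # A log-linear attention method scales as n * log2(n).
    return n * math.log2(n)

# Ratio of quadratic to log-linear cost at increasing context lengths.
ratios = {n: cost_quadratic(n) / cost_loglinear(n)
          for n in (1_024, 32_768, 1_048_576)}
```

At a 1M-token context the quadratic formulation does on the order of 50,000x more work than a log-linear one, which is why such methods are attractive for edge deployment.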
To showcase the technology, we used OpenAI's open-source model GPT-OSS-120B, which has a mixture-of-experts (MoE) architecture, as a test case. Its weights are typically stored in around 60GB at MXFP4, and inference requires at least 80GB of memory. With our method, this was reduced to ~12GB of stored weights and ~12GB of RAM at runtime.
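The storage figures above can be sanity-checked with simple arithmetic. Assuming ~120B parameters and MXFP4's block format (4-bit elements plus one shared 8-bit scale per 32-element block, i.e. roughly 4.25 effective bits per parameter; the overhead figure is an assumption of this sketch):

```python
def weight_storage_gb(n_params, bits_per_param):
    # Bytes = params * bits / 8; GB here means 10^9 bytes.
    return n_params * bits_per_param / 8 / 1e9

N = 120e9  # ~120B parameters (GPT-OSS-120B)

# MXFP4 baseline: ~4.25 effective bits per parameter.
baseline_gb = weight_storage_gb(N, 4.25)   # close to the ~60GB figure cited

# A ~12GB footprint implies an effective budget well under 1 bit per parameter.
effective_bits = 12e9 * 8 / N
```

The baseline works out to roughly 64GB, consistent with the ~60GB cited, while ~12GB corresponds to an effective budget of about 0.8 bits per parameter, which is why techniques beyond plain low-bit quantization are needed to reach it.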
This demonstration is part of a broader body of work on machine learning methods connecting physics, information theory, and language. The core premise is that modern machine learning is highly redundant, so substantial compression is both possible and optimal. The above results were validated independently by an expert in the field and formed part of the validation for Refiant's seed round of investment.
Email: team@refiant.ai