CompreSSM: AI models compressed on the fly and trained up to 4 times faster

17.04.2026 By Chilli.Pepper

Imagine a giant machine that sheds its own weight, becoming leaner with every moment of learning. That is what the CompreSSM technique does, turning AI training from a grueling marathon into an elegant sprint. Researchers at the Massachusetts Institute of Technology (MIT) and partner institutions have developed a way to compress models on the fly, speeding up training by up to 4x with little or no loss in performance. This is more than an optimization: it points to a new era of efficient AI.

Training large AI models is like building a cathedral: a huge investment in materials, time, and energy. Traditionally, to get a compact version, you either build a monster first and then ruthlessly "trim" it down, or you start small and sacrifice quality. A team from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), the Max Planck Institute for Intelligent Systems, the European Laboratory for Learning and Intelligent Systems (ELLIS), ETH Zurich, and Liquid AI has turned that approach on its head. Their technique, CompreSSM, compresses models during training, making them smaller and faster as they learn.

The breakthrough concerns a family of architectures called state space models (SSMs), which already power speech processing, audio generation, and robotics. Borrowing tools from control theory, the scientists identify the "hardworking" components of a model and painlessly remove the "sluggish" ones. "This is a way to make models smaller and faster right during training," explains Makram Chahine, a graduate student in electrical engineering and computer science at CSAIL and lead author of the study [1].

How the magic of early compression works

The secret of CompreSSM lies in how early the roles of a model's components stabilize. Using a mathematical quantity known as Hankel singular values, the team measures how much each internal state contributes to the model's behavior. It turns out that this importance ranking settles after only about 10% of training. At that point the unnecessary dimensions are discarded, and the remaining 90% of training runs at the speed of the smaller model.
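As a rough illustration of the ranking step, the sketch below computes Hankel singular values for a small linear SSM with NumPy. It is not the authors' implementation: it assumes a real, diagonal, stable state matrix (so the Gramians have a closed form), and the function names and the 99% "energy" cutoff are hypothetical choices made for this example. The reduction itself (balanced truncation) is not shown.

```python
import numpy as np

def hankel_singular_values(a_diag, B, C):
    """Hankel singular values of x[k+1] = diag(a) x[k] + B u[k], y[k] = C x[k].

    Assumption: the state matrix is real, diagonal, and stable (|a_i| < 1),
    so the discrete Lyapunov equations can be solved entry by entry.
    """
    a = np.asarray(a_diag, dtype=float)
    if np.any(np.abs(a) >= 1.0):
        raise ValueError("state matrix must be stable (|a_i| < 1)")
    denom = 1.0 - np.outer(a, a)           # entries 1 - a_i * a_j
    P = (B @ B.T) / denom                  # controllability Gramian
    Q = (C.T @ C) / denom                  # observability Gramian
    eigvals = np.linalg.eigvals(P @ Q)     # non-negative in exact arithmetic
    hsv = np.sqrt(np.clip(eigvals.real, 0.0, None))
    return np.sort(hsv)[::-1]              # largest (most important) first

def states_to_keep(hsv, energy=0.99):
    """Smallest number of states whose Hankel singular values reach `energy`
    of the total (the threshold is an illustrative choice)."""
    cumulative = np.cumsum(hsv) / np.sum(hsv)
    rank = int(np.searchsorted(cumulative, energy)) + 1
    return min(rank, len(hsv))
```

Calling `hankel_singular_values` on a model snapshot and passing the result to `states_to_keep` gives a candidate reduced state size. A state that is barely driven by the input or barely visible in the output ends up with a tiny singular value and is the first to go.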

In image classification tests, the compressed models retained the accuracy of the full-size models while training up to 1.5 times faster. A model reduced to a quarter of its original size achieved 85.7% accuracy on CIFAR-10, compared with 81.8% for a model trained at that small size from scratch [1]. For the popular Mamba architecture, the speedup reached roughly 4x, with the state shrinking from 128 to about 12 dimensions while performance stayed competitive.

"We capture the complex dynamics during the warm-up phase and then keep only the most useful states," says Chahine. The theoretical basis is Weyl's theorem, which the team uses to show that the importance of states changes smoothly during training. Experiments confirmed that the rankings remain stable, which gives confidence that states discarded early will not turn out to matter later.
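For reference, one standard form of Weyl's inequality for singular values is given below. How exactly the paper applies it is not spelled out here, so reading E as "the change caused by one training update" is an assumption made for illustration.

```latex
% Weyl's inequality for singular values (standard result).
% H: a matrix; E: a perturbation of the same shape
% (hypothetically, the change induced by one training update).
\[
  \lvert \sigma_i(H + E) - \sigma_i(H) \rvert \;\le\; \lVert E \rVert_2
  \qquad \text{for every } i .
\]
```

In words: if a single update changes the matrix only slightly, no singular value can jump by more than the size of that change, which is why a ranking that has separated clearly is unlikely to reshuffle later.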

A practical "safety net" is to revert to a saved checkpoint if compression degrades the result. This lets practitioners decide directly how much performance they are willing to trade for speed, instead of tuning a less intuitive mathematical threshold.
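The rollback idea can be sketched in a few lines of Python. Everything here is a placeholder rather than the authors' code: `model` is any object, `compress` is a hypothetical function that returns a smaller model, `evaluate` is a hypothetical function returning a validation score (higher is better), and `max_drop` is the accuracy loss one is willing to accept.

```python
import copy

def compress_with_rollback(model, compress, evaluate, max_drop=0.01):
    """Try a compression step; restore the checkpoint if the score drops too much.

    Returns (model_to_continue_with, accepted_flag).
    All arguments are illustrative placeholders, not a real library API.
    """
    checkpoint = copy.deepcopy(model)      # snapshot taken before compressing
    baseline = evaluate(model)
    candidate = compress(model)            # may or may not modify `model` in place
    if baseline - evaluate(candidate) > max_drop:
        return checkpoint, False           # revert: the drop exceeded the budget
    return candidate, True                 # keep training the smaller model
```

The design choice is that the tolerance is expressed in the metric people actually care about (validation accuracy), not in terms of singular values.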

Why CompreSSM outperforms the competition

Existing methods, such as pruning and knowledge distillation, spend extra compute. Pruning trains the full-size giant to completion and only then cuts it down. Distillation duplicates the effort: a big "teacher" must first be trained so that it can teach a small "student." CompreSSM works during training itself, with no extra training cycles.

A comparison with Hankel nuclear norm regularization showed a clear advantage: CompreSSM was more than 40 times faster and also more accurate. The competing method slowed training by a factor of about 16 because it computes eigenvalues at every gradient step [1]. On CIFAR-10, heavily compressed CompreSSM models held their accuracy at compression levels where distilled models degraded.
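To see where the overhead of the regularization baseline comes from, it helps to write down the kind of objective such a method typically optimizes. The form below is an assumed, generic one (the symbols \mu, \mathcal{L}_{\text{task}}, P, and Q are illustrative names, not taken from the paper): the nuclear norm of the Hankel operator equals the sum of the Hankel singular values, which are obtained from the eigenvalues of a product of two Gramian matrices.

```latex
% Assumed generic form of a Hankel-nuclear-norm regularized loss.
% \mu: penalty weight; P, Q: controllability and observability Gramians.
\[
  \mathcal{L}(\theta)
    = \mathcal{L}_{\text{task}}(\theta)
    + \mu \sum_{i=1}^{n} \sigma_i(\theta),
  \qquad
  \sigma_i(\theta) = \sqrt{\operatorname{eig}_i\bigl(P(\theta)\,Q(\theta)\bigr)} .
\]
```

Because the penalty depends on the current parameters, the eigenvalue computation has to be repeated at every gradient step, whereas CompreSSM evaluates the singular values only at the moments when it decides what to cut.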

Extensions to linear time-varying systems, such as Mamba, have already been implemented. The next target is the matrix-valued dynamical systems used in linear attention mechanisms, which would bring the technique closer to the transformer architectures behind today's AI giants.

Broader context: SSM as an alternative to transformers

State space models are gaining momentum as a more efficient alternative to transformers. Architectures such as Mamba, S4, and Hyena scale linearly or near-linearly with sequence length, versus quadratically for standard transformers, which makes them well suited to long sequences. CompreSSM amplifies this advantage by making SSMs even more compact.
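The scaling difference can be made concrete with a toy example. The sketch below is a deliberately simplified, pure-Python diagonal SSM (not Mamba or S4 themselves, and the function names are made up for this illustration): it processes a sequence in a single pass, so the work grows with sequence length times state size, while full self-attention scores every pair of positions.

```python
def ssm_scan(a, b, c, inputs):
    """Diagonal linear SSM: h[k] = a * h[k-1] + b * u[k], y[k] = sum(c * h[k]).

    `a`, `b`, `c` are lists of length n (the state size); `inputs` is a
    scalar sequence of length L. Total work is O(L * n).
    """
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:                       # one pass over the sequence
        state = [ai * hi + bi * u for ai, hi, bi in zip(a, state, b)]
        outputs.append(sum(ci * hi for ci, hi in zip(c, state)))
    return outputs

def attention_pair_count(length):
    """Number of query-key scores full self-attention computes: L * L."""
    return length * length
```

In this picture, shrinking the state size n, which is what CompreSSM does during training, directly reduces the per-step cost of the scan.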

According to work presented at ICLR 2026, where the paper was accepted, SSMs already outperform transformers on some audio and DNA modeling tasks. Antonio Orvieto of the ELLIS Institute in Tübingen praises the work: "The method shows that the state size of SSMs can be reduced during training, guided by control theory." [1] This paves the way for applying the approach when pre-training large SSMs.

In robotics, SSMs control dynamics where every millisecond counts. On-the-fly compression could reduce the power consumption of drones and autonomous cars. In audio, it could mean faster music generation without relying on cloud server farms.

Potential for the Ukrainian AI ecosystem

In Ukraine, where resources are limited, CompreSSM is a breath of fresh air. Local startups such as Vector and Respeecher work with audio and speech. SSM compression would allow them to train models on domestic servers, reducing dependence on foreign data centers.

According to the Ministry of Education and Science of Ukraine, AI education is growing: KPI and LNU are developing SSMs for processing the Ukrainian language. CompreSSM could speed up prototyping and make these models competitive with global ones. Imagine Ukrainian-language models trained 4 times faster: this would accelerate the digitization of education and medicine.

Globally, amid the energy crisis, saving computation is key to sustainable AI. CompreSSM could shrink the carbon footprint of training: training GPT-3 is put at the equivalent of 1,200 tons of CO2, and in-training compression could, in principle, halve that [2].

Limitations and development horizons

The technique shines on multi-input, multi-output (MIMO) models, where state size is strongly correlated with performance. For single-input, single-output (SISO) architectures, the benefits are more modest. The theory is cleanest for linear time-invariant systems, although the extension to time-varying systems has been successful. Going beyond SSMs remains a challenge, but the path toward transformers is open.

The team plans to tackle the matrix-valued systems used in linear attention next. "This is the first step, where the theory is clean," says Chahine [1]. With funding from Boeing and the US Navy, the project is gaining momentum.

Some forecasts suggest that by 2030 SSMs trained with CompreSSM could replace transformers in 30% of tasks, according to Hugging Face estimates [3]. That would change the landscape, from chatbots to climate modeling.

Implications for the world of AI

CompreSSM makes AI more accessible: smaller models on smartphones and cheaper training for startups. Democratizing the technology could accelerate innovation in lower-income regions. For Ukraine, this is a chance to catch up with the leaders by building national AI solutions.

The approach raises a question: will AI evolve toward self-similar structures, like the nesting dolls in the illustration? CompreSSM is not the end but the beginning of an era in which models grow smarter by discarding the superfluous, as nature does.

Sources

  1. MIT News: New technique makes AI models leaner and faster while they're still learning
  2. Energy and Policy Considerations for Deep Learning in NLP (arXiv)
  3. Hugging Face Blog: State Space Models
