No retraining needed: Sakana’s new AI model changes how machines learn


Researchers at Sakana AI, an AI research laboratory that focuses on nature-inspired algorithms, have developed a self-adaptive language model that can learn new tasks without requiring fine-tuning. Called Transformer² (Transformer-squared), the model uses mathematical tricks to adapt its weights to user requests during inference.

This is the latest in a series of techniques aimed at improving the abilities of large language models (LLMs) at inference time, which makes them more useful for everyday applications across different domains.

Dynamic adjustment of weights

Often, configuring LLMs for new tasks requires an expensive fine-tuning process, in which the model is exposed to new examples and its parameters are adjusted. A more efficient method is low-rank adaptation (LoRA), where a small subset of the model's parameters relevant to the target task is identified and modified during fine-tuning.
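For a concrete picture of the low-rank idea, the PyTorch sketch below adds a small trainable update on top of a frozen linear layer; the layer sizes, rank and scaling factor are illustrative assumptions, not Sakana's code or any reference LoRA implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: the frozen base weight W is augmented
    with a trainable low-rank update B @ A, where the rank r is far
    smaller than the layer's dimensions."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # base weights stay frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(d_out, r))         # trainable
        self.scale = alpha / r

    def forward(self, x):
        # Original projection plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap an existing projection layer and train only A and B.
layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))
```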

After training and fine-tuning, the model's parameters remain frozen, and the only way to repurpose it for new tasks is through techniques such as few-shot and many-shot learning.

In contrast to classic fine-tuning, Transformer-squared uses a two-step approach to dynamically adjust its parameters during inference. First, it analyzes the incoming request to understand the task and its requirements, then it applies task-specific adjustments to the model weights to optimize its performance for the specific request.

“By selectively adjusting critical components of model weights, our framework allows LLMs to dynamically adapt to new tasks in real time,” the researchers wrote in a blog post published on the company’s website.

How Sakana’s Transformer-squared works

The core ability of Transformer-squared is to dynamically adjust the critical components of its weights at inference time.

To do this, it is first necessary to identify the key components that can be tweaked during inference. Transformer-squared does this through singular-value decomposition (SVD), a linear algebra technique that decomposes a matrix into three other matrices, revealing its internal structure and geometry. SVD is often used to compress data or to simplify machine learning models.
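As a minimal illustration of the decomposition itself (a stand-in example, not Sakana's code), the NumPy snippet below factors a placeholder weight matrix and verifies that the factors reconstruct it:

```python
import numpy as np

# Stand-in for one weight matrix of a transformer layer (e.g. an MLP projection).
W = np.random.randn(256, 512)

# SVD factors W into U, a vector of singular values s, and V^T.
# Each singular value and its vectors can be read as one "component" of the layer.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Recombining the factors recovers the original matrix (up to floating-point error).
W_reconstructed = (U * s) @ Vt
assert np.allclose(W, W_reconstructed)
```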

When applied to an LLM's weight matrices, SVD yields a set of components that roughly represent the model's different abilities, such as mathematics, language comprehension or coding. In their experiments, the researchers found that these components can be tweaked to modify the model's abilities in specific tasks.

To apply these findings systematically, they developed a process called singular value fine-tuning (SVF). During training, SVF learns a set of vectors from the model's SVD components. These vectors, called z-vectors, are compact representations of individual skills and can be used as knobs to dial the model's ability in specific tasks up or down.
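A hedged sketch of that "knob" idea: the function below rescales a matrix's singular values with a z-vector, amplifying or dampening individual components. The matrix and the vector values here are placeholders; in SVF the real z-vectors are learned during training.

```python
import numpy as np

def apply_z_vector(W: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Rescale the singular values of W with a per-component z-vector,
    strengthening or weakening the 'skills' each component encodes."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * (s * z)) @ Vt

W = np.random.randn(256, 512)
z_math = np.ones(256)          # placeholder for a learned skill vector
z_math[:32] *= 1.5             # e.g. boost the components tied to one skill
W_adapted = apply_z_vector(W, z_math)
```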

At inference time, Transformer-squared uses a two-pass mechanism to adapt the LLM to unseen tasks. First, it examines the prompt to determine the skills needed to solve the problem (the researchers propose three different techniques for determining the necessary skills). In the second pass, Transformer-squared applies the z-vectors corresponding to the request and runs the prompt through the model with the updated weights. This allows the model to provide a response tailored to each prompt.
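Schematically, the two-pass loop might look like the toy Python below; the task names, the keyword-based first pass and the hand-set z-vectors are hypothetical stand-ins for the learned components described in the paper.

```python
import numpy as np

# Toy stand-ins: one weight matrix and per-task z-vectors (as if learned via SVF).
W = np.random.randn(256, 512)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
z_vectors = {
    "math":   np.concatenate([np.full(64, 1.4), np.ones(192)]),
    "coding": np.concatenate([np.ones(192), np.full(64, 1.4)]),
}

def classify_task(prompt: str) -> str:
    # Pass 1: decide which skill the prompt needs. The paper proposes three
    # strategies; a trivial keyword check stands in for them here.
    return "coding" if "code" in prompt.lower() else "math"

def adapt_weights(task: str) -> np.ndarray:
    # Pass 2: rescale the singular values with the matching z-vector
    # before running the prompt through the updated weights.
    z = z_vectors[task]
    return (U * (s * z)) @ Vt

W_adapted = adapt_weights(classify_task("Write code to sort a list"))
```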

Transformer-squared training and inference (source: arXiv)

Transformer-squared in action

The researchers applied Transformer-squared to Llama-3 and Mistral LLMs and compared it to LoRA on a variety of tasks, including math, coding, reasoning and visual question answering. Transformer-squared outperformed LoRA on all benchmarks while using fewer parameters. They also note that, unlike Transformer-squared, LoRA models cannot adjust their weights during inference, which makes them less flexible.

Another interesting finding is that knowledge extracted from one model can be transferred to another. For example, z-vectors obtained from Llama models can be applied to Mistral models. The results do not match those of z-vectors generated from scratch for the target model, and the transfer is possible because the two models have similar architectures. But it suggests the possibility of learning general z-vectors that can be applied across a wide range of models.

Transformer-squared (SVF in table) vs base models and LoRA (source: arXiv)

“The way forward lies in building models that dynamically adapt and cooperate with other systems, combining specialized capabilities to solve complex, multi-domain problems,” the researchers wrote. “Self-adaptive systems like Transformer² bridge the gap between static AI and living intelligence, paving the way for efficient, personalized and fully integrated AI tools that drive advancement across industries and our daily lives.”

Sakana AI has released the code for training Transformer-squared components on GitHub.

Inference-time tricks

As businesses explore different LLM applications, the past year has seen a noticeable shift toward developing inference-time techniques. Transformer-squared is one of several methods that enable developers to adapt LLMs to new tasks at inference time without having to retrain or fine-tune them.

Titans, an architecture created by Google researchers, tackles the problem from a different angle, giving language models the ability to learn and memorize new information at inference time. Other techniques focus on enabling frontier LLMs to leverage their ever-growing context windows to learn new tasks without retraining.

With businesses owning the data and knowledge specific to their applications, advances in inference-time customization methods can make LLMs even more useful.


