No retraining needed: Sakana’s new AI model changes how machines learn


Researchers at Sakana AI, an AI research laboratory that focuses on nature-inspired algorithms, have developed a self-adaptive language model that can learn new tasks without requiring fine-tuning. Called Transformer² (Transformer-squared), the model uses mathematical tricks to adapt its weights to user requests during inference.

This is the latest in a series of techniques aimed at improving the abilities of large language models (LLMs) at inference time, which makes them more useful for everyday applications across different domains.

Dynamic adjustment of weights

Often, configuring LLMs for new tasks requires an expensive fine-tuning process, in which the model is exposed to new examples and its parameters are adjusted. A more efficient method is low-rank adaptation (LoRA), where a small subset of the model's parameters relevant to the target task is identified and modified during fine-tuning.
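For a concrete picture of the low-rank idea, the PyTorch sketch below adds a small trainable update on top of a frozen linear layer; the layer sizes, rank and scaling factor are illustrative assumptions, not Sakana's code or any reference LoRA implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: the frozen base weight W is augmented
    with a trainable low-rank update B @ A, where the rank r is far
    smaller than the layer's dimensions."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # base weights stay frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(d_out, r))         # trainable
        self.scale = alpha / r

    def forward(self, x):
        # Original projection plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap an existing projection layer and train only A and B.
layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))
```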

After training and fine-tuning, the model's parameters remain frozen, and the only way to repurpose it for new tasks is through techniques such as few-shot and many-shot learning.

In contrast to classic fine-tuning, Transformer-squared uses a two-step approach to dynamically adjust its parameters during inference. First, it analyzes the incoming request to understand the task and its requirements, then it applies task-specific adjustments to the model weights to optimize its performance for the specific request.

“By selectively adjusting critical components of model weights, our framework allows LLMs to dynamically adapt to new tasks in real time,” the researchers wrote in a blog post published on the company’s website.

How Sakana’s Transformer-squared works

The core ability of Transformer-squared is to dynamically adjust the critical components of its weights at inference time.

To do this, it is first necessary to identify the key components that can be tweaked during inference. Transformer-squared does this through singular-value decomposition (SVD), a linear algebra technique that decomposes a matrix into three other matrices, revealing its internal structure and geometry. SVD is often used to compress data or to simplify machine learning models.
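As a minimal illustration of the decomposition itself (a stand-in example, not Sakana's code), the NumPy snippet below factors a placeholder weight matrix and verifies that the factors reconstruct it:

```python
import numpy as np

# Stand-in for one weight matrix of a transformer layer (e.g. an MLP projection).
W = np.random.randn(256, 512)

# SVD factors W into U, a vector of singular values s, and V^T.
# Each singular value and its vectors can be read as one "component" of the layer.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Recombining the factors recovers the original matrix (up to floating-point error).
W_reconstructed = (U * s) @ Vt
assert np.allclose(W, W_reconstructed)
```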

When applied to an LLM's weight matrices, SVD yields a set of components that roughly represent the model's different abilities, such as mathematics, language comprehension or coding. In their experiments, the researchers found that these components can be tweaked to modify the model's abilities in specific tasks.

To apply these findings systematically, they developed a process called singular value fine-tuning (SVF). During training, SVF learns a set of vectors from the model's SVD components. These vectors, called z-vectors, are compact representations of individual skills and can be used as knobs to dial the model's ability in specific tasks up or down.
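A hedged sketch of that "knob" idea: the function below rescales a matrix's singular values with a z-vector, amplifying or dampening individual components. The matrix and the vector values here are placeholders; in SVF the real z-vectors are learned during training.

```python
import numpy as np

def apply_z_vector(W: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Rescale the singular values of W with a per-component z-vector,
    strengthening or weakening the 'skills' each component encodes."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * (s * z)) @ Vt

W = np.random.randn(256, 512)
z_math = np.ones(256)          # placeholder for a learned skill vector
z_math[:32] *= 1.5             # e.g. boost the components tied to one skill
W_adapted = apply_z_vector(W, z_math)
```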

At inference time, Transformer-squared uses a two-pass mechanism to adapt the LLM to unseen tasks. First, it examines the prompt to determine the skills needed to solve the problem (the researchers propose three different techniques for determining the necessary skills). In the second pass, Transformer-squared applies the z-vectors corresponding to the request and runs the prompt through the model with the updated weights. This allows the model to provide a response tailored to each prompt.
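Schematically, the two-pass loop might look like the toy Python below; the task names, the keyword-based first pass and the hand-set z-vectors are hypothetical stand-ins for the learned components described in the paper.

```python
import numpy as np

# Toy stand-ins: one weight matrix and per-task z-vectors (as if learned via SVF).
W = np.random.randn(256, 512)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
z_vectors = {
    "math":   np.concatenate([np.full(64, 1.4), np.ones(192)]),
    "coding": np.concatenate([np.ones(192), np.full(64, 1.4)]),
}

def classify_task(prompt: str) -> str:
    # Pass 1: decide which skill the prompt needs. The paper proposes three
    # strategies; a trivial keyword check stands in for them here.
    return "coding" if "code" in prompt.lower() else "math"

def adapt_weights(task: str) -> np.ndarray:
    # Pass 2: rescale the singular values with the matching z-vector
    # before running the prompt through the updated weights.
    z = z_vectors[task]
    return (U * (s * z)) @ Vt

W_adapted = adapt_weights(classify_task("Write code to sort a list"))
```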

Transformer-squared training and inference (source: arXiv)

Transformer-squared in action

The researchers applied Transformer-squared to Llama-3 and Mistral LLMs and compared it to LoRA on a variety of tasks, including math, coding, reasoning and visual question answering. Transformer-squared outperformed LoRA on all benchmarks while using fewer parameters. They also note that, unlike Transformer-squared, LoRA models cannot adjust their weights during inference, which makes them less flexible.

Another interesting finding is that knowledge extracted from one model can be transferred to another. For example, z-vectors obtained from Llama models can be applied to Mistral models. The results do not match those of z-vectors generated from scratch for the target model, and the transfer is possible because the two models have similar architectures. But it suggests the possibility of learning general z-vectors that can be applied across a wide range of models.

Transformer-squared (SVF in table) vs base models and LoRA (source: arXiv)

“The way forward lies in building models that dynamically adapt and cooperate with other systems, combining specialized capabilities to solve complex, multi-domain problems,” the researchers wrote. “Self-adaptive systems like Transformer² bridge the gap between static AI and living intelligence, paving the way for efficient, personalized and fully integrated AI tools that drive advancement across industries and our daily lives.”

Sakana AI has released the code for training Transformer-squared components on GitHub.

Inference-time tricks

As businesses explore different LLM applications, the past year has seen a noticeable shift toward developing inference-time techniques. Transformer-squared is one of several methods that enable developers to adapt LLMs to new tasks at inference time without having to retrain or fine-tune them.

Titans, an architecture created by Google researchers, tackles the problem from a different angle, giving language models the ability to learn and memorize new information at inference time. Other techniques focus on enabling frontier LLMs to leverage their ever-growing context windows to learn new tasks without retraining.

With businesses owning the data and knowledge specific to their applications, advances in inference-time customization methods can make LLMs even more useful.


