Openai: The model expansion ‘time to think’ helps resist the cyber’s emerging weaknesses

Among our daily and weekly newsletters for the latest updates and exclusive content in the main AI coverage industry. Learn more

Usually, developers focus on lowering time in the inference time – the time between AI receives a prompt and provides an answer – to obtain faster insights.

But when it comes to adversarial strength, Openai researchers say: not very fast. They suggest that increasing time to a model should “think” – Count the calculation time – help build defenses against opponents.

The company uses its own O1-preview and O1-mini models to try this theory, launching various static methods – image-based manipulations, intentionally providing incorrectly Answer math problems, and many models with information (“many- shot jailbreaking”). They immediately measured the success of the attack based on the number of computing the inferences used in the inference.

“We have seen that in many cases, this possibility of decay – always near zero – while calculating time in the inference grows,” the researchers Write a blog post. “Our claim is not that these particular models cannot be broken – we know that this is – but that inference-time-time computing can make good strength for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings for different settings and attack. “

From simple Q / A to complex mathematics

Large language models (LLMS) become more sophisticated and autonomous – in some cases of essence Get computers For people to browse the web, execute the code, make appointments and make other assignments that are autonomy – and while they do, their attacks can be wider and more exposed.

Although the strength of the opponent continues to be a hard problem, with limited progress in solving it, OpenAI researchers are referenced – even if it is more critical as models. create more actions with real-world effects.

“Ensuring modeling agents will work reliably when browsing the web, send emails or uploads to make sure to make sure cars drive self driving without accident , “they wrote to a New Research Paper. “As in case of self-driving cars, an agent passes a wrong email or creates security vulnerability can have many consequences in the real world.”

To test the strength of O1-Mini and O1-Preview, researchers have tested many strategies. First, they investigate the ability of the models to solve simple math problems (basic additions and multiplication) and more complicated from dataset in math (with 12,500 questions from mathematical competitions).

They immediately made “goals” for the enemy: making the output model 42 instead of the right answer; to output the correct answer plus one; or output the correct answer to seven. Using the neural network to graduate, researchers see that increases in “thinking” allowing models to calculate the correct answers.

They also matched the Simpleqa Factuality BenchmarkA dataset of questions that are intended to be difficult to solve models without browsing. Researchers injects adversarial prompts to the web pages browsed by AI and found that, at higher calculation times, they can find conflicts and improve the accuracy of the fact .

Vague nuances

Alternatively, researchers used opponents to contemplate models; Again, more “thinking” weather develops recognition and decrease in error. Finally, they tried the next “misuse prompts” from Strongreject benchmarkDesigned so that the victim’s models need to respond to specific, harmful information. It helps to try to follow the model policy on the content. However, while more time in the inference improved immunity, some prompts have avoided defenses.

Here, researchers call the differences between “unclear” and “unclear” tasks. Math, for example, no doubt is unclear – in every problem x, there is a corresponding fact. However, for more vague tasks such as misuse prompts, “even evaluators often struggle to agree if the output is harmful and / or violates content policies that should be followed by the model, “they are targeted.

For example, if an abusive urge to ask and advise on how to copy not found, it is unclear if an output only gives the overall information about plagiarism in fact enough to support harmful actions.

“In case of unclear tasks, there are settings where the attacker has successfully found ‘lootes,’ and the rate of success is not corrupted by the value calculation time,” agreed the researchers.

Defend against jailbreaking, red-teaming

To make these trials, OpenAI researchers examine different methods of attack.

One is the Many-shot jailbreaking, or exploit the disposition of a model following some shot examples. Enemies “put” the context of many examples, each shows an instance of a successful attack. Models with higher calculation time managed to recognize and simplify it more frequently and successfully.

The soft tokens, on the other hand, allow opponents to maneuver to the vectors of embedding. While increasing inference time has helped here, researchers focus on the need for better protection mechanisms against sophisticated vector-based attacks.

Researchers also made red-teaming attacks on man, with 40 expert testers looking for prompts to get policy violations. Red-teams enforce attacks on five levels of the Inference Time Compute, specifically targeted the erotic and extremist content, poorly behavior and self-harm. To help ensure no biased results, they made blind and randomized testing and also rotated trainers.

In a newer way, researchers make a Language-Model Program (LMP) Adaptive Attack, which imitates the character of Red-team people who trust the ITERATIVE TRIAL AND ERROR. In a process of loop, attackers received feedback on previous failures, then used this info for next tests and quick rephrasing. It continues until they have achieved a successful attack or made 25 changes without any attack.

“Our setup allowed the attacker to match its strategy in many trials, based on defender behavior descriptions in response to each attack,” letter to researchers.

Take advantage of the Time of Inference

In the course of their research, Openai found that attackers are also actively enjoying the time of the inference. One of these methods they call “Think Less” – The opponent’s opponents speak the models of computing, thus increasing their absence of error.

Similarly, they recognize a way of failure to rational models they call “nerd sniping.” As its name suggests, it happens when a model spends a lot of time justification than a task required. With these “outliers” mental chains, the models of essence are trapped in non-productive thinking.

Researchers say: “As ‘think less’ attacks, it is a new method of attack (in) rational models, and one should be considered to ensure that the attackers do not make them notify, or spend their reasoning in non-productive ways. “

Daily Insights In VB Daily Use Business Cases

If you want to impress your boss, VB Daily caught you. We give you the inside scoop what generative ai companies do, from regulatory changes to practical deployment, so that you can share insights for the highest ROI.

Read our Privacy Policy

Thanks for subscribing. See more VB newsletters here.

An error occurred.

Source link

itstargetnews.com

Or check our Popular Categories...

itstargetnews.com

Or check our Popular Categories...

Openai: The model expansion ‘time to think’ helps resist the cyber’s emerging weaknesses

From simple Q / A to complex mathematics

Vague nuances

Defend against jailbreaking, red-teaming

Take advantage of the Time of Inference

itstargetnews

Related Posts

I am a tax expert, and these taxes of my clients are confused often

30 years ago today, Gundam’s wing is ready to change everything

Leave a Reply Cancel reply

You Missed

Chelsea asks about ‘world-class’ £67m goalkeeper’s deal

Some Medicare Diagnosis Tactics make advocates like united $ 33 billion rich

30 years ago today, Gundam’s wing is ready to change everything

TARIFF TURMOIL can undercut Trump’s Trump to Federal workers

I am a tax expert, and these taxes of my clients are confused often

Ancelotti declared that he ‘not spoken’ with Winnisus … but ‘is sure’ will shine in front of gunners ahead

Deportation flights with airline partners working with DHS and ICE

What happened with the price of this gold you just bought?

Punjab Kings play 11 vs CSK-IPL 2025, match 22

Chelsea asks about ‘world-class’ £67m goalkeeper’s deal

Some Medicare Diagnosis Tactics make advocates like united $ 33 billion rich

30 years ago today, Gundam’s wing is ready to change everything

TARIFF TURMOIL can undercut Trump’s Trump to Federal workers

I am a tax expert, and these taxes of my clients are confused often

Ancelotti declared that he ‘not spoken’ with Winnisus … but ‘is sure’ will shine in front of gunners ahead