Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% lower cost




Chinese AI startup DeepSeek, known for challenging leading AI vendors with open-source technologies, just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1.

Based on the recently introduced DeepSeek V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI’s frontier reasoning LLM, on math, coding and reasoning tasks. The best part? It does so at a far more tempting price, coming in 90-95% cheaper than the latter.

The release marks a major leap forward in the open-source arena. It shows that open models are increasingly closing the gap with closed commercial models in the race to artificial general intelligence (AGI). To show the versatility of its work, DeepSeek also used R1 to distill six Llama and Qwen models, taking their performance to new levels. In one case, the distilled version of Qwen-1.5B outperformed much larger models, GPT-4o and Claude 3.5 Sonnet, in select math benchmarks.

These distilled models, along with the main R1, have been open-sourced and are available on Hugging Face under the MIT license.

What does DeepSeek-R1 bring to the table?

The focus is sharpening on artificial general intelligence (AGI), a level of AI that can perform intellectual tasks like humans. Many teams are doubling down on enhancing models’ reasoning capabilities. OpenAI made the first notable move in the domain with its o1 model, which uses a chain-of-thought reasoning process to tackle a problem. Through RL (reinforcement learning, or reward-driven optimization), o1 learns to hone its chain of thought and refine the strategies it uses — ultimately learning to recognize and correct its mistakes, or try new approaches when the current ones aren’t working.

Now, continuing the work in this direction, DeepSeek has released DeepSeek-R1, which uses a combination of RL and supervised fine-tuning to handle complex reasoning tasks and match the performance of o1.

When tested, the DeepSeek-R1 scored 79.8% on the AIME 2024 math tests and 97.3% on the MATH-500. It also achieved a 2,029 rating on Codeforces — better than 96.3% of human programmers. In contrast, the o1-1217 scored 79.2%, 96.4% and 96.6% respectively in these benchmarks.

It also demonstrated strong general knowledge, with 90.8% accuracy on MMLU, just behind o1’s 91.8%.

Performance of DeepSeek-R1 against OpenAI o1 and o1-mini

The training pipeline

DeepSeek-R1’s reasoning performance marks a big win for the Chinese startup in the US-dominated AI space, especially as the entire work is open-source, including how the company trained the whole thing.

However, the work is not as straightforward as it sounds.

According to the paper describing the research, DeepSeek-R1 was developed as an enhanced version of DeepSeek-R1-Zero — a breakthrough model trained solely from reinforcement learning.

The company first used DeepSeek-V3-base as the base model, developing its reasoning capabilities without employing supervised data, essentially focusing only on its self-evolution through a pure RL-based trial-and-error process. Developed intrinsically from the work, this ability ensures the model can solve increasingly complex reasoning tasks by leveraging extended test-time computation to explore and refine its thought processes in greater depth.

“During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors,” the researchers note in the paper. “After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.”
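The difference between pass@1 and majority voting is worth unpacking: pass@1 scores a single sampled answer, while majority voting samples many answers and keeps the most common one. A minimal sketch of the voting step, using a hypothetical `majority_vote` helper (the paper's exact evaluation harness is not public in this article):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among sampled completions.

    Hypothetical helper for illustration: majority voting (also called
    self-consistency) samples N answers to the same problem and picks
    the one that appears most often.
    """
    return Counter(answers).most_common(1)[0][0]

# Toy example: 5 sampled answers to one math problem
samples = ["72", "65", "72", "72", "81"]
print(majority_vote(samples))  # prints "72"
```

Since reasoning models sample diverse chains of thought, wrong answers tend to scatter while correct ones repeat, which is why the voted score (86.7%) exceeds the single-sample score (71.0%).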

However, despite showing improved performance, including behaviors like reflection and exploration of alternatives, the initial model did display some problems, including poor readability and language mixing. To fix that, the company built on the work done for R1-Zero, using a multi-stage approach combining both supervised learning and reinforcement learning, and thus came up with the enhanced R1 model.

“Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model,” the researchers explained. “Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.”
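The rejection-sampling step in that pipeline can be sketched in miniature: sample several completions from the RL checkpoint, keep only those a verifier accepts, and use the survivors as new SFT pairs. Everything below is a toy stand-in (the hard-coded candidates and exact-match verifier are assumptions for illustration, not DeepSeek's implementation):

```python
def generate_candidates(prompt, n):
    """Stand-in for sampling n completions from the RL checkpoint.

    Toy data: hard-coded candidate answers to '6 * 7'.
    """
    return ["41", "42", "42", "39"][:n]

def is_correct(prompt, completion):
    """Stand-in verifier, e.g. exact match against a reference answer."""
    return completion == "42"

def rejection_sample(prompt, n=16):
    """Keep only verified completions, yielding (prompt, answer) SFT pairs."""
    kept = [c for c in generate_candidates(prompt, n) if is_correct(prompt, c)]
    return [(prompt, c) for c in kept]

# Two of the four toy candidates survive as new training pairs
sft_pairs = rejection_sample("What is 6 * 7?")
print(sft_pairs)  # prints [('What is 6 * 7?', '42'), ('What is 6 * 7?', '42')]
```

The design point is that the model filters its own outputs: only completions that pass verification feed back into the next round of supervised training, so each stage trains on progressively higher-quality reasoning traces.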

Cheaper than o1

In addition to improved performance that nearly matches OpenAI’s o1 across benchmarks, the new DeepSeek-R1 is also very affordable. Specifically, where OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, DeepSeek Reasoner, which is based on the R1 model, costs $0.55 per million input tokens and $2.19 per million output tokens.
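Those per-million-token prices make the savings easy to compute for any workload. A quick sketch, using a hypothetical workload of 10M input and 2M output tokens (the prices are from the figures above; the token counts are invented for illustration):

```python
def api_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost in USD given per-million-token input/output prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical workload: 10M input tokens, 2M output tokens
o1_cost = api_cost(10e6, 2e6, 15.00, 60.00)   # $270.00
r1_cost = api_cost(10e6, 2e6, 0.55, 2.19)     # $9.88
savings_pct = (1 - r1_cost / o1_cost) * 100   # about 96% cheaper for this mix
```

The exact percentage shifts with the input/output mix, since the two models discount input and output tokens by different ratios, which is why the headline figure is quoted as a 90-95% range rather than a single number.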

The model can be tested as “DeepThink” on the DeepSeek chat platform, which is similar to ChatGPT. Interested users can access the model weights and code repository via Hugging Face, under an MIT license, or can go with the API for direct integration.


