So-called reasoning AI models are becoming easier – and cheaper – to develop.
On Friday, NovaSky, a group of researchers based at UC Berkeley’s Sky Computing Lab, released Sky-T1-32B-Preview, a reasoning model that competes with an earlier version of OpenAI’s o1 on a number of key benchmarks. Sky-T1 seems to be the first truly open source reasoning model in the sense that it can be replicated from scratch; the team released the data set they used to train it as well as the necessary training code.
“Remarkably, the Sky-T1-32B-Preview was trained for less than $450,” the team wrote in a blog post, “demonstrating that it is possible to reproduce high levels of reasoning ability cheaply and efficiently.”
Unlike most AI, reasoning models effectively check themselves, which helps them avoid some of the pitfalls that often plague models. Reasoning models take longer – usually seconds to minutes – to reach solutions than a typical non-reasoning model does. The upside is that they tend to be more reliable in domains like physics, science, and math.
The NovaSky team says it used a different reasoning model, Alibaba’s QwQ-32B-Preview, to generate the initial training data for Sky-T1, then “curated” the data mixture and used OpenAI’s GPT-4o-mini to convert the data into a more usable format. Training the 32-billion-parameter Sky-T1 took about 19 hours using a rack of 8 Nvidia H100 GPUs. (Parameters roughly correspond to a model’s problem-solving skill.)
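For a rough sense of scale, the figures quoted above can be combined into a back-of-the-envelope estimate. The sketch below assumes, purely for illustration, that the entire sub-$450 budget went to GPU time and that "about 19 hours" is exact; the team's actual cost breakdown is not given here, and the function names are hypothetical.

```python
# Figures quoted in the article.
TRAINING_HOURS = 19   # approximate wall-clock training time
NUM_GPUS = 8          # Nvidia H100 GPUs
BUDGET_USD = 450      # reported upper bound on training cost


def gpu_hours(hours: float, gpus: int) -> float:
    """Total GPU-hours consumed by a run."""
    return hours * gpus


def implied_hourly_rate(budget: float, hours: float, gpus: int) -> float:
    """Cost per GPU-hour, ASSUMING the whole budget was spent on GPU time."""
    return budget / gpu_hours(hours, gpus)


total = gpu_hours(TRAINING_HOURS, NUM_GPUS)
rate = implied_hourly_rate(BUDGET_USD, TRAINING_HOURS, NUM_GPUS)
print(f"{total} GPU-hours; at most ${rate:.2f} per GPU-hour")
# prints "152 GPU-hours; at most $2.96 per GPU-hour"
```

Because the reported cost is "less than $450," the per-GPU-hour figure is an upper bound under these assumptions, not a quoted price.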
According to the NovaSky team, Sky-T1 outperformed an early preview version of o1 on MATH500, a collection of “competition-level” math challenges. The model also beat the o1 preview on a set of difficult problems from LiveCodeBench, a coding evaluation.
However, Sky-T1 falls short of the o1 preview on GPQA-Diamond, which contains physics, biology, and chemistry-related questions that a PhD graduate would be expected to know.
It is also important to note that OpenAI’s GA release of o1 is a stronger model than the preview version of o1, and that OpenAI is expected to release a better-performing reasoning model, o3, in the coming weeks.
But the NovaSky team says that Sky-T1 only marks the beginning of their journey to develop open source models with advanced reasoning capabilities.
“Going forward, we will focus on developing more efficient models that maintain strong reasoning performance and exploring advanced techniques that further improve the efficiency and accuracy of models during testing,” the team wrote in the post. “Stay tuned as we move forward with these exciting initiatives.”