OpenAI recently released two new models, o3 and o4-mini, aimed at improving AI reasoning. The models show significant gains on performance benchmarks, particularly on tasks requiring advanced math and coding skills. OpenAI devoted roughly 10 times as much compute to training o3 as it did to its predecessor, o1, and the company is doubling down on improving generative AI reasoning abilities. Even so, analysts caution that future growth won't necessarily match this explosive rate of progress.
Dan Roberts, a researcher at OpenAI, said the company will focus increasingly on reinforcement learning in its future efforts. This training approach demands a large increase in computing power, well beyond what was spent on initial model training. OpenAI argues that reinforcement learning is instrumental to building more powerful AI reasoning models, and this strategic shift is one indicator of how seriously it is taking the new direction.
Epoch analyst Josh You wrote an analysis showing that a growing share of OpenAI’s computing resources is going to reinforcement learning, the approach the lab sees as key to unlocking more capable reasoning models. Epoch’s findings show that compute for traditional AI model training has been growing about 4x per year, doubling roughly every six months. Compute devoted to reinforcement learning is scaling far faster, growing roughly tenfold every three to five months.
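To make the two trends directly comparable, here is a minimal sketch (our own illustration, not from Epoch's report) that annualizes each growth rate, assuming growth compounds smoothly:

```python
# A factor of g every m months implies g**(12/m) growth per year,
# assuming smooth compounding.

def annualized(factor: float, every_months: float) -> float:
    """Annualized growth multiple for `factor`x growth every `every_months` months."""
    return factor ** (12 / every_months)

traditional = 4.0                # ~4x per year for standard training compute
rl_fast = annualized(10, 3)      # 10x every 3 months -> 10**4 = 10,000x per year
rl_slow = annualized(10, 5)      # 10x every 5 months -> 10**2.4, about 251x per year

print(f"traditional: {traditional:.0f}x/yr, RL: {rl_slow:.0f}x to {rl_fast:.0f}x/yr")
```

Even at the slow end of Epoch's range, reinforcement learning compute is annualizing at well over 200x per year, dwarfing the roughly 4x pace of conventional training.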
Despite this promising trend, You warns that the growth trajectory of reasoning models may soon level off. He estimates that by 2026, reasoning training will be well established and scaled up, at which point its progress is likely to slow to the pace of the broader, rapidly changing AI field rather than outstrip it.
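A back-of-the-envelope calculation shows why convergence by 2026 is plausible. In the sketch below, the 1,000x starting gap between reinforcement learning compute and total frontier training compute is our own assumption for illustration; the growth rates come from the trends above:

```python
import math

# Hypothetical: if RL training compute grows ~1,000x per year (10x every
# ~4 months) while frontier training compute grows ~4x per year, how long
# until an assumed 1,000x gap closes?
rl_growth_per_year = 10 ** (12 / 4)   # 10x every ~4 months, about 1,000x per year
frontier_growth_per_year = 4.0        # ~4x per year
starting_gap = 1000.0                 # assumed: RL uses 1/1000th of frontier compute today

years_to_converge = math.log(starting_gap) / math.log(
    rl_growth_per_year / frontier_growth_per_year
)
print(f"RL compute catches up in ~{years_to_converge:.1f} years")
```

Under these assumptions the gap closes in well under two years, which is consistent with reasoning training reaching the frontier around 2026. Once the two curves meet, reinforcement learning can no longer grow faster than overall compute budgets allow.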
“If there’s a persistent overhead cost required for research, reasoning models might not scale as far as expected,” You wrote.
OpenAI’s o3 recently achieved record-breaking results on a number of AI benchmarks, but many insiders note that this upward trend is not guaranteed to last. Epoch’s analysis suggests that even the leading AI laboratories, OpenAI among them, have yet to realize the full potential of reinforcement learning, a gap that is most apparent in the reasoning-model training stage.