OpenAI was the first company to make a major investment in LLMs. When they launched GPT-4 in early 2023, no other company even had a model comparable to GPT-3.5.
Now, a year and a half later, several companies have models on par with GPT-4. Given OpenAI’s head start, I believe they still have an edge. It’s always been clear that they were working on something behind the scenes, but will their next generation of models truly disrupt the current architecture? Will they have a ‘secret ingredient’?
With the recent unveiling of the Strawberry model, released as o1, we can take a step back and assess the situation.
The model itself isn’t revolutionary in terms of changing the basic structure of transformers. It still predicts the next token, much like previous models. The key difference seems to be that it has been trained to perform chain-of-thought (CoT) reasoning, spending more compute at inference time, and that it includes a refined evaluation function that helps it decide when to stop reasoning.
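That loop can be sketched in a few lines. Everything here is a toy stand-in of my own invention, not OpenAI’s actual method: `generate_step` fakes the model emitting a reasoning step, and `score` fakes a learned evaluator whose confidence rises as the chain grows.

```python
def score(chain: list[str]) -> float:
    """Hypothetical learned evaluator: confidence that the chain of
    thought has reached a reliable answer (toy stand-in)."""
    return min(1.0, 0.15 * len(chain))

def generate_step(prompt: str, chain: list[str]) -> str:
    """Stand-in for the model emitting its next reasoning step."""
    return f"step {len(chain) + 1}"

def reason(prompt: str, threshold: float = 0.8, max_steps: int = 50) -> list[str]:
    """Extend the chain of thought until the evaluator is confident
    enough to stop, or the step budget runs out."""
    chain: list[str] = []
    while len(chain) < max_steps:
        chain.append(generate_step(prompt, chain))
        if score(chain) >= threshold:
            break
    return chain
```

The point of the sketch is only the control flow: inference keeps spending tokens until a stopping signal fires, rather than emitting an answer in one fixed-length pass.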
An important point: when allowed to generate 10,000 samples per problem, o1 achieves high success rates on tough challenges. If you can sample that widely and verify which attempts are correct, you can keep the verified ones, which makes o1 an excellent engine for producing synthetic training data. Since OpenAI has likely been using this model internally for at least a year, it’s safe to assume GPT-5 is well underway and benefiting from o1’s synthetic data. A reasoning-enhanced GPT-5, trained with the same Strawberry-style approach, could then generate even more synthetic data for GPT-6, continuing the cycle.
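The synthetic-data argument amounts to best-of-n rejection sampling: propose many candidate solutions, keep only those a verifier accepts, and feed the survivors back as training pairs. The sketch below is an illustrative assumption, not OpenAI’s pipeline; `sample_solution` fakes a model that is right about 60% of the time on a toy arithmetic task, and `verify` plays the role of a checker with ground truth (unit tests, a proof checker, a known answer).

```python
import random

def sample_solution(problem: tuple[int, int]) -> int:
    """Stand-in for the model proposing an answer; here it guesses
    the sum of two numbers, sometimes wrongly."""
    a, b = problem
    return a + b + random.choice([0, 0, 0, 1, -1])

def verify(problem: tuple[int, int], answer: int) -> bool:
    """Verifier with access to ground truth."""
    return answer == problem[0] + problem[1]

def make_synthetic_data(problems, n_samples=100):
    """Best-of-n rejection sampling: keep only verified solutions
    as (problem, answer) training pairs for the next model."""
    dataset = []
    for p in problems:
        for _ in range(n_samples):
            ans = sample_solution(p)
            if verify(p, ans):
                dataset.append((p, ans))
                break  # one verified sample per problem is enough
    return dataset
```

The key property is that the dataset quality is set by the verifier, not by the model’s single-shot accuracy: even a model that is often wrong yields a clean training set if you sample enough and filter.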
However, generating synthetic data is a relatively straightforward approach. Google DeepMind, for instance, has already demonstrated highly promising results in mathematics and geometry through reinforcement learning, indicating that they, too, have found an effective way to produce high-quality synthetic data.
It’s worth noting that DeepMind entered the LLM race at least a year behind OpenAI, yet that gap seems to be closing quickly.
Without a clear technological edge, the race comes down to computing power. Everyone knows Google has deeper resources, so it might just be a matter of time before they catch up with, or even surpass, OpenAI with the most advanced model on the market.
While o1 is a significant development and shows continued progress in AI, the fact that there’s no ‘secret’ in its architecture may weaken OpenAI’s lead.
Or is there something crucial I’m overlooking here?