Day 12 of Shipmas: New frontier models o3 and o3-mini announcement


I think OpenAI is pushing the AGI idea too hard (it’s too early and there isn’t enough proof), just as they pushed the idea of “reasoning,” which isn’t true reasoning either (remember step-by-step instructions and structured outputs).


What o3 can do with images (as we saw on day 12), GPT-3.5 Turbo could do with text a year ago. Math was the key to providing it with reference patterns it could use to predict new patterns and mix them. The model will probably be very good at some tasks, and very expensive…

I wonder how many prompts o3 needs before it becomes slower and gets stuck like other models.
Can it improve itself without human intervention?
Can the model invent something, or will it only perform predictive prompts and mix them? Is there a way to measure that?

  • Does the model possess self-awareness or consciousness? (Current AI lacks the subjective experience and self-awareness that are fundamental to human intelligence.)
  • How does the model handle tasks outside its training data? (AGI should be able to generalize knowledge to completely new and unforeseen tasks, something current models struggle with.)
  • What measures are in place to ensure ethical use and alignment with human values? (Ensuring that an AGI aligns with human ethics and values is crucial, yet current models rely heavily on human oversight.)
  • How transparent are the model’s decision-making processes? (Understanding how AI arrives at its conclusions is essential for trust and reliability, which is not fully addressed in current models.)
  • What are the limitations in understanding context and nuance? (AGI should comprehend context and nuance at a human level, whereas current models can misinterpret subtleties in language and intent.)
  • Can the model engage in long-term planning and goal setting? (AGI would need the capability to set and pursue long-term goals autonomously, which is beyond the scope of current AI models.)
  • How scalable is the model’s architecture for achieving true general intelligence? (The underlying architecture must support the complexity and adaptability required for AGI, a challenge not yet overcome.)

All very exciting but also disappointing:

This is supposed to be the climax of “shipmas”, no?

What has shipped?

3 Likes

I don’t know much about computers, but I use ChatGPT a lot, exploring and considering multiple possibilities. I think they want to improve the AI further to solve problems. I’ve noticed that the advertisements (at least in my country, Spain) mostly promote it as an excellent tool for work—Word, Excel, mathematical tasks, writing, etc.

But honestly, it’s pretty bad at those. I’m not very good with technology, and I’ve tried several times to create an Excel sheet by giving it data, but it just doesn’t work. It spends hours processing only to produce a result full of zeros, or one that includes just two pieces of data.

It has many strengths that have nothing to do with those kinds of tasks, especially in how it handles information, expresses ideas, or understands what you’re asking for. As a user, I can see that with a few improvements, it could get much better.

Maybe they want to build a solid foundation first before continuing to improve it in the areas where it already performs well. That’s just my theory—who knows what direction they’ll take? Whether they’ll aim for an AI that interacts with people in a human-like way or push ChatGPT more towards becoming an unbeatable work tool. :woman_shrugging:

2 Likes

o3 is a game changer and I could not be more excited about it! Congratulations on it!

2 Likes

Honest feedback - stop doing stuff like skipping version numbers. It has the same smell Elon gives off when he calls his tunnel a “hyper loop” or a car assembly plant a “giga factory”. Very stinky salesman talk, and normal people can smell it a mile away. Contrary to the feedback you get here, from your most hardcore users and businesses using your API, there’s a very large and growing portion of the populace (I’d say a clear majority) who already hate AI for all the spam it’s being used to generate, and of course for the fact that evangelists keep promising it will take their jobs. Some care should be given to the presentation of this technology to the public, and hysterical tulip-mania talk like “AI will do literally everything and it’s already improved so much from three months ago that we had to skip a version number but no you can’t use it to see for yourself” is doing way more harm than good in the long run. Hopefully in 2025 there is a greater focus on delivery than promises.

1 Like

Yes, I saw it too → they’re going down Elon Musk’s road, and this pisses me off very much.



Well, I used to hate open-source LLMs, but what Hugging Face is preparing (an open-source course on GitHub to train small models on your computer) makes me think 2025 will be their year.

Google has changed its strategy → they’re playing it smart and understand the game. They even offer a FREE API to get users on board. If Google focuses on small models, it will be a major hit to OpenAI, because OpenAI seems stuck with a “huge LLM only” strategy to keep them out of users’ reach.

I’ve said it multiple times here on the forum: we need small models from OpenAI. But it seems their focus is only on one massive LLM, training it on new conversation data until people start calling it AGI → just because it’s smarter than the average person. Honestly, I’m disappointed. I didn’t expect the greed to be this extreme from some people.

Now, others will step in and win the battle against huge LLMs, as everyone will focus on small agents that can be trained on personal computers. Huge models will only serve a small niche → like acting as a fancy main assistant in a multi-agent system network.

And where is the critic model OpenAI claimed to have?
Will o3 be critical, or just another robo-slave like the other models?

For those claiming o3 is the future of AGI or has reached AGI, let me ask you this:

  • If you train a parrot to give you better answers, does that mean it’s more intelligent, independent, or capable of growing and improving itself without human touch?
1 Like

How to say you didn’t watch the first 30 seconds of the video…

It is humbly explained.

4 Likes

It’s been their mission to achieve AGI. This is what they sold. :face_in_clouds:

I agree. It seems like OpenAI has given us the tools to start with a smart model, and then distill it down to what we actually need. Why bother though :rofl:

I’ve been very impressed with Gemini Flash. I have not been impressed with not knowing how much they cost.

I also agree that small models are the future. They need to be contained, & somewhat predictable. There’s truly an art to applying LLMs that nobody bothers to respect. It’s just a magic black box to most people. The LLM as the main feature is the curse of businesses. They serve much better as building blocks within systems.

That’s as a developer.

As a consumer, I can see the appeal in having an all-powerful model that I can use for day-to-day tasks through a proprietary interface. I would be very surprised if most companies don’t end up with some sort of subscription to this type of service, and have it contain all of their documents.

It leaves me wondering how RAG systems will function in the future, and how tightly intertwined they will be with these services. One of my biggest hard-learned lessons working alongside companies like OpenAI is that building systems that augment their models is almost always a fool’s game. It’s building structures on shifting sand. But this may be a rapidly decaying truth if we really have hit the wall of “general-purpose models” like GPT.
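To make concrete what kind of glue code is at stake here, below is a minimal, toy sketch of the retrieve-then-prompt pattern RAG systems use (stdlib only; the helper names and documents are hypothetical, and real systems use learned embeddings rather than word counts). The point is that the prompt-assembly step at the end is coupled to how a specific model consumes context, which is exactly the part that shifts underneath you:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term count.
    # Real RAG systems use learned dense vectors instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Hypothetical document store.
docs = [
    "Invoices are processed within 30 days of receipt.",
    "The cafeteria menu changes every Monday.",
]

context = retrieve("how many days for invoice processing", docs)[0]
# This prompt template is the fragile, model-specific part.
prompt = (
    "Answer using only this context:\n"
    f"{context}\n\n"
    "Question: how many days for invoice processing?"
)
```

Everything above the prompt template is stable engineering; the template itself (and how much context the model tolerates) is the part that has to be rebuilt whenever the underlying model changes.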

And how steerable will it be?

Throughout all this time I have noticed a decline in steerability with models. I have noticed this from other reports as well. These benchmarks enforce a “single-shot” method where it’s “all-or-nothing”. It’s moving people into a destroy and rebuild mentality. Complete waste.

I imagine this is why it’s kept as a “reasoning tool” and not a main model. It does make sense that it should be used as such for one-off operations.

3 Likes

“Your tracking number has been received by the carrier. You will be notified of updates when they happen.”

2 Likes

How to say you didn’t make it past the first sentence of my post.

Nah, I believe at this point it doesn’t really matter.
If you want to trick it like this, it’s not so much about safe terms of use; it’s more that someone did something bad with it, and you can’t prevent that in every situation.
I mean, you’ve already got all the other equipment to do the job.

It’s more important to make sure that nothing dangerous happens by accident.

That makes the most sense, thank you.