O3 answers getting weirder and weirder

I’ve been finding o3’s answers less consistent and less reasonable compared to 4o.

For example, I asked a straightforward question comparing the benefits of a Microsoft product versus AWS and Google alternatives. The response included duplicated information (which it acknowledged) without any valid reason. In another part of the response, it even struck through some of its own text and replaced it with different information, which felt messy and unjustified.

In another case, I requested five simple science/tech jokes. It gave me two actual jokes and three that were nonsensical. When I asked o3 to explain why the jokes were funny, the reasoning made no sense. I then asked 4o to explain them, and it replied that the jokes didn’t seem to follow any logical pattern.

Initially, o3 felt like a step forward from o1, but over the past few days, its quality seems to be declining.

Is anyone else finding o3 a bit off lately?


O3 is a weird model, to be sure. It seems smarter but much lazier, and it will simply make things up to answer the question.

If you need anything factually correct, you should steer clear.


The current batch of models has mostly been a negative experience for me.

I’ve been increasingly using Claude or Gemini, which reliably give higher-quality results in one or a few shots.


I’m almost all 2.5 Pro these days myself.