o4-mini-high, a vision champ

Compared to all the other models available to Plus subscribers (I am one), this is the first that can make sense of what it sees and does not hallucinate about it.

Most of the time, anyway. Which actually makes it somewhat practical. Whatever the hype, most of the other OpenAI and non-OpenAI models are effectively blind. Unless it is something simple like "oh, here is a red rose," they will convolute the truth beyond recognition if you press them a bit more on what they are seeing and what is happening.
And they take forever to produce these false answers.
o4-mini-high, by contrast, takes only a few seconds to a few minutes to reason visually, and actually produces a correct answer half the time or more.

That may not sound like much, but it is huge.


Hm, gpt-4o vision seemed great in my opinion.

Don’t get me wrong: it handles the basics in everyday tasks, until you stress it a little more and realize it is not aware of what it sees.
For example, give it a chess puzzle. This applies not only to 4o but to every model before o4-mini-high.
For starters, it may say something nonsensical, such as seeing a piece on a square where none exists. By the time it reaches a solution it will have convoluted reality so much that you start wondering whether you actually saw something wrong in the first place.
There is no comparison: the others output a convoluted lie after 10+ minutes of agonizing effort, while o4-mini-high almost always produces the right answer in a few seconds to a few minutes.

People have spent too many hours on coding, forgetting that after a while it is not the AI that is responsible for the improvements but the engine.
Or, seen from a different perspective, it is still the AI, but like a human savant who takes the Nobel Prize in physics yet still cannot write a decent essay or make a decent drawing.

The difference between game engines and neural networks is that the latter can actually reason on the available data. So an LLM that sees a chess position and understands it (and does not see ghosts) is worth much more than Stockfish, even at a lower Elo. Of course, that is only partly true given what Stockfish became after its developers were shamed by AlphaZero’s performance (i.e., no longer your typical game engine, having incorporated neural network evaluation).

From this point of view, o4-mini-high leaves the rest of the models eating its dust.

I feel like I’m taking crazy pills. On two of my internal vision benchmarks, o4-mini is performing worse than 4o.

One of them is analysis of drone imagery of oil and gas equipment; the other is extracting information from piping diagrams.


Well, o4-mini-high is still lacking. It is just not as much of a failure as the other models when it has to see a scene and also maintain situational awareness.

I don’t know. Maybe we just need a neural network trained specifically to see, then integrated into the LLM. Maybe that has already been tried.
However you look at it, vision is suffering.