4o and o4-mini can’t even add 7 small numbers together. They made the same mistake 4x in a single chat:
"
- Swing States EVs (2024):
- Arizona (11) + Georgia (16) + Michigan (15) + Nevada (6) + North Carolina (16) + Pennsylvania (19) + Wisconsin (10)
= 113 EVs
"
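For reference, here’s a quick Python sanity check of that same addition. It uses the per-state numbers exactly as they appear in the quote; the dict name is just for illustration:

```python
# Electoral votes per swing state, copied from the quoted model output (2024)
swing_state_evs = {
    "Arizona": 11,
    "Georgia": 16,
    "Michigan": 15,
    "Nevada": 6,
    "North Carolina": 16,
    "Pennsylvania": 19,
    "Wisconsin": 10,
}

# Plain integer sum over the seven values
total = sum(swing_state_evs.values())
print(total)  # 93
```

The model’s 113 is off by 20.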
The correct total is 93, not 113. Devs, y’all should focus on getting simple math right instead of contest problems that follow a specific format.
Also, o3 tends to think for 15 minutes and then fails to produce any output. It’s too unhinged and unreliable. You should take notes from Manus AI, where the user can interrupt in the middle of a task to steer the output. o3 needs either follow-ups to clarify, like in Deep Research, or a way to steer its thinking, because right now it attempts to solve problems in one shot and gives unsatisfactory answers, and I have to either rerun it or follow up with no guarantee it’ll work. I don’t want to babysit these models; there need to be better ways to nudge them in the right direction than having me overspecify the problem statement (~400 words). It should be more agentic.
Also, the app needs to be more consistent. We should be able to edit the text when we attach an image, and vice versa (add an image when editing the text). And on Android tablets, it doesn’t let me close the sidebar, so when I’m drafting a long message and accidentally tap the sidebar, the whole draft is gone.
Also, on the free plan the image upload limits are dumb: an upload shouldn’t count unless you actually press send. Claude, Grok, and Gemini handle that much better.
Sorry for the multiple critiques in one thread, but I had to get them all off my chest.