Thanks for the response.
Although I can agree that 4.5 > 4o in one-shot use, the size (and therefore price) difference doesn’t justify using it.
I’m finding it harder and harder to find people who are grounded (this is not a slight at your post), rather than manic or misguided. It’s scary how increasingly volatile this technology, the world, and people’s perspectives are becoming, but that’s a whole other conversation.
Agentic systems are incredibly powerful. The massive size, and therefore pricing, difference means I can run numerous LLM queries (an initialization pass plus several rounds of refinement) at a fraction of the cost of a single gpt-4.5 query.
More control, better results, at a greatly reduced price. The only factor 4.5 wins on is latency, and too much is lost to justify that.
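To make the "initialization plus refinement" idea concrete, here is a minimal sketch of that loop. `call_llm` is a hypothetical placeholder for whatever cheap chat-completion call you prefer; the call count is the point, not the stub itself.

```python
# Sketch of the agentic loop described above: one initialization pass,
# then repeated critique/rewrite rounds, all against a cheap model.
# `call_llm` is a stand-in; wire it to your provider of choice.

calls_made = 0

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion request to a small model."""
    global calls_made
    calls_made += 1
    return f"[model output for: {prompt[:40]}]"

def refine(task: str, rounds: int = 3) -> str:
    """One initialization query plus `rounds` critique/rewrite iterations."""
    draft = call_llm(f"Draft an answer for: {task}")
    for _ in range(rounds):
        critique = call_llm(f"Critique this draft:\n{draft}")
        draft = call_llm(
            f"Rewrite the draft to address the critique.\n"
            f"Draft: {draft}\nCritique: {critique}"
        )
    return draft

result = refine("Summarize the incident report")
# 1 initialization + 3 rounds * 2 calls = 7 cheap queries; with a large
# enough per-token price gap, all 7 can still undercut one gpt-4.5 query.
print(calls_made)  # prints 7
```

With gpt-4.5 priced at a large multiple of the small models, even seven queries like these come in well under one big one, which is the whole cost argument in miniature.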
To the points you raised, I would counter with:
- A chatbot that can “understand implications” better is not necessarily a good chatbot. Needing the model to understand implications better is either a consequence of building a public-facing chatbot, or a symptom of poor prompting
- On that note, giving the LLM every single detail is essential, not a detriment. What matters more is that the LLM can remember and utilize each of those details
From trying it myself, and monitoring other people’s usage, I can only conclude that the model is stuck in “no man’s land”. The main beneficiaries are people who are not willing to explore and implement agentic systems.
Maybe there is a use in things like public-facing chatbots that require a “human touch” in the response. Yet I would be very surprised if a clever combination of models couldn’t accomplish the same at a reduced cost. Latency is the only argument I could see, and even then, in my experience people are happy to wait for a better result rather than take a fast, subpar one.