I know I’m in the minority here, but for building actual chat apps with tool calling, the “chat” variant models via the API have been much better than the normal flagship ones. At least that WAS the case with gpt-5-chat (5c), NOT with 51c, and not with 52c.

I get that the chat models won’t score as high on LLM benchmarks, and there’s no good plain back-and-forth chat benchmark. So I’m guessing OpenAI is focusing too much on chasing benchmark scores and not enough on real user experience, because 5c is just plain better than the new ones, and that’s really disappointing. It doesn’t repeat itself as much, it follows directions better, and it writes cleaner markdown that displays better in the UI. There are fewer syntax errors when I ask it to display links, and overall it has much better common sense.

So what gives?! I was excited to try 51c because I thought it’d be the same flavor as 5c but better. Nope. It acts more like a normal non-chat model that dumps a bunch of text and other junk no one cares about. It’s so messy! I was then hoping this was just a bug that 52c would fix, but still no: 52c is just as bad. I’ve adjusted the system prompt multiple times over, and I just can’t get anywhere close to the same performance as 5c.

Has anyone else had this issue? I’m extra concerned because OpenAI says to stop using 5c in favor of the newer ones. Just horrible.