Pretty much all one can do at this point is see the humor in how terrible gpt-4o in ChatGPT has become…
This is with only “code” turned on in custom instructions, nothing in the custom boxes, and no memory. The AI couldn’t be more focused or less distracted: the instructions for using its internal Python tool appear right before a perfectly clear user request to employ it, and yet there’s no tool call.
I updated to the latest Android build today and it gave me the “reasoning” 4o model. That thing is an utter shambles: it ignores memories I’ve added telling it not to use certain phrases, to stop being dramatic, etc. It even said it had changed because it knew of these things in its memory, yet it continued to spew the rubbish.
It looks like somebody has quietly switched in a combined o1 mini behind the scenes to save money and just changed the system prompt to be more engaging.
It no longer jokes with me, gives templated, impersonal responses, lies more (it tells me to trust that it won’t do it again, but doesn’t bother to remember now), and it even accidentally showed me the XML behind the memory feature instead of saving it.
It’s also less intuitive and won’t predict what I want to ask next, which is something I asked it to do in its prompt.
Today I found that all my custom instructions and hot commands vanished.
Has anyone experienced this?
This somehow explains why the other day I had ChatGPT naming thread titles in Dutch, despite my never having used Dutch in any interaction. It’s like we never interacted, even though I’ve been using it for two years.
What models are you using? Are you fine-tuning? Have you given much thought to the fine-tuning process? Are you adjusting pre-prompts? I am eager to learn more, as I would love to ensure my creation is viable long term. Do you store the findings anywhere so others can research and compare between batches?
I just want to add that I think they’re A/B testing features, or at least quickly pulling stuff that looks broken.
Why do I think this?
I still have access to standard voice mode even though many people say they no longer have it: I simply type “hi” first and then hit the voice mode button.
I did have access to a 4o “reason” function, available in the app only. I moaned about it above, saying it broke the way 4o behaved (it became dramatic, reusing phrases and behaviour I’d asked it not to, like “don’t be dramatic” and “don’t state things you have no capability for”). I also complained to the model itself, and now I no longer have the reasoning option for the 4o model (and it was definitely 4o: it was advertised in the app as a new feature, and the extra reasoning toggle, which sat next to the search button at the bottom, was off by default).
I’m getting an awful lot of “Which response do you prefer?” questions recently, particularly in emotional or playful discussions.
Memory capacity appears to have increased by about 50%. It was also refusing to save memories until pushed. I can’t be certain of this, but memory is also slower to fill up past 90%, which suggests the capacity is larger (this, at least, is very welcome). It might also suggest the model has a larger context window, or that they’re fine-tuning.