Yes, the “checkpoint”/“snapshot” promise has been broken.
Here is the previous playground preset, with its earlier outputs, performing the task correctly:
Now 0301 can no longer do the task and produces garbage. The older “not going to be messed with” model had been specifically chosen to perform this classification, because 0613 was degraded overnight: a month ago it stopped understanding system instructions like this, acting on the system input as if it were user input.
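For anyone wanting to reproduce the setup, here is a minimal sketch of the kind of pinned-snapshot classification request being described. The prompt wording and label set are my own illustrations, not the original preset; the point is the role separation that 0613 stopped respecting: the instructions live entirely in the system message, and the user turn carries nothing but the text to classify.

```python
import json

# Illustrative classifier instructions (not the original preset's wording).
SYSTEM_PROMPT = (
    "You are a classifier. Reply with exactly one word: "
    "positive, negative, or neutral."
)

def build_request(text: str) -> str:
    """Build the JSON body for a POST to the chat completions endpoint."""
    payload = {
        "model": "gpt-3.5-turbo-0301",  # the pinned snapshot in question
        "temperature": 0,
        "messages": [
            # All task instructions go in the system role...
            {"role": "system", "content": SYSTEM_PROMPT},
            # ...and the user role contains only the input to classify.
            {"role": "user", "content": text},
        ],
    }
    return json.dumps(payload)
```

A model that honors the contract answers with just a label; the failure described above is the model instead responding to the system text as though a user had typed it.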
This comes after the snafu where random text injected into the model’s chat container broke everything, and even that was probably no mere misstep, because the model still performs this poorly now that the bigger problem has been fixed.
Can this be anything but deliberate, either so that one can’t demonstrate by comparison how badly OpenAI has broken the current gpt-3.5-turbo model, or to make the use of gpt-4 mandatory?
(as a side effect, the AI-rewritten pseudocode does sort of work)