After results degraded following the rollout of the 5 model two or three months ago, I spent some time testing other assistants in depth.
Reason: My work can’t stop because ChatGPT has degraded.
Test: Changing colors in a complex environment.
Assistants: ChatGPT, NN
I had the two AIs work from the same original outputs. Twice I adjusted the output to support each AI’s reasoning, since they failed at different levels.
The original script was created with ChatGPT 4.5. Because of its complexity, functions had been moved to unusual coding environments. That is, my creation had to meet very specific criteria.
Simple tasks:
ChatGPT failed to understand the task and produced a very basic script with early-Windows-style icons and a low-quality design. It failed to fetch the RSS stream due to CORS.
NN followed the instructions and updated the script somewhat correctly. It understood the design requests and added more modern icons. It had an equally hard time fetching the simple stream.
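On the CORS failure both assistants hit: a browser blocks a cross-origin fetch unless the feed's server sends an Access-Control-Allow-Origin header, so a common workaround is to fetch the feed outside the browser and hand the parsed result to the page. Below is a minimal sketch of that idea in Python, not the script from my test. The function names and the sample feed are made up for illustration, and it assumes a plain RSS 2.0 feed.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union


def parse_rss_titles(xml_data: Union[str, bytes]) -> List[str]:
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    # iter("item") walks the tree in document order, so titles keep feed order.
    return [item.findtext("title", default="") for item in root.iter("item")]


def fetch_rss_titles(url: str, timeout: float = 10.0) -> List[str]:
    """Download the feed server-side and parse it.

    No browser is involved here, so no CORS check applies to this request.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_rss_titles(response.read())


# Hypothetical sample feed, used only to show the parsing step offline.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title></item>
<item><title>Second post</title></item>
</channel></rss>"""

if __name__ == "__main__":
    print(parse_rss_titles(SAMPLE_FEED))
```

With a real feed, fetch_rss_titles would be called with the feed URL from a small backend or scheduled job, and the page would read the result from the same origin.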
Timespan: 6 PM to 11 PM
ChatGPT failed over and over to follow instructions. The result was poor and worsened over time, since it built on its latest result rather than on my instructions.
NN almost succeeded with the redesign, and it did manage to change the colors and add the stream.
Both got several opportunities to learn from the original script; only NN finally adjusted. Both got clear instructions about which language was used; only NN accepted the input without argument.
My verdict: I can’t work with ChatGPT alone and must have a backup AI so that work doesn’t stall or, in the worst case, stop.