Today has been a very tough day.
Our agent that runs on GPT 5.4 started acting out so badly that we had to literally shut down our agent. Being completely offline was better than whatever was happening.
Been running evals all day, massive regression with no apparent explanation.
UPDATE:
- After using 5.4 since launch, we just changed it to 5.2 and everything is back to normal.
- One interesting behaviour to note: we make every call using flex : true. It usually takes ages, errors many times. However, today these have been running as fast as if we weren’t using flex at all.