We have seen in testing today that, as the model prepares to call a function, it stops short. It outputs something like:
To perform that task I will {something about the function}. Let’s do that now.
And then it never actually calls the function. We have a dataset of tests, and this was not the case four days ago when we last ran it: the model gave a similar preamble but included the function call. Now, in roughly half the cases where it previously made the call, it no longer does.
Updating to gpt-4o-11-20 seems to resolve the issue. Anyone seeing similar? Is this an issue OpenAI is tracking?
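For reference, here is roughly how our harness flags the regression (a simplified sketch using the Python SDK; `lookup_order` and the prompt are placeholders, and `gpt-4o-2024-11-20` is the dated snapshot name behind the "11-20" shorthand above):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder tool definition; our real schemas are more involved.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "lookup_order",  # hypothetical function name
        "description": "Look up an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def made_tool_call(prompt: str, model: str) -> bool:
    """Return True if the model actually emitted a tool call."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        tools=TOOLS,
        temperature=0,
    )
    choice = response.choices[0]
    # The failure mode: finish_reason is "stop" and message.content holds
    # the "Let's do that now." preamble, with no tool_calls attached.
    return choice.finish_reason == "tool_calls" and bool(choice.message.tool_calls)

# Pinning the dated snapshot that still behaves correctly for us.
print(made_tool_call("Where is order 1234?", model="gpt-4o-2024-11-20"))
```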
We have been observing the same issue since this morning (around the same time as the previous post). It is a huge problem for a model to suddenly change its structural behavior in such a clearly buggy way.
We are also switching to the 11-20 model, but this deals another blow to our trust that OpenAI's models will stay stable enough for continued production use. I'd like to understand whether OpenAI sees and tracks issues like these.
++1 on the function not always being called. We reduced the number of functions, used very descriptive function names, lowered temperature to 0, and explicitly asked for the function call in the system prompt. Yet in 2 instances out of 10 the function call is still skipped.
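One mitigation not mentioned yet: if a given turn must produce a call, the API's `tool_choice` parameter can force one rather than leaving it to the model's judgment. A minimal sketch, again with a placeholder function name and schema:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder schema; swap in your real function definitions.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_order",  # hypothetical name for illustration
        "description": "Look up an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=[
        {"role": "system", "content": "Use lookup_order to answer order questions."},
        {"role": "user", "content": "Where is order 1234?"},
    ],
    tools=tools,
    # "required" forces some tool call on this turn; to force this specific
    # function instead, pass:
    #   tool_choice={"type": "function", "function": {"name": "lookup_order"}}
    tool_choice="required",
    temperature=0,
)
print(response.choices[0].message.tool_calls)
```

The obvious caveat is that this forces a call even on turns where none is appropriate, so it only fits flows where the call is mandatory.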