I’m using gpt-4o with tool calling. I’ve recently noticed that it will often indicate in the textual response that it called a tool, but did not actually make any tool calls. I’m trying to figure out a prompting approach that ensures this doesn’t happen. For example, if a user asks “Send me a happy birthday email” and an email tool is provided in the API request, it might respond with “Ok, great, I’ve sent you a Happy Birthday email!” without actually calling the tool. If the user then responds with “Hey, you didn’t actually send me an email,” it will correctly own up to the mistake and call the tool. Anyone else experiencing this? I have added system prompt language telling it to review every response and check that any tool it refers to was actually called, but this doesn’t seem to help.
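For reference, here’s roughly how I’m structuring the calls, with a guard that retries when the reply claims an action but no tool call came back. This is just a minimal sketch of the idea: the `send_email` tool definition and the text heuristic are stand-ins, not what I actually ship.

```python
# Sketch: verify that a response claiming an action actually contains tool
# calls, and retry with tool_choice="required" if it doesn't.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "send_email",  # hypothetical tool for this example
        "description": "Send an email to the current user.",
        "parameters": {
            "type": "object",
            "properties": {
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["subject", "body"],
        },
    },
}]

def complete_with_tool_check(messages):
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message

    # Crude heuristic: the reply talks about having done something,
    # but the API returned no tool_calls at all.
    claims_action = msg.content and any(
        phrase in msg.content.lower() for phrase in ("sent", "i've", "i have")
    )
    if claims_action and not msg.tool_calls:
        # Force a tool call on the retry; "required" tells the model it
        # must call at least one of the provided tools.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="required",
        )
    return resp
```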
Yep, this has been a day-1 hallucination issue with this model release: pretending to do things it can’t do, or even claiming its output has accomplished the impossible thing that was requested.
With less emergent improvisational ability, you just get an AI that writes whatever sounds plausible, the way a small base model does.
However, the specific failure to actually invoke tools seems to be a recent and frequently reported issue with the 2024-08-06 version and with mini, along with silent changes being made and more being exposed by the switch of the gpt-4o alias. Chat history in particular will distract the model from which tools it should use, and even from which schema it should output.
Switch back to the May 2024 gpt-4o version (gpt-4o-2024-05-13) and see what improvement you get there; if that’s not enough, keep stepping back through gpt-4-turbo, gpt-4, even gpt-3.5-turbo.
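Concretely, that just means pinning the dated snapshot in the request instead of the moving `gpt-4o` alias, something like this (assuming the standard snapshot identifiers and the same `messages`/`tools` you already pass):

```python
# Pin a dated snapshot so an alias switch can't silently change behavior.
resp = client.chat.completions.create(
    model="gpt-4o-2024-05-13",  # May 2024 snapshot instead of the "gpt-4o" alias
    messages=messages,
    tools=tools,
)
```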
That fixed it…thanks! I’m guessing this will be corrected in the next version?