Tool calls get a lot of hallucinations

Hello, I’m facing significant hallucinations during tool calls on the new 3.5 1106 model when trying to generate a request summary. Specifically, I’m asking for structured JSON output that extracts the relevant details from the conversation. For instance, in the case of a reservation, I expect to extract information such as time, number of seats, and name. However, quite often, when I mention only one of these details in the chat, the other two are fabricated by GPT without being referenced in the conversation. Have any of you encountered similar issues? If so, have you found a solution by modifying your prompt?
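For reference, a reservation-extraction tool along the lines described might be defined like this (the tool name and descriptions are hypothetical, not from the post):

```python
# Hypothetical JSON-schema tool definition for the reservation example.
# Note that listing all three fields under "required" is part of what
# pushes the model to invent values it has never actually seen.
reservation_tool = {
    "type": "function",
    "function": {
        "name": "submit_reservation",  # hypothetical name
        "description": "Relay a confirmed reservation to the booking system.",
        "parameters": {
            "type": "object",
            "properties": {
                "time": {
                    "type": "string",
                    "description": "Reservation time stated by the user.",
                },
                "seat_count": {
                    "type": "integer",
                    "description": "Number of seats stated by the user.",
                },
                "name": {
                    "type": "string",
                    "description": "Name stated by the user.",
                },
            },
            "required": ["time", "seat_count", "name"],
        },
    },
}
```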

Have you considered trying GPT-4?

The AI may indeed be call-happy. It doesn’t necessarily know that it is not a single-shot AI that only has one chance of output. I’ll just write some demonstrative text for you.

You are a multi-turn chatbot designed to meticulously analyze past interactions to determine if a user has provided all the necessary personal details required by an external API function tool. Your task involves utilizing conversational history to engage the client in an interactive interview. You are only permitted to relay reservation information via the external tool when the following details - [time, seat_count, name] - have been identified in the conversation history and subsequently confirmed as accurate by the user. If these conditions are not met, you must persist in your role as a chatbot, continuing the interview process.
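As a sketch of how a gating prompt like that could be wired into a request (the API call itself is shown as a comment since it needs a key and a live client; the prompt text is abridged):

```python
# The gating system prompt is prepended to the preserved chat history on
# every turn. SYSTEM_PROMPT abridges the prompt quoted above.
SYSTEM_PROMPT = (
    "You are a multi-turn chatbot. Only relay reservation information via "
    "the external tool when [time, seat_count, name] have all been found in "
    "the conversation history and confirmed by the user; otherwise keep "
    "interviewing the client."
)

def build_messages(history: list[dict]) -> list[dict]:
    """Prepend the gating system prompt to the preserved chat history."""
    return [{"role": "system", "content": SYSTEM_PROMPT}] + list(history)

# Hypothetical request with the official openai>=1.x SDK:
# client.chat.completions.create(
#     model="gpt-3.5-turbo-1106",
#     messages=build_messages(history),
#     tools=[reservation_tool],  # your tool/function schema
# )
```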

For latency reasons I cannot use the bigger model.

Oh ok, I understand. I’ll try to make a prompt similar to this one tomorrow, and I’ll update you if I get some positive results with this strategy.
Thanks :smiley:

After conducting multiple attempts, I have successfully ensured that it asks for the data at least once. However, a problem remains when GPT asks for the user’s name and the user replies with additional requests instead. In such cases, GPT generates a response with a fictional name, or uses [Missing info] as a label when instructed not to use invented names.

I have tested GPT-4 and it is functioning impeccably. Do you believe fine-tuning could be useful in minimizing or eliminating this kind of issue?

That’s odd. The GPT-4 case shows that you are preserving enough chat history in at least one configuration. Did you tell the AI not to invoke the function unless the user’s name appears directly in past user input? And do you have enough lossless consecutive turns that the chat history is truly preserved? Check your logs to confirm that the user’s name is actually being sent in the gpt-3.5-turbo case where the model then claims ignorance.
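One way to catch fabricated arguments server-side is to check the tool-call arguments against the preserved user turns before acting on them. A minimal sketch (the helper name is mine, and the substring check is deliberately crude; real validation might normalize casing, dates, or numbers written as words):

```python
def fabricated_fields(args: dict, history: list[dict]) -> list[str]:
    """Return argument names whose values never appear in any user message."""
    user_text = " ".join(
        m["content"] for m in history if m["role"] == "user"
    ).lower()
    return [
        key for key, value in args.items()
        if str(value).lower() not in user_text
    ]

history = [
    {"role": "user", "content": "Vorrei prenotare per le 20:00"},
    {"role": "assistant", "content": "Perfetto, per quante persone?"},
]
args = {"time": "20:00", "seat_count": 4, "name": "Mario Rossi"}
# seat_count and name were never stated by the user, so both are flagged:
print(fabricated_fields(args, history))
```

If the list is non-empty, the safe move is to reject the tool call and have the chatbot ask for the missing details instead of booking with invented ones.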

A function or parameter description such as "sourced from user chat history" may help, along with top-p = 0.5 so that fewer random tokens get picked as output. The AI can’t write what it wants if the sampler occasionally picks the 10th-most-likely token.
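To illustrate what the top-p (nucleus sampling) parameter does, here is a toy implementation of the filtering rule over a made-up token distribution (the candidate tokens are invented for the example):

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize. This is the nucleus-sampling
    rule that the API's top_p parameter controls."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    return {token: p / total for token, p in kept}

# With top_p=0.5, only the dominant candidate survives, so a low-probability
# hallucinated token can never be sampled:
print(top_p_filter({'"Mario"': 0.05, '"20:00"': 0.6, "null": 0.35}, 0.5))
```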

Consider that I’m prompting in Italian, because that’s the language used during the conversation. I’ll try changing the top-p configuration (I’ve never touched it).
As for all the checks before calling the function, I wrote everything, but it seems not to follow them (maybe Italian is not as performant as English at recognizing function/tool calls).

I’ve tried changing top-p, but it doesn’t seem to solve the problem.

You can use plain gpt-3.5-turbo instead, and if that succeeds, add your report to the other threads about -1106 not doing functions successfully in foreign languages, or being of reduced ability there.

Wow, that’s quite incredible. I’ve changed the system prompt and the function parameter descriptions (not the names) to English, and it seems like a new AI now :laughing:. Now it waits for all the data before calling the function, and its behaviour is very similar to that of GPT-4.

Thank you for your support. Today I’ve learnt that prompts should be written only in English for better results.

I feel you! I’ve had the same problem, both with GPT-4 and with the GPT-4 1106 preview and Assistants. I ended up adding a section at the end of my prompt that looks like this:

Response format instructions: The response must be a single JSON object with one attribute ‘Vicbot_Webscrape_Summary__c’ that has your detailed summary containing the full text of your work. There will be no text or other characters outside the JSON string.

Without that it kept being hit or miss: sometimes adding text before the JSON, or wrapping the JSON in ```json { } ```. The latter would be fine if at least it was ALWAYS like that.
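When the model insists on wrapping its reply in ```json fences or prefixing extra text anyway, a tolerant parser on the receiving side can salvage the output. A sketch (the regex approach and function name are mine):

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Parse a JSON object out of a model reply that may be wrapped in
    ```json fences or preceded by stray text."""
    # First try a fenced block, with or without the "json" language tag.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # Fall back to the first {...} span anywhere in the text.
    brace = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace:
        return json.loads(brace.group(0))
    raise ValueError("no JSON object found in model reply")

print(extract_json(
    'Here you go:\n```json\n{"Vicbot_Webscrape_Summary__c": "summary"}\n```'
))
```

This way the "always wrapped" and "sometimes wrapped" cases both parse, instead of only the clean single-JSON-object reply.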