gpt-3.5-turbo-0613 starts to misalign after a few conversation iterations

Had this problem today. I’m using gpt-3.5-turbo-0613 to support a chat conversation for a QA system. It is supposed to call a function whenever a question is asked. The problem is that after a few iterations (2-3) I start to see some misalignment: the model starts to generate text instead of calling the function it is supposed to call. I think it’s because the previous iterations start to act as in-context learning and bias the model toward generating text. Any ideas for working around this issue?
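To make the setup concrete, here is a minimal sketch of the kind of call involved (pre-1.0 openai Python client; the function name and schema are illustrative, not my real ones):

import openai

# Illustrative function schema; the real QA function is different.
functions = [{
    "name": "answer_question",
    "description": "Look up the answer to a user question in the QA system.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The user's question."},
        },
        "required": ["query"],
    },
}]

messages = [{"role": "system", "content": "You are a QA assistant."}]
# ... messages grows by one user/assistant pair per iteration ...

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
    function_call="auto",  # after 2-3 turns the model may stop picking the function
)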

It’s a tough one. I suspect you’re correct in the assumption that the more data there is in the conversational buffer, the less priority there is on the function-calling aspect. If you can get some hard data on this, it would be SUPER useful if you could post it here, so everyone can get an idea of what the limits are, and of possible solutions and workarounds.

As a starting point, I’d look at the API library source to see what text gets sent to the model for function calls, and maybe repeat that text every 3 messages; something along those lines should help. My guess is that it’s a keyword like “[Function]” followed by a JSON string like the one you use to specify the function definition originally. (I’ve not looked, so this is a guess.)
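In code, “repeating that text every 3 messages” could look something like this (the reminder wording and helper function are hypothetical):

FUNCTION_REMINDER = {
    "role": "system",
    "content": "Reminder: when the user asks a question, call the provided "
               "function instead of answering in plain text.",
}

def with_periodic_reminder(messages, every=3):
    # Re-insert the reminder after every `every` non-system messages so the
    # instruction stays near the end of the growing context.
    out = []
    seen = 0
    for message in messages:
        out.append(message)
        if message["role"] != "system":
            seen += 1
            if seen % every == 0:
                out.append(FUNCTION_REMINDER)
    return out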

Ok, thanks. I see. Yes, perhaps some tags or keywords to signal that a function was called in previous iterations might work. However, I’m not sure whether it would then start to think that it needs to generate those keywords and JSON strings itself…

I’ll try (tomorrow) to replicate an example and share it here.

I did a few more tests on this issue today and I think I found a workaround that gives better results. I changed the main flow of the conversation a bit: instead of doing a traditional back-and-forth conversation, I work in a single iteration, where I show the full conversation in the first turn as if it were a conversation between two users. This way, the model always acts as if it were on its first iteration.


Yes, that’s a good idea! I have started to use “reminders” to consider the whole conversation when I expect that the context will get lost, which in your case would translate to a reminder to use the functions when necessary. The prompts get a bit more complex, but it’s working better than relying on the model to consider the whole context window every time by itself.
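Concretely, a “reminder” can be as simple as a line appended to the latest user message once the conversation has grown; the threshold and wording here are just examples:

# messages is the usual chat-completions message list.
REMINDER = ("\n(Reminder: consider the whole conversation above, "
            "and call a function if the message is a question.)")

if len(messages) > 6:  # roughly three user/assistant exchanges
    messages[-1]["content"] += REMINDER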

Hi, could you explain how to show the full conversation in one turn? Did you mean putting all the previous back-and-forth conversation in a single role? Thanks!

Sure. Currently I’m simply doing something like the following:

messages = [{"role": "system", "content": SYSTEM_PROMPT}]
conversation = []
for user_message, bot_message in enumerate(history):
    if user_message:
        conversation.append(">User: "+user_message)
    if bot_message:
        conversation.append(">HelpAI: "+bot_message)
conversation = '\n'.join(conversation)
messages.append({"role": "user", "content": first_prompt.format(chat=conversation)})

Where I prompt the model to predict what the next message from the user HelpAI will be.
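The exact wording of first_prompt isn’t critical; a hypothetical template along these lines would fit the code above:

first_prompt = (
    "Below is a conversation between two users, User and HelpAI.\n\n"
    "{chat}\n\n"
    "Write the next message from HelpAI."
)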

Thank you for sharing!! I am going to give it a try. I also had the problem of keeping the chatbot in character after a few conversation iterations. For example, if I ask it to tell me a joke, the first time it will reply that it can’t answer… blah blah. Then I ask it again, and it will tell a joke. There are some other problems like this; it seems to be hard for it to follow the instructions after a few conversation iterations.

I noticed the exact same thing, and in my case the solution has been to intelligently trim the history if it seems like the user is moving on to a new unit of work.
My app allows calling API functions like machine translation between languages, and it’s meant to receive repeated input sentences in foreign languages to process.

Between each sentence, I ask ChatGPT whether the input is a new sentence (if so, trim the history) or a question regarding the previous sentence (if so, keep the history).

Seems to work quite well, and trimming the history is also preferable from a token-count standpoint.
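A minimal sketch of that classify-then-trim step (pre-1.0 openai client; the prompt wording, helper name, and example sentences are my own):

import openai

def is_new_sentence(user_input, previous_sentence):
    # Ask the model to classify the input before deciding what history to keep.
    result = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[
            {"role": "system", "content": "Answer with only YES or NO."},
            {"role": "user", "content": (
                f"Previous sentence: {previous_sentence}\n"
                f"New input: {user_input}\n"
                "Is the new input a brand-new sentence to process (YES), "
                "or a question about the previous sentence (NO)?"
            )},
        ],
    )
    return result["choices"][0]["message"]["content"].strip().upper().startswith("YES")

previous_sentence = "Je voudrais un café, s'il vous plaît."
user_input = "Das Wetter ist heute schön."

if is_new_sentence(user_input, previous_sentence):
    history = []  # new unit of work: trim the history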