Inconsistent results from Assistant

I’m trying to replace a Google DialogFlow chatbot with one built on OpenAI. I am using the Assistants API to collect a set of data from the user that eventually results in an external API call to act on the user’s need. The data I need to collect depends on the user’s intent, which is determined from their first prompt.

I have the assistant process the first user input and determine the intent from a fixed set of intents that I give it as a comma-separated string embedded in the instructions. This step works fine and the assistant always finds the right intent. After that, things start going wonky.
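
Roughly, the intent-detection step looks like the sketch below (a simplified illustration, not my actual code — the intent list, names and model are placeholders, and it assumes the standard beta Assistants endpoints in the Python SDK):

```python
from openai import OpenAI

client = OpenAI()

# Illustrative intent list; the real one is a comma-separated string of my domain intents.
INTENTS = "book_flight, cancel_booking, check_status, talk_to_agent"

assistant = client.beta.assistants.create(
    name="intent-detector",
    model="gpt-4-turbo-preview",
    instructions=(
        "Classify the user's first message into exactly one of these intents: "
        f"{INTENTS}. Respond with only the intent name, nothing else."
    ),
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="I'd like to cancel my booking."
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
# ...then poll the run until its status is "completed" and read the last assistant message.
```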

I then create a different instruction based on the intent. In these intent-specific instructions, I tell the assistant to collect values for a set of entities. As part of the instruction, I embed a JSON structure that contains a list of entity names and an example question for each entity. I tell the assistant to have a conversation with the user and collect all the entity values, and to embed the name-value pairs collected so far at the end of each response. I tell it to keep generating the next question until all the needed entity values have been collected, and then to add a “completed” flag set to true to the JSON.
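
The intent-specific instruction is built roughly like this (a sketch only — the entity names, questions and wording are placeholders, not my real schema):

```python
import json

# Illustrative entity schema for a "book_flight" intent; the real entities differ.
entity_schema = {
    "entities": [
        {"name": "origin", "example_question": "Which city are you flying from?"},
        {"name": "destination", "example_question": "Where would you like to fly to?"},
        {"name": "travel_date", "example_question": "What date do you want to travel?"},
    ]
}

intent_instructions = f"""
You are collecting the following entities from the user:
{json.dumps(entity_schema, indent=2)}

Rules:
- Ask one question at a time, using the example questions as a guide.
- At the end of EVERY response, append a JSON object with the name-value
  pairs collected so far.
- Keep asking until every entity has a value, then add "completed": true
  to that JSON object.
"""
```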

The Assistant’s behavior is 50-50. Sometimes it behaves perfectly. Other times it starts adding its own questions and completely ignores the entities I give it when generating questions. I’ve tried making the prompts more deterministic by telling it that it sometimes makes this mistake and spelling out step-by-step instructions on how to avoid it, but to no avail.

Any idea how to improve the determinism of the Assistant’s behavior? Is there a way to “train” the Assistant by giving it a set of correct interactions and the responses I need from it? Any other ideas?

What model are you using? Have you tried a different model? Are all your instructions in the Assistant’s instructions, or are some part of the thread as well?

I’ve tried both gpt-4 and gpt-4-turbo-preview. The results are inconsistent in either case. gpt-3.5-turbo doesn’t follow my instructions and I never get the right results with it. Since I’m using the same assistant to detect the intent and to process it, I largely pass the instructions at the thread level. I suppose I could create an entirely new assistant for the post-intent-detection instructions. Do you suspect it would make a difference? I can try anyway, as I’m stuck without a solution right now!
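
For reference, this is roughly how I swap in the intent-specific instructions at run time, continuing the sketch above (as I understand it, the run-level `instructions` parameter overrides the assistant-level instructions for that run):

```python
# Sketch: same assistant, but per-run instructions swapped in after intent detection.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
    instructions=intent_instructions,  # the intent-specific prompt built earlier
)
```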

I would put all instructions in the Assistant, but make them very detailed. Create a header for each step. Be super detailed.
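
For example, something along these lines (just an illustrative sketch with a made-up domain — adapt the headers, steps and example to your own intents):

```python
detailed_instructions = """
# Role
You collect booking details from the user, one question at a time.

# Step 1: Identify the intent
Classify the first user message into one of: book_flight, cancel_booking, check_status.

# Step 2: Collect entities
For the detected intent, ask for each missing entity using the example questions provided.
Never invent entities that are not listed.

# Step 3: Report progress
At the end of every response, append a JSON object with the values collected so far.
When all values are present, add "completed": true.

# Example
User: I want to fly to Paris.
Assistant: Great, which city are you flying from?
{"destination": "Paris"}
"""
```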

I have the same issue. Using the same data and instructions, I get more consistent answers from a custom GPT made directly in ChatGPT than from an assistant using the same model (GPT-4). I think it would be cool to just let us create threads on custom GPTs instead of assistants.

I’ve been using it for a while now.
Happy to help if I can.

I’ve tried a number of approaches, from being super direct and spelling out each instruction to keeping it light with high-level goals.

I find the assistant will sometimes ignore instructions, and the experience provided is not consistent.

I’m coming to the conclusion that the Assistants API is good if you want something narrow and quick and need little control, but if you want control over each response you should build a framework around the completions API.
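
For example, with chat completions you can force structured output and validate it yourself on every turn — a rough sketch, where the field names and the completion check are placeholders:

```python
import json
from openai import OpenAI

client = OpenAI()

REQUIRED = ["origin", "destination", "travel_date"]  # placeholder entity names

def next_turn(history: list[dict]) -> dict:
    """One conversation turn: ask the model for the next question plus the slots
    filled so far, then validate the JSON ourselves instead of trusting the model
    to stay on script."""
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        temperature=0,
        response_format={"type": "json_object"},  # JSON mode
        messages=[
            {
                "role": "system",
                "content": (
                    "You collect these fields from the user: "
                    + ", ".join(REQUIRED)
                    + '. Reply with JSON: {"question": ..., "collected": {...}, "completed": bool}.'
                ),
            },
            *history,
        ],
    )
    data = json.loads(response.choices[0].message.content)
    # Our own completion check, independent of what the model claims.
    data["completed"] = all(k in data.get("collected", {}) for k in REQUIRED)
    return data
```

That way the “completed” decision and the slot tracking live in your code, not in the model’s memory of the instructions.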

Yep, I do most stuff with completions and call the assistant when I need that functionality.

Due to continuing and unpredictable inconsistencies in the behaviour of the API, I have had to pause development on the things I was working on.