OpenAI Assistant generates multiple messages even though it is instructed to respond with just one

We are building a sort of AI agent where an OpenAI Assistant executes commands in a given context, based on the user's request. We instruct it to react to the user's input and come up with the next command it needs to execute. We expect it to generate just one message per turn, and we instruct it to do so in multiple ways, such as:

YOU EXECUTE NEXT COMMAND ONLY AFTER YOU RECEIVED A RESPONSE TO THE PREVIOUS COMMAND.
YOU CANNOT SEND MORE THAN ONE MESSAGE AT A TIME.
Respond to the message you received with just ONE message according to the protocol defined above.

However, in most cases it generates multiple messages as a response. See the screenshot from the OpenAI Threads page below:

Sometimes it is the same command executed N times, sometimes it is different commands.

Sometimes it goes even further, hallucinating a fake response it could have received from the user, which it then responds to :man_facepalming:.

Anyway, this is frustrating and bad because:

  1. generating all the extra messages takes a lot of time
  2. we still pay for all of them, even though we need just one

Has anyone dealt with a similar situation?

We managed to work around it by ignoring all the messages after the first one and then "pulling it back into reality" by recreating the thread (a rough sketch of this is below). We are also considering using a plain Chat Completions call instead of Assistants.
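
For anyone hitting the same thing, here is a minimal sketch of that workaround in Python, assuming a current openai SDK with `create_and_poll` and the `run_id` filter on `messages.list`. The IDs are placeholders and the parsing is simplified, so treat it as an illustration rather than our exact code:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical IDs; in our agent these come from earlier setup steps.
assistant_id = "asst_..."
thread_id = "thread_..."

# Run the assistant on the thread and wait for it to finish.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread_id,
    assistant_id=assistant_id,
)

# The run often produces several assistant messages; list only the messages
# created by this run, oldest first, and keep just the first one.
messages = client.beta.threads.messages.list(
    thread_id=thread_id,
    run_id=run.id,
    order="asc",
)
first = next(iter(messages), None)
if first is not None and first.content and first.content[0].type == "text":
    command_text = first.content[0].text.value  # the only message we act on

# When the assistant starts inventing fake user responses, we discard the
# thread and start a fresh one to "pull it back into reality":
# fresh_thread = client.beta.threads.create()
```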

But maybe there is a better approach. If all Assistants use cases behave like this, then perhaps it is simply easier to manage the context and the conversation yourself (see the second sketch below).
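
In case someone wants to compare the two options, here is roughly what managing the conversation yourself with Chat Completions could look like. The system prompt and history handling are simplified assumptions, not our production code:

```python
from openai import OpenAI

client = OpenAI()

# Assumed system prompt, standing in for our command-protocol instructions.
SYSTEM_PROMPT = (
    "You execute one command per turn. "
    "Respond to each message with exactly ONE message according to the protocol."
)

# We keep the conversation state ourselves instead of in an Assistants thread.
history = [{"role": "system", "content": SYSTEM_PROMPT}]

def next_command(user_input: str) -> str:
    """Send the user's input and get back exactly one assistant message."""
    history.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=history,
        n=1,  # one completion per request, nothing extra to filter out
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```

The trade-off is that you have to trim or summarize `history` yourself once it grows, which the Assistants thread was otherwise handling for us.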

Thanks

What model are you using?
How long are the threads?

Model: gpt-4o
Threads are not long: 25–33 messages.