Assistant API with gpt-4 turbo delivers back prompt as answer

Quite a number of times I have faced an issue where the Assistants API with the new GPT-4 Turbo model returns the user prompt back as the answer. This is strange; I never came across this issue with the previous models.

How are you checking the thread for new messages? Are you sure you are not just checking for the latest message before the run has completed? If that's the case, the latest message will be your prompt (with role=user) until the thread is done with your run and has produced a new message (with role=assistant).

I am waiting for the run to complete and then retrieving the message. The same code works most of the time for the same prompt, but sometimes I get the prompt back as the answer. Below is the code snippet I am using to poll the assistant, check the status, and retrieve the answer:

        while True:
            run_status = await self.check_run_status(thread_id, run_id)
            if run_status.status == "requires_action":
                if tool_instances:
                    logger.info("need to make a function call")
                    function_ids_to_result_map = await self.handle_function_calls(run_status, tool_instances)
                    logger.info("function_ids_to_result_map: %s", function_ids_to_result_map)
                    if function_ids_to_result_map:
                        await self.submit_tool_outputs(thread_id, run_id, function_ids_to_result_map)
                        logger.info("checking run status after submitting tool output")
                else:
                    raise ThreadRunException(f"status is {run_status.status}, but no tool instances are defined")

            elif run_status.status == "completed":
                break

            elif run_status.status in ["cancelling", "cancelled", "failed", "expired"]:
                raise ThreadRunException(f"Thread {thread_id} ran into an issue")

            await asyncio.sleep(polling_interval)

        # Retrieve the latest message
        thread_messages = client.beta.threads.messages.list(thread_id=thread_id, order="desc", limit=1)
        assistant_message = None
        async for message in thread_messages:
            logger.info("message id: %s",
            logger.info("content: %s", message.content[0].text.value)
            assistant_message = message

        return assistant_message

And this 1 message you get back has role assistant (and not user)?

Since you only retrieve 1 message and order by the created_at value, you might be hitting a case where your message and the assistant's reply are created at the same time. created_at is a Unix timestamp in seconds, so if the response is produced in under a second, you are limiting to 1 message while ordering on a value where multiple messages share the same value.

It is better to retrieve a larger set of messages, and use the role (assistant) to determine which message to process as the assistant’s response.
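A hedged illustration of that approach: scan the newest messages and take the first one whose role is "assistant", instead of trusting the timestamp ordering alone. The `Message` objects below are simple stand-ins for the API's message type (field names are illustrative), not the real client:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    created_at: int  # Unix timestamp in seconds, as in the API
    text: str

def latest_assistant_text(messages):
    """messages: newest-first list, as returned with order='desc'."""
    for m in messages:
        if m.role == "assistant":
            return m.text
    return None

# Both messages share the same created_at, so ordering with limit=1
# could return either one; filtering by role still finds the reply.
thread = [
    Message("user", 1700000000, "What is 2+2?"),
    Message("assistant", 1700000000, "2 + 2 = 4."),
]
print(latest_assistant_text(thread))  # -> 2 + 2 = 4.
```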

Getting the same problem, but with gpt-3.5-turbo-1106.

"And this 1 message you get back has [role] assistant (and not user)?" ---- I am not checking that part; I missed it. Let me try out your solution. Thanks for the help.

I tried your solution. What I saw is that even after retrieving a larger set of messages, only one message with role "user" gets added to the list, and no assistant message appears. The same prompt sometimes works perfectly; at other times only the message with role "user" gets added.

@kachari.bikram42 I had this same problem and seem to have solved it by adding a delay before reading the thread:

def ask_assistant(user_message):
    assistant = client.beta.assistants.retrieve(ASSISTANT_ID)
    thread = client.beta.threads.create()

    client.beta.threads.messages.create(, role="user", content=user_message)

    run = client.beta.threads.runs.create(,

    completed_run = wait_on_run(run, thread)

    time.sleep(30)  # INSERTING DELAY HERE HELPED

    messages = client.beta.threads.messages.list(, order="desc")
    new_message =[0].content[0].text.value
    return new_message

def wait_on_run(run, thread):
    while run.status == "queued" or run.status == "in_progress":
        run = client.beta.threads.runs.retrieve(, run_id=
        print(f"Run status: {run.status}")
        time.sleep(0.5)  # avoid hammering the API while polling
    return run
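Instead of a fixed 30-second sleep, a possibly more robust variant is to retry until an assistant message actually appears (or a deadline passes). A minimal sketch, where the `fetch_messages` callable stands in for the real `client.beta.threads.messages.list(...).data` call and all names are illustrative:

```python
import time

def wait_for_assistant_reply(fetch_messages, timeout=30.0, interval=0.5):
    """Poll fetch_messages() until a message with role 'assistant' shows up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for m in fetch_messages():
            if m["role"] == "assistant":
                return m["text"]
        time.sleep(interval)
    raise TimeoutError("no assistant message appeared before the deadline")

class FakeThread:
    """Simulates a thread where the reply only appears on the second poll."""
    def __init__(self):
        self.polls = 0
    def fetch(self):
        self.polls += 1
        if self.polls < 2:
            return [{"role": "user", "text": "hi"}]
        return [{"role": "assistant", "text": "hello!"},
                {"role": "user", "text": "hi"}]

print(wait_for_assistant_reply(FakeThread().fetch, timeout=5, interval=0.01))  # -> hello!
```

This bounds the wait instead of always paying the full delay, and it never returns a user message even when the reply is slow.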

Thanks for the help, I will surely try it out. Did you find out the reason for the problem, and why adding a delay solves it for you?