Using an assistant with one run: it actually runs several times and then I get the response

I have this part of code for testing:
for i in range(100):
    gpt_thread = assistant_client.beta.threads.create()
    message = assistant_client.beta.threads.messages.create(
        thread_id=gpt_thread.id,
        role="user",
        content=content,
    )
    with assistant_client.beta.threads.runs.stream(
        thread_id=gpt_thread.id,
        assistant_id=assistant_id,
        event_handler=EventHandler(),
    ) as stream:
        stream.until_done()

Sometimes the response I get does not relate to the start of a conversation but instead points to some steps further along. Looking up the thread_id output in the playground shows the assistant responding three times without receiving any new message.

If you have tools like Code Interpreter or File Search turned on, the AI may invoke those before it responds. You would see multiple run steps for those internal tool calls, perhaps with them returning no relevant information, and even the AI retrying what failed.

What happens depends on the assistant, along with the default temperature, which has it generating differently each time.
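If you want to see what actually happened inside a run, you can list its run steps. A minimal sketch only; thread_id and run_id are placeholders you would fill in from your own code or the playground:

from openai import OpenAI

client = OpenAI()

# Inspect each step the run performed: "message_creation" vs "tool_calls"
steps = client.beta.threads.runs.steps.list(
    thread_id=thread_id,  # placeholder: the thread you saw in the playground
    run_id=run_id,        # placeholder: the run to inspect
)
for step in steps:
    print(step.id, step.type, step.status)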


Thank you for your reply, but no, I don't have File Search or Code Interpreter on my assistant.

Is there a chance you are simply running the thread multiple times? What you are explaining isn’t clear.

If there is nothing more to write, and an input thread is run again containing the latest AI response, the AI may just repeat the same thing again (or in true completions, just “stop” as its output).


As you can see in the code, I create the thread and send the default message to the thread. In some cases, the response I get is irrelevant to the default message and I can see in the playground that it generates a response several times, but I only get the latest one.

Is the model gpt-4o-mini, and the problem new with the last 24 hours?

Someone else here reported that the AI model was not observing the context of images.

Set top_p in your assistant to 0.5; that will minimize tokens that break the actual formatting of response containers and cause 500 server errors. Set it near zero if you just want to see and confirm almost identical AI outputs.
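A sketch of what that could look like, assuming the v2 Assistants API where top_p can be set at the assistant level; assistant_id is a placeholder for your own assistant's ID:

from openai import OpenAI

client = OpenAI()

# Lower top_p on the existing assistant so sampling is near-deterministic
client.beta.assistants.update(
    assistant_id,  # placeholder: your assistant's ID
    top_p=0.5,     # or closer to 0 to confirm nearly identical outputs
)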

Yes, it is gpt-4o-mini. We started seeing it several days ago.
Sometimes it runs several times, then we get the response.
Sometimes we get this error:
error: Error code: 400 - {'error': {'message': 'Thread thread_id already has an active run run_run_id.', 'type': 'invalid_request_error', 'param': None, 'code': None}}


The error indicates re-running on a thread with the same ID, though. That’s what I suspected would cause the symptom.

You might need to consider asyncio, fast Client() reuse, or EventHandler reuse while streaming; basically you have to trust that the SDK does a good job of blocking and not spinning off processes that rely on the client object's state not changing (I haven't looked at any of this helper code). Also look at where outer try/except blocks and other loops are placed that might allow gpt_thread to be reused if there is a create failure, etc.

Just add simple checks, like a banned_for_reuse list of past thread_ids consulted right before a run. Or don't reuse client objects: delete or destroy them. Idea:

import openai

# Dictionary to hold assistant_client objects
clients = {}

# List to keep track of thread IDs that should not be reused
banned_for_reuse = []

for i in range(100):
    # Create a new client for each iteration
    clients[i] = openai.Client()

    # Create a thread
    gpt_thread = clients[i].beta.threads.create()

    # Ensure the thread_id is not reused
    if gpt_thread.id in banned_for_reuse:
        raise ValueError("Attempted to reuse a banned thread ID")

    # Create a message in the thread
    message = clients[i].beta.threads.messages.create(
        thread_id=gpt_thread.id,
        role="user",
        content=content,
    )

    # Stream the thread run
    with clients[i].beta.threads.runs.stream(
        thread_id=gpt_thread.id,
        assistant_id=assistant_id,
        event_handler=EventHandler(),
    ) as stream:
        stream.until_done()

    # Only after the run completes, ban the thread_id from future reuse
    banned_for_reuse.append(gpt_thread.id)

    # Delete the client that is 5 iterations older
    old_client_index = i - 5
    if old_client_index >= 0:
        del clients[old_client_index]

# Clean up any remaining clients
for client_index in list(clients.keys()):
    del clients[client_index]

Just make sure this fits with everything not shown.
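Another guard against the 400 "already has an active run" error is to list the thread's runs right before streaming and bail out (or wait) if one is still in progress. A minimal sketch only; thread_id is a placeholder for the thread you are about to run on:

from openai import OpenAI

client = OpenAI()

ACTIVE_STATUSES = {"queued", "in_progress", "requires_action", "cancelling"}

# Refuse to start another run while one attached to the thread is still active
existing_runs = client.beta.threads.runs.list(thread_id=thread_id)
if any(run.status in ACTIVE_STATUSES for run in existing_runs):
    raise RuntimeError(f"Thread {thread_id} already has an active run")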


I get your idea related to the error. However, it is different from the first issue I stated: I start the run by sending a message, and the response I get from the assistant comes only after several runs without any new input.

How about pulling down the entire contents of the problem thread with code and seeing how many messages you have, the run times, and the actual assistant messages?
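A minimal sketch of dumping a thread, assuming thread_id is the ID of a problem thread copied from the playground:

from openai import OpenAI

client = OpenAI()

# Every message in the thread, oldest first
messages = client.beta.threads.messages.list(thread_id=thread_id, order="asc")
for m in messages:
    if m.content and m.content[0].type == "text":
        print(f"{m.role}: {m.content[0].text.value[:80]}")

# Every run on the thread, with status and timing
runs = client.beta.threads.runs.list(thread_id=thread_id)
for run in runs:
    print(run.id, run.status, run.created_at, run.completed_at)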

Yes, that’s exactly what I did. I see several messages. The only response I get is the last one

Still, I have this issue: I run the thread only once through the Python SDK, but when I check the threads at https://platform.openai.com/threads/ I can see there are sometimes 2-3 runs.
It's inconsistent and makes my whole app inconsistent.