Using an assistant with one run: it actually runs several times and then I get the response

I have this part of code for testing:
for i in range(100):
    gpt_thread = assistant_client.beta.threads.create()
    message = assistant_client.beta.threads.messages.create(
        thread_id=gpt_thread.id,
        role="user",
        content=content,
    )
    with assistant_client.beta.threads.runs.stream(
        thread_id=gpt_thread.id,
        assistant_id=assistant_id,
        event_handler=EventHandler(),
    ) as stream:
        stream.until_done()

Sometimes the response I get does not relate to the start of a conversation but instead points to some steps further along. Looking up the thread_id output in the playground shows the assistant responding three times without receiving any new message.

If you have tools like Code Interpreter or File Search turned on, the AI may invoke those before it responds. You would see multiple run steps for those internal tool calls, perhaps with them returning no relevant information, and even the AI retrying what failed.

What happens depends on the assistant, along with the default temperature, which has it generating differently each time.
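If you want to see what actually happened inside a run, you can list its run steps. A minimal sketch only; thread_id and run_id are placeholders you would fill in from your own code or the playground:

from openai import OpenAI

client = OpenAI()

# Inspect each step the run performed: "message_creation" vs "tool_calls"
steps = client.beta.threads.runs.steps.list(
    thread_id=thread_id,  # placeholder: the thread you saw in the playground
    run_id=run_id,        # placeholder: the run to inspect
)
for step in steps:
    print(step.id, step.type, step.status)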


Thank you for your reply, but no, I don't have File Search or Code Interpreter on my assistant.

Is there a chance you are simply running the thread multiple times? What you are explaining isn’t clear.

If there is nothing more to write, and an input thread is run again containing the latest AI response, the AI may just repeat the same thing again (or in true completions, just “stop” as its output).


As you can see in the code, I create the thread and send the default message to the thread. In some cases, the response I get is irrelevant to the default message and I can see in the playground that it generates a response several times, but I only get the latest one.

Is the model gpt-4o-mini, and the problem new with the last 24 hours?

Someone else here reported that the AI model was not observing the context of images.

Set top_p in your assistant to 0.5; that will minimize tokens that break the actual formatting of response containers and cause 500 server errors. Set it near zero if you just want to see and confirm almost identical AI outputs.
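A sketch of what that could look like, assuming the v2 Assistants API where top_p can be set at the assistant level; assistant_id is a placeholder for your own assistant's ID:

from openai import OpenAI

client = OpenAI()

# Lower top_p on the existing assistant so sampling is near-deterministic
client.beta.assistants.update(
    assistant_id,  # placeholder: your assistant's ID
    top_p=0.5,     # or closer to 0 to confirm nearly identical outputs
)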

Yes, it is gpt-4o-mini. We started seeing it several days ago.
Sometimes it runs several times, then we get the response.
Sometimes we get this error:
error: Error code: 400 - {'error': {'message': 'Thread thread_id already has an active run run_run_id.', 'type': 'invalid_request_error', 'param': None, 'code': None}}


The error indicates re-running on a thread with the same ID, though. That’s what I suspected would cause the symptom.

You might need to consider asyncio, fast Client() reuse, or EventHandler reuse while streaming; basically you have to trust that the SDK does a good job of blocking and not spinning off processes that rely on the client object's state not changing (I haven't looked at any of this helper code). Also look at where outer try/except blocks and other loops are placed that might allow gpt_thread to be reused if there is a create failure, etc.

Just add simple checks, like a banned_for_reuse list of past thread_ids consulted right before a run. Or don't reuse client objects: delete or destroy them. Idea:

import openai

# Dictionary to hold assistant_client objects
clients = {}

# List to keep track of thread IDs that should not be reused
banned_for_reuse = []

for i in range(100):
    # Create a new client for each iteration
    clients[i] = openai.Client()

    # Create a thread
    gpt_thread = clients[i].beta.threads.create()

    # Ensure the thread_id is not reused
    if gpt_thread.id in banned_for_reuse:
        raise ValueError("Attempted to reuse a banned thread ID")

    # Create a message in the thread
    message = clients[i].beta.threads.messages.create(
        thread_id=gpt_thread.id,
        role="user",
        content=content,
    )

    # Stream the thread run
    with clients[i].beta.threads.runs.stream(
        thread_id=gpt_thread.id,
        assistant_id=assistant_id,
        event_handler=EventHandler(),
    ) as stream:
        stream.until_done()

    # Only after the run completes, ban the thread_id from future reuse
    banned_for_reuse.append(gpt_thread.id)

    # Delete the client that is 5 iterations older
    old_client_index = i - 5
    if old_client_index >= 0:
        del clients[old_client_index]

# Clean up any remaining clients
for client_index in list(clients.keys()):
    del clients[client_index]

Just make sure this fits with everything not shown.
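Another guard against the 400 "already has an active run" error is to list the thread's runs right before streaming and bail out (or wait) if one is still in progress. A minimal sketch only; thread_id is a placeholder for the thread you are about to run on:

from openai import OpenAI

client = OpenAI()

ACTIVE_STATUSES = {"queued", "in_progress", "requires_action", "cancelling"}

# Refuse to start another run while one attached to the thread is still active
existing_runs = client.beta.threads.runs.list(thread_id=thread_id)
if any(run.status in ACTIVE_STATUSES for run in existing_runs):
    raise RuntimeError(f"Thread {thread_id} already has an active run")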


I get your idea related to the error. However, it is different from the first issue I stated: I start the run by sending a message, and the response I get from the assistant comes only after several runs without any new input.

How about pulling down the entire contents of the problem thread with code and seeing how many messages you have, the run times, and the actual assistant messages?
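A minimal sketch of dumping a thread, assuming thread_id is the ID of a problem thread copied from the playground:

from openai import OpenAI

client = OpenAI()

# Every message in the thread, oldest first
messages = client.beta.threads.messages.list(thread_id=thread_id, order="asc")
for m in messages:
    if m.content and m.content[0].type == "text":
        print(f"{m.role}: {m.content[0].text.value[:80]}")

# Every run on the thread, with status and timing
runs = client.beta.threads.runs.list(thread_id=thread_id)
for run in runs:
    print(run.id, run.status, run.created_at, run.completed_at)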

Yes, that’s exactly what I did. I see several messages. The only response I get is the last one

Still, I have this issue: I run the thread only once through the Python SDK, but when I check the threads at https://platform.openai.com/threads/ I can see there are sometimes 2-3 runs.
It's inconsistent and makes my whole app inconsistent.