I am experimenting with the Assistants API using the gpt-4-1106-preview model. Performance is significantly slower than what was shown during DevDay. Any pointers for improving run performance, especially for submitToolOutputs?
It’s simply down to load on the system. There are millions of new users, which is why Plus membership signups have been paused. It will improve as more hardware is added and software optimisations are made.
I wonder if there will be a performance difference when using an Enterprise account versus a Plus account? Do you have any information about that?
From Sweden, I get good performance in the morning, and it worsens from about 3 pm. I guess it is a load issue, indeed. Possibly too many users overseas, etc.
It is still super slow to this day. Could it be because they require polling the endpoint to see if there is a response? I can only imagine that people are spamming the heck out of the endpoint while waiting for a response. It really doesn’t make any sense; I can’t imagine the load it is putting on their servers. I personally use a 4-second pause between polls (not ideal) and I am still getting an average response time of around 30s.
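For anyone comparing numbers, here is a minimal sketch of the fixed-interval polling described above, with a timeout so a stuck run does not loop forever. It is not tied to any SDK: `fetch_status` is a hypothetical callable you supply that returns the run's current status string, and the set of terminal statuses is an assumption to check against the current API reference. `requires_action` is treated as a stopping point because that is where tool outputs have to be submitted.

```python
import time
from typing import Callable, Tuple

# Statuses at which polling should stop. This set is an assumption for the
# sketch; check it against the API reference you are using.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "requires_action"}


def poll_run(
    fetch_status: Callable[[], str],
    interval: float = 4.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, int]:
    """Poll until a terminal status or the timeout; return (status, poll count)."""
    deadline = clock() + timeout
    polls = 0
    while True:
        status = fetch_status()
        polls += 1
        if status in TERMINAL_STATUSES:
            return status, polls
        # Give up if the next pause would take us past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"run still '{status}' after {polls} polls")
        sleep(interval)
```

With the OpenAI Python SDK, `fetch_status` could plausibly be something like `lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id).status`, but treat that as an assumption and verify it against the SDK version you have installed. Note that a fixed pause only bounds how late you notice completion; it does nothing about the time the run itself takes.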
Well, they really brought the load on themselves. Just imagine if they had released the Assistants API with webhooks instead. You seem to be comfortable with 4s. I was on 2s and still got replies in about 10s or more. I intentionally dropped it to 0.5s, and the API still replied when it wanted to, not when I wanted it to. If you’ve used 0613 or any earlier model version, you’d definitely run back to Chat Completions. The response time is unbearable. I gave up on the API overall. My code catered for every case across runs, steps, and tool calls, all combined into fully working code where I didn’t miss anything. After all, if a developer isn’t using the steps endpoint, some outputs are missed. My code checked every single step from creation to completion before moving on to the next one. But the response speed is just discouraging. Do you know what happens if you don’t have those pauses of (X)s? You get penalized with an error code for being rude to the endpoint.
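On the "penalized with an error code" point: presumably that is a rate-limit response (an HTTP 429 or similar, which is an assumption since the post does not name the error). A common way to poll aggressively without getting locked out is to retry with exponential backoff. The sketch below is generic: `RateLimited` is a hypothetical stand-in for whatever exception your client raises, and `call` is any zero-argument function, such as a status fetch.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises when rate limited."""


def call_with_backoff(
    call: Callable[[], T],
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call `call`, doubling the wait (up to max_delay) after each RateLimited error."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait between 50% and 100% of the current delay so that many
            # clients do not all retry at the same instant.
            sleep(delay * (0.5 + 0.5 * jitter()))
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")
```

The idea is to start at a short pause like 0.5s and only slow down when the endpoint pushes back, rather than choosing one fixed pause up front.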
In simple terms: Allow me to drink your API tokens and money but don’t blame me for taking time to internalize and give you back the response. And you should be the one to come back for it. If you come too soon I’ll also fire you.
Well put, but does anyone know why? Cash grab, or bad coding or cart before horse or??? WTF is going on here?
If I put it in simple terms too: let’s accept that the Assistants API is currently in BETA, but the version released is v0.0000000001, the version meant to test your anger levels.
I’m still wondering when we are being charged. Each time we retrieve the run status? When we have to list all messages to get the most recent response(s)? Both? More? This is totally confusing. Also, from my testing of the v4 turbo preview in both chat completion and assistant, it is clear that the actual model for each is not the same. I would expect variations in responses, sure, but the v4 turbo preview in assistant uses bold styling and the chat completion one does not. Not a big deal, but shouldn’t the same model produce similar output regardless? So much stuff going on here doesn’t make sense, and we are paying for it! I really don’t like that we are mostly hung out to dry and this company just does whatever it feels like. It is borderline unethical to charge people to beta test this product, especially when results are not consistent.
Any news/update/hope for the bad performance to get better any time soon? Any production version coming out soon?
OpenAI staff recently hinted at some updates coming out in the near future:
As an additional observation: assistant performance in retrieval mode is significantly slower compared with asking the same question via ChatGPT. I am giving instructions to answer from the uploaded document if the answer is available there, and to answer based on common knowledge (like ChatGPT) if it is not found in the document. The uploaded document is quite small and simple, and OpenAI claims the information is indexed, so it shouldn’t be loaded with each request. I can’t see a reason for a significant delay.