Gpt-3.5-turbo-1106 is very slow

NV_Neo · November 20, 2023, 4:30am

the response format is set to json_object
I also have JSON output being mentioned in my prompt
Temperature is set to 0.2.

If it was a problem with the prompt, wouldn’t it fail all the time and it’s a timeout and not an error. In your shared sample, you set timeout to 5, what is the unit there?

TonyAIChamp · November 20, 2023, 4:31am

If it was a problem with the prompt, wouldn’t it fail all the time and it’s a timeout and not an error.

Actually, no. Please check above: Gpt-3.5-turbo-1106 is very slow - #12 by TonyAIChamp

In your shared sample, you set timeout to 5, what is the unit there?

Seconds

ken0ryu · November 21, 2023, 4:15pm

@TonyAIChamp Could you share the prompt that works without timeout?

My system prompt:

Reply in JSON format with the following structure:

{ 
  “title”: “title of the book”, 
  “author”: “name of the author”
}
// other instructions

model: gpt-3.5-turbo-1106
temperature: 0
response_format: {‘type’: ‘json_object’}

Still get timeout every few times.

_j · November 21, 2023, 4:49pm

Are you streaming the answer so that you can see the progress?

Because now you can’t see the progress every five minutes in the usage page to determine what you were charged in tokens, as the prior features were intentionally attacked, an obfuscating modification to existing functionality meant to harm the user and make ill-conceived products such as “assistants” unaccountable.

If you aren’t streaming the completion, then you won’t see that the AI model has gone nuts in a loop of nonsense tokens until it fills up the context length, until it times out on you without returning an answer.

Setting your own timeout will ensure you do not see the 4000 “/n/n/n” that you paid for. The likely result of a new AI model on the edge of breaking all the time, so much that it was “trained” to make json but still needs to be instructed in order to not produce garbage.

I would see if you can just prompt for the JSON output that you want the AI to produce, and NOT use the separately-trained json-output model that doesn’t work right, and which is invoked by the response_format parameter.

adaptiv · November 22, 2023, 12:48am

Yes, here my bug report which focuses on the excessive usage of tokens due to messages re-created over and over again.

TonyAIChamp · November 22, 2023, 6:25am

                You always reply in the json format with the fields:
                - ai_message: your message to the Human
                - status: assessment status
                - status_comment (very detailed comment to the status, not to be shown to the Human)

This is one of the examples. On some other ones I show actual json like you do.

adaptiv · November 22, 2023, 6:56am

I like this idea of providing the JSON structure. Might be more intuitive to the model than a bogus JSON output.

jan7 · November 22, 2023, 2:21pm

Hey, so i am currently creating batches and running requests async. Our company is t4 and im trying to get closer to the 1mio token per minute limit.

The issue im experiencing are very long response times for single requests in the async batch (several minutes). This currently exclusively happens for the new 1106 model. GPT-3.5-turbo-0613 does not run into this issue and i have a much higher throughput (i would gues roughly 4x?).

Now to my question, since the new model is slower. Will this change? And also are the new 1106 api prices also the new prices for GPT-3.5-turbo-0613, or is 0613 currently more expensive than 1106?

AI-Roguelite · November 27, 2023, 1:54am

Hello, I have a similar issue where I have a new use case that only 1106 can do well, but I’m unable to migrate to 1106 because about 20% of the time the responses will take over 20 seconds. For 0613 and 0301, this only happens 1% or less. What is the ETA for the fix for this?

Foxalabs · November 27, 2023, 2:39am

This will get better with time, more hardware and software refinement, at this stage it’s all beta and should be treated as such, while beta development can be difficult it does give you access to high value markets with little in the way of direct competition… at least to start with and the first to market advantage is often enough to carry products over the line for the long term.

_j · November 27, 2023, 7:06am

Those are feel-good words.

We only hope it would get better since introduction - like one might have only been able to foresee GPT-4 getting better since its initial release.

b0zal · November 27, 2023, 7:16am

work’s fine for me

video:

TonyAIChamp · November 27, 2023, 7:40am

I cannot stop being amazed how people are complaining about the magic wand they were suddenly given, because of the slighly wrong shade of the color of free socks they wish from the wand…

tijl.declerck20 · November 27, 2023, 10:40am

So this is still happening, I switched back to the old 3.5-turbo and it doesn’t have the problem.

sbaldino · November 27, 2023, 11:30am

Well, OpenAI is a company that offers its services and asks for payment, so I’d expect some ‘quality’ to their services.

It’s not just about complaining and whining. The earlier model does not have the slowness issue, but the newer one does. As we’re building commercial applications based on that, we’d like this problem to be fixed. Right now, with those delays, the model is simply unusable for lots of users (me included).

OpenAI is a commercial company now. If a service is substandard (and it is, I’ve never experienced those delays) then customers will complain, until it’s fixed. Looks normal to me

droid · November 27, 2023, 3:07pm

Whole Time I thought my code was the problem. switched back to the old model, and everything is okay.

b0zal · November 27, 2023, 3:11pm

don’t blame ur code only blame if its not correct a syntax, the issue are confirmed it’s openai issue not your code

Steno-Ian · November 27, 2023, 5:34pm

Adding a 10 seconds timeout seems to resolve it.

_j · November 27, 2023, 5:37pm

A ten second timeout will result in an error being thrown while the AI is writing a long response.

It might only work for you because the python libraries has retries built in and hidden from you.

derrickob · November 27, 2023, 5:46pm

Friends, I just confirmed: OpenAI released 3.5-turbo-1106 to just annoy Devs! I initially thought you people were just making noise over the issue because I was busy trying to defend the company by studying the Assistants API from the playground. The playground seems to be kinda hospitable to a user. The API is really hostile, eh? It’ll literally choose when to reply to a user, when it should even process your input tokens. Just released to drink input tokens without delivering results. I think we can all agree on the previous model anyone can decide to remain with it and continue sorting out their input tokens and contexts because 1106 is definitely drunk

Topic		Replies	Views
GPT-3.5 Turbo API response is slow API	20	12318	November 11, 2023
OpenAI Why Are The API Calls So Slow? When will it be fixed? API	103	54074	February 19, 2024
GPT-3.5 API is 30x slower than ChatGPT equivalent prompt API gpt-35-turbo , api	69	13811	November 30, 2023
Chat Completion API super slow and hanging API	8	2206	December 13, 2023
We proved the API is intentionally slow API	56	17669	May 2, 2023

Gpt-3.5-turbo-1106 is very slow

Related topics