It looks like GPT-4-32k is rolling out

curt.kennedy · May 21, 2023, 2:44pm

OK, another update on got-4-32k speed through the API.

IT’S 4x faster!

Output tokens per second in gpt-4-32k: 20
Output tokens per second in gpt-4: 5

Just started benching this, but that’s what I’m seeing. Good news for people that need lower latency above all else.

N2U · May 21, 2023, 3:32pm

I hope you’re correct, but it may also be a server load thing, let’s see what happens when more people get access

Neoony · May 21, 2023, 6:12pm

I guess it might be that, when you use the API you are accessing it from your own internet, from your location to OpenAI, but when you are using chatGPT or playground, you are making calls from OpenAI servers to OpenAI servers (maybe even local network) and the website is only sending you the output.

I guess that might make for some differences.

info.deckly · May 21, 2023, 7:02pm

Anyone else got access to 32k? I’ve been invited to gpt-4 two days after the announcement

SynduAdmin · May 21, 2023, 8:31pm

I am really hitting the limits time and again and waiting for the 32k.

bruce.dambrosio · May 22, 2023, 4:47pm

I’m a bit late to this conversation, but: Would you have thought up front of providing the information the model prompted for, if it hadn’t asked for it?
I’ve found @N4U requirements-gathering methods enormously helpful.

bocchesegiacomo01 · May 24, 2023, 2:58pm

it means it can have 8k token in input+output, but not 8000 just in output. The output is limited at 2k in the playground, i don’t know if even in the api

bocchesegiacomo01 · May 24, 2023, 2:59pm

wow that’s super cool, i think even faster than 3.5 turbo, that outputs about 7-8 in my case via API

N2U · May 24, 2023, 10:57pm

This is great information:

If we’re benchmarking tokens per second we might as well do the same for humans

So i did some digging and found the study “Oral Reading Fluency of College Graduates: Toward a Deeper Understanding of College Ready Fluency” it suggests that college graduates can read somewhere between 138 and 287 words per minute.

When converted to words per second, this is approximately 2.3 to 4.78 words per second. Assuming that one token is equivalent to 0.75 words, the reading rate in tokens per second would be approximately 1.73 to 3.59 tokens per second.

Link to the full article here.

curt.kennedy · May 24, 2023, 11:03pm

But the impressive thing is that this isn’t reading rate, it’s writing rate! The reading rate for GPT-4 appears to be nearly instant, I’m guessing 5000 tokens per second or so. So it literally crushes humans.

But in responding, or writing, it beats most humans hands down. I’m guessing my response here took me maybe 1 or 2 tokens per second at best!

N2U · May 24, 2023, 11:12pm

Indeed, I should probably have made that a bit more clear

Yep what I was thinking was:

Do we really need GPT to output faster than humans can read?

jwatte · May 24, 2023, 11:53pm

The problem is one of latency – there may be post-processing (for example, moderation!) that needs to happen before the text can be presented to the user, and that post-processing might not even be able to start until the full generation is complete.

N2U · May 24, 2023, 11:59pm

Output rate and latency are two different things

Output rate refers to how quickly something is produced, while latency refers to the time delay or waiting time before something is received.

SomebodySysop · May 25, 2023, 1:01am

Is anybody currently using GPT-4 actually getting 8K now?

I ask because of this issue: GPT-4 Token Limit Reduced?

curt.kennedy · May 25, 2023, 3:38am

Using the paid API, it’s fine:

'object': 'chat.completion', 'created': 1684985539, 'model': 'gpt-4-0314', 'usage': {'prompt_tokens': 4555, 'completion_tokens': 411, 'total_tokens': 4966},

jwatte · May 25, 2023, 4:25am

I thought I explained how output rate puts a lower limit on latency in many use cases.
Was that somehow unclear?

Neoony · May 25, 2023, 9:32am

Not everyone is using the GPT output to be read by humans
Sometimes GPT output can go into chains where you use the output for another input. (just like autoGPT for example)

Or output code and run it, is another example

You could also make GPT make some decisions and execute commands.

N2U · May 25, 2023, 10:04am

Good point @Neoony

I’m not saying faster isn’t better, it definitely is, I’m just saying that comparing GPT’s output tokens per second against human reading speed may be relevant when determining when to add more server capacity.

Mixing output rate and latency is not helpful in this case, the output rate is relevant when benchmarking the model, the latency is relevant if we’re benchmarking the networking.

7041521 · May 25, 2023, 1:21pm

I want a gpt4 api to complete the testing of physical robots, who can help me? Please email me

curt.kennedy · May 25, 2023, 1:36pm

In my primary use, humans are in the loop and read/edit the AI response. The AI does 90% of the drudgery work. Here there is no stringent latency requirement, since it could take several minutes for the human to get to the response. So to @N2U’s point, the output rate being faster than a human can read applies here.

However, with AI agents, and machine-to-machine interactions, I can really feel the latency kick in. So far, nothing “mission critical” being done here, but I can see that if you have an app and an impatient user on the other end, this could be disastrous for your app. But I’m curious how many people here are in this situation, and what is your use case? And is it “mission critical”, or more of a hassle.

Topic		Replies	Views
New 4-turbo model has a unique limit? Or is this a bizarre hallucation? API	18	4470	January 26, 2024
Is the GPT4 api actually this limited or am I doing something wrong? API	13	1499	December 13, 2023
Prompt Fatigue Question For API Calls Prompting gpt-35-turbo	24	497	January 25, 2025
Need Help With Prompts? Ask me* Prompting chatgpt	149	19229	February 6, 2024
How to confirm that you got the correct value from a text other than repeating the same prompt over and over API	39	889	September 1, 2024

It looks like GPT-4-32k is rolling out

Related topics