So, in the end, is GPT-4-turbo 128k or 4k?

The GPT token limit documentation is super unclear. Please give us a simple clarification.

In GPT-3.5 Turbo we have 16k.
Does that mean input + output combined can be at most 16k?
In GPT-4 we have 8k. Does that mean input + output combined can be at most 8k?
(BTW, in practice, if the AI is in a good mood it will return at most 1-2k.)

Now we have GPT-4 Turbo.
The title says 128k context (what the hell is "context"?), yet in the playground you can only set 4k.
Does that mean it's more limited than normal GPT-4's 8k? And what about 32k?
(Is 32k still available? I've requested access many times but still don't have it.)

How much of this 128k is for input and how much for output?
Just give us simple answers!

First: 128,000 tokens = 125k, if you count 1k = 1,024.

That is the context window length: the shared space where tokens are counted, covering both what you send in and what the model generates.

OpenAI limited how much the latest models can return as a response: the maximum output is 4,096 tokens. That's where the 4k comes from.
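A minimal sketch of the arithmetic, assuming the figures discussed above (128k shared context, 4,096-token output cap): whatever you reserve for the response comes out of the same budget as the prompt.

```python
# Illustrative numbers from the thread above, not authoritative limits.
CONTEXT_WINDOW = 128_000  # total shared token budget ("context")
MAX_OUTPUT = 4_096        # cap on the response for gpt-4-turbo

def max_prompt_tokens(reserved_for_output: int = MAX_OUTPUT) -> int:
    """Tokens left for the prompt once output space is reserved."""
    return CONTEXT_WINDOW - reserved_for_output

print(max_prompt_tokens())  # 123904
```

So the 4k and the 128k are not competing claims: you can send a very long prompt and still only get up to 4k back.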

The max_tokens setting on the API sets the maximum response length. The playground slider shows how much of the context you are able to dedicate to just the response.
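As a hedged sketch of where that setting lives: with the official `openai` Python SDK, max_tokens is just a parameter on the chat completion call. The model name and message below are illustrative, and the actual call needs an API key, so this only builds the request parameters.

```python
# Request parameters for a chat completion; max_tokens reserves up to
# 4,096 tokens of the shared context window for the response.
request = {
    "model": "gpt-4-turbo-preview",  # illustrative model name
    "messages": [{"role": "user", "content": "Summarize this thread."}],
    "max_tokens": 4096,
}

# With an SDK client this would be sent roughly as:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(**request)
print(request["max_tokens"])  # 4096
```

If you omit max_tokens entirely, the model can still stop on its own well short of the cap, which is the 1-2k behavior the question complains about.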

Which model can generate a longer response in real life, gpt-4-turbo or gpt-4?

If you want models that weren't trained toward extreme curtailment of output, you would go with gpt-4-0314 or gpt-3.5-turbo-0301.

The completion-endpoint models also have less of a tendency to wrap up a response prematurely.