GPT-3.5 and GPT-4 Turbo 1106: Question about Output Context Length

Hi,

I have a question. We’re currently working on an LLM-based application using GPT-3.5 Turbo 16k and GPT-4 32k with Enterprise GPT (via the Azure OpenAI service).

I have read that models like GPT-3.5 Turbo 16k 0613 will be discontinued in June 2024 and that the recommended replacement is GPT-3.5 Turbo 1106, which has a default input context of 16k tokens but an output limit of 4k tokens. Will a 16k output limit also be available with this model, and a 32k output limit for GPT-4?

Having large output lengths of 16k and 32k tokens is very important for our use cases, and I have not found any documentation saying that enterprise customers will see variants such as a GPT-3.5 Turbo 1106 with a 16k max output length, or a GPT-4 1106 with 32k.

Please clarify this.


No, GPT-4-1106 also has a 4k output limit.

The only hope we have is that Microsoft pushes the discontinuation of the 03 and 06 models out again.

When they cancelled davinci embeddings, they offered reimbursement for re-embedding everything.

Maybe they’ll come up with an “upgrade path”, but we now know that they can and will replace good products with worse ones.

But some of our use cases need the larger outputs that have been working well so far. Say, for example, you’re generating a test script with 30-40 steps; that is not going to work with a 4k token limit. I cannot tell end users to split that into multiple test scripts when it relates to one function, and this is just one example.
The use cases we have built are giving us significant efficiency gains, and a 4k token limit isn’t going to cut it. I hope they give folks using Enterprise GPT a larger output context. If they don’t, I assume a lot of other people will also have to change their implementations, e.g. to the Assistants API with some sort of chunking and collating.

There’s a workaround, but it’s kind of a pain and expensive (a rough code sketch follows the steps below):

step 1:

we need a test script for bla bla bla

could you generate an outline of the tests that should be done? for now, they should all be placeholders that throw notImplemented; we’ll implement them later.

step 2:

we need a test script for bla bla bla

here’s what we have so far:

test 1: 
 not_implemented
test 2:
 not_implemented
test 3:
 not_implemented
...

can you write me some implementations for cases 1 and 2? please give them to me as diffs or something

step 3:

we need a test script for bla bla bla

here’s what we have so far:

test 1: 
 test impl a
test 2:
 test impl b
test 3:
 not_implemented
...

can you write me some implementations for cases 3 and 4? please give them to me as diffs or something

etc.
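Here’s a minimal sketch of that flow in Python, assuming the v1 `openai` SDK against an Azure OpenAI deployment. The endpoint, key, deployment name, prompts, and the “~40 tests, two per call” numbers are all placeholders, not a tested implementation:

```python
# Rough sketch of the outline-then-fill workaround above.
# Assumes the v1 openai Python SDK; endpoint, key, API version, and
# deployment name are placeholders for your own Azure OpenAI setup.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-KEY",
    api_version="2024-02-01",
)
DEPLOYMENT = "gpt-4-1106"  # your Azure deployment name (assumption)


def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=4000,  # roughly the per-call output cap on the 1106 models
    )
    return resp.choices[0].message.content


# Step 1: get an outline where every test body is a not_implemented stub.
outline = ask(
    "We need a test script for <feature>. Generate an outline of the tests "
    "that should be done; for now every test body should just throw "
    "notImplemented, we'll implement them later."
)

# Steps 2..n: implement a couple of tests per call, then collate.
implementations = []
for first in range(1, 40, 2):  # assume ~40 test cases, two per request
    implementations.append(ask(
        "We need a test script for <feature>. Here's what we have so far:\n\n"
        f"{outline}\n\n"
        f"Can you write me implementations for cases {first} and {first + 1}? "
        "Please give them to me as standalone snippets I can paste in."
    ))

full_script = outline + "\n\n" + "\n\n".join(implementations)
print(full_script)
```

The cost pain comes from re-sending the growing script (or outline) as input on every call, which is why this is expensive even though each call stays under the 4k output cap.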

But again, that requires medium-to-complex changes to things that are already deployed to production. Model changes and updates should provide improvements over current models, not limit them. I appreciate the input context increasing, but to me 32k was enough and 128k is plenty; having at least 16k and 32k output token limits is required for many of our use cases, and having to redesign all of this will cost money and resources.

I am pleading with OpenAI to see if they can still provide these output token limits for enterprise customers (if not all customers).

yep

unfortunately that didn’t do anything for davinci embeddings either.

they just killed it :frowning:

As you get deeper into developing your application, you’ll likely find that the current AI models are very reluctant to produce even 1,000 tokens of writing. They will only extend the text in a processing situation where they must replicate an input, and even then the AI likely finds a way to inexplicably wrap up the output before hitting any max_tokens limit you set yourself…

Token production is limited by trained behavior, not just the hard cap.
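A quick way to see this for yourself (a minimal sketch, assuming the v1 `openai` Python SDK and a `client`/deployment configured as in the earlier sketch):

```python
# Ask for something long and inspect why the model stopped.
resp = client.chat.completions.create(
    model=DEPLOYMENT,
    messages=[{"role": "user",
               "content": "Write a very long, detailed test plan for <feature>."}],
    max_tokens=4000,  # well above what the model will usually produce
)
choice = resp.choices[0]
print(choice.finish_reason)           # usually "stop", not "length"
print(resp.usage.completion_tokens)   # often far below the 4,000 you allowed
```

If `finish_reason` is "stop", the model ended the answer on its own; "length" would mean it actually hit the cap.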


That’s also true - it will give you five or six, and then insert a comment saying

// put the remaining stuff here :kissing_closed_eyes:

and close the schema
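One way to catch that before it ships is a hypothetical guard like the following; the regex, the `full_script` variable, and the re-prompt step are assumptions, not something from this thread:

```python
import re

# Hypothetical check for "put the remaining stuff here"-style placeholder
# comments, so the caller can re-prompt for the missing pieces.
PLACEHOLDER = re.compile(
    r"//\s*(put|add|insert|implement)\b.*\b(remaining|rest|here)\b",
    re.IGNORECASE,
)


def looks_truncated(code: str) -> bool:
    return bool(PLACEHOLDER.search(code))


if looks_truncated(full_script):
    # e.g. re-request only the cases that are still stubbed out
    print("Model elided part of the output; ask for the missing cases again.")
```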


That will have a significant impact on IT companies using generative AI to efficiently produce first-level drafts that would otherwise take much more time. We are already able to produce a lot today with GPT-3.5 Turbo 16k and GPT-4 32k: think about much more complex code generation with high-quality prompts in one or two attempts, or SDLC use cases where far more content generation is required. I am hoping companies will put some pressure on this ask and something may change. Anyway, I won’t speculate about what they might or might not do; I just wanted to ask whether I was missing larger-output models, and apparently that’s not the case.
If we have to change designs like these every 6 months to a year, it won’t be sustainable, and companies will have to rethink the approach they take when building apps with LLMs.
