Can anyone explain why I cannot set max_tokens to 32k?

Hi all,

this payload
{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "Générer les titres de chapitres que pourrait inclure un livre sur chatgpt."}
  ],
  "max_tokens": 20000,
  "temperature": 0.3,
  "n": 1,
  "presence_penalty": 2.0,
  "frequency_penalty": 0.1
}

returns this:
{
  "error": {
    "message": "This model's maximum context length is 8192 tokens. However, you requested 20029 tokens (29 in the messages, 20000 in the completion). Please reduce the length of the messages or completion.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}

I'm Tier 5.
How can I request a 32k max_tokens?

It seems to be possible.

Thanks to all for your support

The context length of GPT-4 is 8,192 tokens.

This is the total of both input and output (completion), so if the completion tokens exceed 8,192 tokens, an error will occur.

After the release of GPT-4, a 32K-token version of GPT-4 was also released, but it never became mainstream, and apart from the very few who currently have access, most people cannot use it.

Additionally, the 32K model of GPT-4 will be discontinued for existing users in June next year and will no longer be available.

Instead, it is recommended to use models such as gpt-4o with a 128K context length.
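For example, here is a minimal sketch (assuming the current openai Python SDK and access to gpt-4o; the 4000-token budget is just an illustrative choice) of the same request with a completion limit the model can actually honor:

```python
# Sketch only: the same request pointed at gpt-4o (128K context),
# with a max_tokens value that fits inside the model's limits.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Générer les titres de chapitres que pourrait inclure un livre sur chatgpt.",
        }
    ],
    max_tokens=4000,       # completion budget well inside the context window
    temperature=0.3,
    n=1,
    presence_penalty=2.0,
    frequency_penalty=0.1,
)

print(response.choices[0].message.content)
```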


max_tokens is a parameter for the maximum response length you will receive back. Perhaps you are misinterpreting the intention of this parameter.

Is it because you are using the GPT-4 model, which has a context length of 8192 tokens, and attempting to reserve a 20000-token portion of that context solely for the response, which is impossible? Yes.

The setting does have value, though: it can cut off the response if the AI model goes into a loop of nonsense, producing garbage far beyond the length your application needs.

No OpenAI model is completely copacetic with this idea of long output, though, even the ones that support it, like the limited-access "gpt-4-32k" or "gpt-4o-2024-08-06", which allows a setting of 16k for output. The model will find a reason to wrap up the response before it approaches anything like that response length.

A max_tokens of 3000 is reasonable for anything the AI would write without special prompt engineering to coax it beyond its training.
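For instance, a quick sketch (assuming the openai Python SDK; the prompt is just an example) of sending a modest budget and checking whether the response actually got cut off:

```python
# Sketch only: request a modest completion budget and detect truncation.
# finish_reason == "length" means the reply was cut off by max_tokens.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Write a detailed chapter outline for a book about ChatGPT."}
    ],
    max_tokens=3000,
)

choice = response.choices[0]
if choice.finish_reason == "length":
    print("Response hit the max_tokens cap and was truncated.")
else:
    print(choice.message.content)
```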


Hey, thank you for that great answer.
I'm creating a book generator. When I ask it to provide detailed information with a short conclusion, the last paragraph usually has incomplete sentences, with missing words like "the", "of", "and", etc.

If I set it to 8000 it's worse.
If I set it to 14k it's still present, but a little worse.

Can I ask you if you have a suggestion for this issue?

I can provide examples; however, they are in French.

thanks

The AI language model never actually receives this parameter to base its response on. It only serves to terminate the output.

Therefore, you can simply set it to a point where no responses of yours are actually cut off.

If you aren't looking at the "usage" statistics the API returns to show token consumption, you can also paste text into this site to see how much the AI is writing and how much room a particular setting would leave you: https://platform.openai.com/tokenizer
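If you prefer counting locally instead of pasting into that page, something like the tiktoken package (my suggestion, not part of the post above) gives the same counts:

```python
# Sketch only: count tokens locally with the tiktoken package
# instead of pasting text into the web tokenizer.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

text = "Générer les titres de chapitres que pourrait inclure un livre sur chatgpt."
prompt_tokens = len(encoding.encode(text))

context_window = 8192  # gpt-4
print(f"prompt tokens: {prompt_tokens}")
print(f"room left for the completion: {context_window - prompt_tokens}")
```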

You may have more consistent language use if you were to send lowered API parameters of sampling:

{
  "model": "gpt-4-turbo",
  "top_p": 0.5,        # default and max is 1.0
  "temperature": 0.5,  # default is 1.0
  "messages": [...

Then, once that refinement is in place, the models themselves will differ in how prone they are to errors in different world languages, and in how much those sampling settings just shown need to be constrained even lower.

You can omit the max_tokens parameter entirely; you just won’t have the safety of a cutoff if the output goes into a repeating loop (which is a less common symptom these days).
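Putting those pieces together, a sketch of such a request with the lowered sampling parameters, max_tokens omitted, and the usage statistics printed (the model choice and prompt are just illustrative, assuming the openai Python SDK):

```python
# Sketch only: lowered sampling parameters, max_tokens omitted
# (no artificial cutoff), then inspect the usage statistics returned.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    top_p=0.5,        # default and max is 1.0
    temperature=0.5,  # default is 1.0
    messages=[
        {"role": "user", "content": "Write a detailed section with a short conclusion about ChatGPT."}
    ],
)

print(response.choices[0].message.content)
print("prompt tokens:", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
```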


You mentioned that you are creating a book generator; even with the standard GPT-4, the quality of the sampled text tends to decrease as the output gets longer, possibly due to limitations of the attention mechanism.

However, recently OpenAI announced a model that supports 64K output (though it is not yet available for use), so it is possible that this model has improvements.

https://openai.com/gpt-4o-long-output/


Hi @dignity_for_all,

I read the link about the alpha program. Do you know how to request an interest to OpenAI?

thanks a lot!

It's unfortunate, but no one knows who can use this kind of alpha program, whether it will become a product in the future, and so on.

Even if you want to find a way to show interest to OpenAI, it can be difficult to contact them directly.

It is likely that only a select few, such as large corporations or researchers at academic institutions, have access.
If the 64K output model becomes a product in the future, it might become available as a beta version.

However, even if you can’t use the 64K output model, you can achieve your goal with creative approaches, such as generating text in segments and combining them into one book.

In fact, there are people who have published books using AI output.
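For instance, here is a rough sketch of that segment-and-combine approach (the helper function, chapter titles, and prompts are made up for illustration, assuming the openai Python SDK):

```python
# Sketch only: generate a book chapter by chapter and stitch the pieces
# together, instead of asking for one oversized completion.
from openai import OpenAI

client = OpenAI()

def generate_chapter(title: str) -> str:
    # Hypothetical helper: one chapter per request keeps each completion small.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are writing one chapter of a book about ChatGPT."},
            {"role": "user", "content": f"Write the chapter titled: {title}"},
        ],
        max_tokens=3000,
        temperature=0.5,
    )
    return response.choices[0].message.content

chapter_titles = ["Introduction", "How ChatGPT Works", "Prompting Techniques"]
book = "\n\n".join(generate_chapter(title) for title in chapter_titles)

with open("book.txt", "w", encoding="utf-8") as f:
    f.write(book)
```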

How about an AI model that at least has a published path to gaining access (having paid a bunch of money to OpenAI and grown to Tier-5)?

o1-mini has an advertised (but not delivered in any attempt I’ve made) 64k output:

| MODEL | DESCRIPTION | CONTEXT WINDOW | MAX OUTPUT TOKENS | TRAINING DATA |
|---|---|---|---|---|
| o1-preview | Points to the most recent snapshot of the o1 model: o1-preview-2024-09-12 | 128,000 | 32,768 | Up to Oct 2023 |
| o1-mini | Points to the most recent o1-mini snapshot: o1-mini-2024-09-12 | 128,000 | 65,536 | Up to Oct 2023 |

The figure instead aligns more closely with "billable output tokens you don't receive" per internal iteration, because this AI will still find a way to stop producing. If you have it repeat something back exactly from a source, you might be able to go over the less-than-12k I have gotten by prompting for justified, non-refused, non-stop creative generation.
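For reference, a sketch of how you would request that advertised budget from o1-mini (assuming the openai Python SDK; whether anywhere near 64k actually comes back is another matter, as described above):

```python
# Sketch only: ask o1-mini for its advertised 65,536-token output budget.
# Note the o1 models take max_completion_tokens (which also covers hidden
# reasoning tokens), not max_tokens.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {"role": "user", "content": "Write as long and detailed an essay as you can about ChatGPT."}
    ],
    max_completion_tokens=65536,
)

print("completion tokens billed:", response.usage.completion_tokens)
print(response.choices[0].message.content)
```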