Is the max output tokens of each model a stable parameter that won't change for the same model, even with updated versions?

Hi there!

I’m implementing a service that allows users to send requests to OpenAI models. My prompts consist of both a fixed part and a variable part based on the view the user is working in. These views can range from a few hundred tokens to several thousand.

To ensure there’s enough space for the model to generate responses, I take into account the context window of each model, leaving an overhead for the model’s answer and cutting the variable part short if needed.
However, with models that have large context windows, the overhead I leave sometimes exceeds the model’s maximum output token limit.
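Here’s a simplified sketch of that budgeting logic (the limits are illustrative placeholders, not the real values for any particular model, and I’m counting tokens with tiktoken). The min() is exactly where I need a reliable per-model max output limit:

```python
import tiktoken

# Illustrative placeholder limits; the real values depend on the model.
CONTEXT_WINDOW = 128_000    # total tokens the model can handle (prompt + output)
MAX_OUTPUT_TOKENS = 16_384  # cap on how many tokens the model can emit

def build_prompt(fixed_part: str, variable_part: str, model: str = "gpt-4o") -> str:
    """Truncate the variable part so the prompt plus response overhead fits the context window."""
    enc = tiktoken.encoding_for_model(model)
    # Reserve room for the answer, but never more than the model can actually emit.
    overhead = min(MAX_OUTPUT_TOKENS, CONTEXT_WINDOW // 4)
    budget = max(CONTEXT_WINDOW - overhead - len(enc.encode(fixed_part)), 0)
    variable_tokens = enc.encode(variable_part)
    if len(variable_tokens) > budget:
        variable_tokens = variable_tokens[:budget]  # cut the variable part short
    return fixed_part + enc.decode(variable_tokens)
```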

My question is: are these maximum output token limits stable parameters that won’t change when the models are updated? Can I reliably store a static map of each model and its corresponding max output tokens, so I don’t leave too much overhead for the model’s response?

Thanks in advance to everyone, I’m new to the forum, hoping to learn a lot!


Hi @rrivero and welcome to the community!

Short answer: it depends on how you reference the model.

If you are pointing to a particular model checkpoint, like gpt-4o-2024-08-06, then it’s stable and won’t change. If you are pointing to the “main” model, like gpt-4o, then it is unstable. The reason is that once OpenAI promotes a certain checkpoint to “main”, the main model de-references to that checkpoint.

Example: starting October 2nd, gpt-4o will point to gpt-4o-2024-08-06, which means its max output will go from 4,096 to 16,384 tokens.

My advice is to use “dated” models/checkpoints and you will be fine.
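Concretely, you could keep a small static map keyed by dated checkpoints, something like this sketch (the values come from the numbers above; double-check the official model docs before relying on them):

```python
# Max output tokens per dated checkpoint (illustrative; confirm in the
# official model documentation before shipping).
MAX_OUTPUT_TOKENS = {
    "gpt-4o-2024-05-13": 4_096,
    "gpt-4o-2024-08-06": 16_384,
}

def max_output_for(model: str) -> int:
    """Look up the limit for a pinned checkpoint; fail loudly for aliases like 'gpt-4o'."""
    try:
        return MAX_OUTPUT_TOKENS[model]
    except KeyError:
        raise ValueError(
            f"{model!r} is not a pinned checkpoint; use a dated model so the limit is stable."
        )
```

That way, if a client passes a bare alias, you fail fast instead of silently applying a stale limit.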


Hi @platypus !

I’ll try what you suggest, but I suspect my clients won’t necessarily use exact dated models. I don’t really know how much knowledge they have on this subject.

Anyway, thanks a lot for your quick and complete answer!
