Request: Query for a model's max tokens

When working with the OpenAI models endpoint, it would be quite nice to be able to directly query a model's maximum number of tokens.

This would avoid hard coding each model's max token value to compare against my own tokenized version of a user's input prior to submission, which I do to prevent users from submitting prompts to OpenAI that exceed the model's context length.
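
To make the intent concrete, here is a minimal sketch of that client-side check, assuming the tiktoken package and a hand-maintained limit for the model (the helper name and the reply reserve are illustrative only):

import tiktoken

# Hand-maintained limit; exactly the hard coding I would like to avoid.
MODEL_MAX_TOKENS = {"gpt-3.5-turbo": 4097}

def fits_in_model(model: str, text: str, reserved_for_reply: int = 500) -> bool:
    """Return True if the tokenized prompt still leaves room for a reply."""
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(text))
    return prompt_tokens + reserved_for_reply <= MODEL_MAX_TOKENS[model]

user_input = "example prompt " * 2000
if not fits_in_model("gpt-3.5-turbo", user_input):
    print("Prompt too long for the selected model")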

Edit -
I just queried 'gpt-3.5-turbo' with an input of 5975 tokens, intending to produce an error stating that I had exceeded the documented limit of 4096 tokens. Instead, the error revealed I had been awarded an extra token, since the model's maximum context length is apparently 4097. Thank you OpenAI :smiley:

message: "This model's maximum context length is 4097 tokens. However, your messages resulted in 5975 tokens. Please reduce the length of the messages."
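
For what it's worth, that message is regular enough to parse programmatically; a minimal sketch, assuming the wording stays in this form:

import re

error_message = (
    "This model's maximum context length is 4097 tokens. "
    "However, your messages resulted in 5975 tokens. "
    "Please reduce the length of the messages."
)

match = re.search(r"maximum context length is (\d+) tokens", error_message)
if match:
    max_context = int(match.group(1))
    print(max_context)  # 4097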

Maybe. But the token limit is fixed for each model, so if you know which model you are using, you know how many tokens it accepts. Hard coding the model size causes no real software offense.

There is no offence, but there are nuisances with a hard-coding-based approach:

  1. It would be nice to query an endpoint after selecting a model from /models, to know ahead of time what that model's input limits are. The application can then tokenize the user's input to determine the number of tokens consumed before sending off queries, leading to an overall reduction in bandwidth.

  2. Dynamic model selection. Let's say a user inputs text whose token count is larger than the default model's limit, but you know other models support the same features with longer context lengths. Your application could select another compatible model on the fly from the list of available models, even if that model isn't in your hard-coded list (see the sketch after this list).

  3. You can already make a bogus query to an endpoint to get the limit: just send a number of tokens that would overflow the largest model, and the error message will state how many tokens that model can accept.
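
A minimal sketch of point 2, assuming a hand-made table of context lengths and a token count produced by the application's own tokenizer (the table values and helper names here are illustrative only):

# Illustrative context lengths; ideally these would come from the API instead.
CHAT_MODELS = {
    "gpt-3.5-turbo": 4097,
    "gpt-3.5-turbo-16k": 16385,
}

def pick_model(prompt_tokens: int, reserved_for_reply: int = 500) -> str:
    """Pick the smallest compatible model whose context fits prompt plus reply."""
    needed = prompt_tokens + reserved_for_reply
    for model, context in sorted(CHAT_MODELS.items(), key=lambda kv: kv[1]):
        if needed <= context:
            return model
    raise ValueError("No available model can fit this prompt")

print(pick_model(3000))  # gpt-3.5-turbo
print(pick_model(6000))  # gpt-3.5-turbo-16k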

Edit-

Going even deeper, models such as text-davinci-edit-001 do not have their token limit stated anywhere, though it could be assumed to be the same as davinci-3.

If the token limits are fixed per model, I think all three of your scenarios are still satisfied by knowing each model's token length.

If you come across a model that you are not familiar with and don't know its length, then according to your 2nd scenario you have bigger problems, like not knowing what features it has. Which, as you say, you would otherwise "know".

We code against fixed APIs and query for variability. Consider the token length part of the fixed API, which, to all intents and purposes, it is.

Finally, I have been watching feature suggestions come and go in this forum. Very few if any are responded to by anyone with the influence to address them. And I can’t say that any have resulted in an actual change. So you may also consider the pragmatism of what I suggest.

I hope this helps :four_leaf_clover::+1:t3:

We are trying to build a data pipeline where the OpenAI API would be used to generate embeddings for our data analysis. In our case, as said in the first comment, we would switch between models depending on our incoming data, and I see OpenAI updating the models and changing the token limits, so hard coding the limit would be troublesome.

Such an API would be highly useful.

I haven't seen OpenAI randomly changing the context window length / number of max tokens on the API side, yet. Such a change would likely have a massive impact on many deployed solutions and would be an unexpected move.
You should be able to create a config file with the model specifications, and if this highly unlikely event should occur, at least you can make your changes very easily.

Also, note the 'edit' in OP's post, where they trigger an error message from the API that reports the max tokens for the particular model. You can run something like this regularly, extract the values, and you will know when you need to update your app's config.
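
Something along those lines could run on a schedule; a sketch only, assuming the openai v1 Python SDK and that the error wording keeps its current form (the exception class and message format may differ between SDK and API versions):

import re
from openai import OpenAI, BadRequestError

client = OpenAI()

def probe_max_context(model: str) -> int | None:
    """Deliberately over-ask so the API error reveals the model's context length."""
    try:
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "hi"}],
            max_tokens=10_000_000,  # absurdly large on purpose
        )
    except BadRequestError as err:
        match = re.search(r"maximum context length is (\d+) tokens", str(err))
        if match:
            return int(match.group(1))
    return None

print(probe_max_context("gpt-3.5-turbo"))

Comparing the probed value against the stored config value tells you when the config needs updating.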

But, as I said, this is not a likely scenario. You can focus your efforts on other parts of your project.

+1 for this feature – it’s pretty amazing that I can’t just ask for llm.max_tokens – wait, I can but it doesn’t mean what you think it means :slight_smile:

In this case the model to use is a side-effect of user preferences and the exact data that’s being operated on, so it can’t simply be hard-coded. (And yes, I want the down-stream code to “just work” in future when GPT5 or whatever comes out and that is added to the config.)

Here is chat model data, programmatically extracted from error messages for the models listed by the models endpoint (including probing of fine-tuned models).

Unfortunately, the error reports for 3.5 models differ from those for 4.0 given the same inputs. Getting the API to report the maximum output length restriction on the newer 3.5 models requires an input + max_tokens API call that GPT-4-turbo models would complete instead of erroring out on.

It is just a short list of fixed data, so I just edited the three models affected for this list (instead of writing and publishing even more-informed code for you to bang on the API yourself.)

model_context = {
  "gpt-3.5-turbo": {
    "context": 16385,
    "max_out": 4096
  },
  "gpt-3.5-turbo-0125": {
    "context": 16385,
    "max_out": 4096
  },
  "gpt-3.5-turbo-0301": {
    "context": 4097,
    "max_out": 4097
  },
  "gpt-3.5-turbo-0613": {
    "context": 4097,
    "max_out": 4097
  },
  "gpt-3.5-turbo-1106": {
    "context": 16385,
    "max_out": 4096
  },
  "gpt-3.5-turbo-16k": {
    "context": 16385,
    "max_out": 16385
  },
  "gpt-3.5-turbo-16k-0613": {
    "context": 16385,
    "max_out": 16385
  },
  "gpt-4": {
    "context": 8192,
    "max_out": 8192
  },
  "gpt-4-0125-preview": {
    "context": 128000,
    "max_out": 4096
  },
  "gpt-4-0314": {
    "context": 8192,
    "max_out": 8192
  },
  "gpt-4-0613": {
    "context": 8192,
    "max_out": 8192
  },
  "gpt-4-1106-preview": {
    "context": 128000,
    "max_out": 4096
  },
  "gpt-4-1106-vision-preview": {
    "context": 128000,
    "max_out": 4096
  },
  "gpt-4-32k": {
    "context": 32768,
    "max_out": 32768
  },
  "gpt-4-32k-0314": {
    "context": 32768,
    "max_out": 32768
  },
  "gpt-4-32k-0613": {
    "context": 32768,
    "max_out": 32768
  },
  "gpt-4-turbo-preview": {
    "context": 128000,
    "max_out": 4096
  },
  "gpt-4-vision-preview": {
    "context": 128000,
    "max_out": 4096
  }
}

from which you can extract:

>>> model_context["gpt-4"]["context"]
8192

On models where the context length equals the max output, there is no artificial restriction on output, but you can't actually set max_tokens that high. Instead you'd budget something like 8192 - (max_tokens=2000) = 6192 maximum input tokens you can send, including overhead.
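
For example, a small helper for that budgeting arithmetic, using the model_context table above (the helper name and the output reserve are illustrative):

def max_input_tokens(model: str, max_tokens: int) -> int:
    """Largest prompt (including message overhead) that still leaves room for max_tokens of output."""
    info = model_context[model]
    reply = min(max_tokens, info["max_out"])
    return info["context"] - reply

print(max_input_tokens("gpt-4", 2000))               # 8192 - 2000 = 6192
print(max_input_tokens("gpt-4-0125-preview", 2000))  # 128000 - 2000 = 126000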