Add a token limit attribute on api.openai.com/v1/models

I have a VSCode extension that lets users select any of the valid ChatCompletion models returned by api.openai.com/v1/models. I need to know a model’s max token limit before I send the request, because the extension detects when a prompt would exceed the limit and then suggests measures to reduce the token count.
Because api.openai.com/v1/models doesn’t provide this information, I have to hardcode the limits in the extension and, as a result, ship an update whenever new models are released (just like today).
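For context, the workaround today looks something like this hand-maintained table; the model names and limits below are illustrative only and go stale as soon as new models ship:

```typescript
// Hand-maintained lookup of max token limits, because GET /v1/models
// doesn't expose them. Illustrative values only; these must be
// updated by hand with every model release.
const MODEL_TOKEN_LIMITS: Record<string, number> = {
  "gpt-3.5-turbo": 4096,
  "gpt-3.5-turbo-16k": 16384,
  "gpt-4": 8192,
  "gpt-4-32k": 32768,
};

// Conservative fallback for models the table doesn't know yet.
const DEFAULT_TOKEN_LIMIT = 4096;

function maxTokensFor(modelId: string): number {
  return MODEL_TOKEN_LIMITS[modelId] ?? DEFAULT_TOKEN_LIMIT;
}
```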

I’m pretty sure everybody who provides an interface with model selection is impacted by the lack of such attributes on the /models endpoint.

Could this please be considered as a future addition? Is there anybody else here who would support this change?


Might be relevant here: one of the things I tend to do on larger projects is to have a periodic phone-home system, typically tied to end-of-day housekeeping functions or triggered from “About -> Check for updates”, that gets the app to call a main server endpoint and get back an object containing all of that application’s potentially variable settings: things like buffer sizes, API endpoint specifics, model names, array maximums, etc. Then I can just make a change once in my main server and all the clients update.
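A minimal sketch of that pattern, assuming a hypothetical /client-config endpoint on your own server (the URL and field names here are made up for illustration):

```typescript
// Hypothetical shape of the settings object served by the main server.
interface ClientConfig {
  modelTokenLimits: Record<string, number>;
  apiBaseUrl: string;
  maxBatchSize: number;
}

// Fetch the latest settings once (e.g. at startup or from a
// "Check for updates" action); a change made once on the server
// reaches all clients without shipping a new build.
async function fetchClientConfig(): Promise<ClientConfig> {
  const res = await fetch("https://example.com/client-config"); // hypothetical endpoint
  if (!res.ok) throw new Error(`config fetch failed: ${res.status}`);
  return (await res.json()) as ClientConfig;
}
```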

I see your point though. Perhaps a “capabilities” endpoint returning a structure that contains details like that? Such a feature exists, but it only has basic information (at least to the best of my knowledge).

Then I can just make a change once in my main server and all the clients update.

It’s just a VSCode extension that only needs the OpenAI APIs, so I don’t want it depending on one of my backends or calling anything else at all, especially since users are becoming warier amid the surge of fraudulent extensions that steal local data.

I see your point though. Perhaps a “capabilities” endpoint returning a structure that contains details like that? Such a feature exists, but it only has basic information (at least to the best of my knowledge).

We don’t need to make this more complex; there’s already a /models endpoint. The only thing OpenAI needs to do is add one more attribute for the maximum token count to each model being returned. This is an easy, legitimate, non-breaking change.
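To illustrate how small the change is: today’s model object from GET /v1/models carries only identity metadata, and the proposal is one extra optional field (the field name below is hypothetical, not an existing attribute):

```typescript
// Shape of a model object as returned by GET /v1/models today,
// plus the single proposed addition.
interface ModelObject {
  id: string;           // e.g. "gpt-4"
  object: "model";
  created: number;      // Unix timestamp
  owned_by: string;
  max_tokens?: number;  // PROPOSED (hypothetical): context window, e.g. 8192
}
```

Making the field optional is what keeps the change non-breaking for existing clients.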


This is also an unanswered feature request for tiktoken, since it seems OpenAI doesn’t yet see the value of returning the context length per model. The best one can do is dump all the models out and, after some research, create your own dictionary, also noting the endpoints and formatting methods each model supports if you want a “pick any model” type of application. Beyond that, one might add some heuristics to figure out what a “gpt-3.6-boost-24k” in the model list would mean in the future.
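A sketch of such a heuristic, assuming the only recoverable signal is a “-Nk” suffix in the model id (the “gpt-3.6-boost-24k” name is hypothetical):

```typescript
// Guess a context length from a "-Nk" suffix in a model id.
// Returns undefined when the name carries no size hint, in which
// case the hand-built dictionary is the fallback.
function contextFromName(modelId: string): number | undefined {
  const match = modelId.match(/-(\d+)k\b/);
  return match ? parseInt(match[1], 10) * 1024 : undefined;
}

contextFromName("gpt-3.6-boost-24k"); // 24576 (hypothetical model)
contextFromName("gpt-3.5-turbo-16k"); // 16384
contextFromName("gpt-4");             // undefined -> fall back to dictionary
```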

Another thing one might do, to build such a dictionary or to add knowledge of unknown models, is simply make a request. It doesn’t need to be costly: just send “hi” while attempting to reserve max_tokens of 4100 or 4090, around typical thresholds, and see which doesn’t produce an error.
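A sketch of that probe, assuming the chat completions endpoint rejects an over-large reservation with a context-length error (the exact error code below is an assumption; inspect the real response body before relying on it):

```typescript
// Probe whether a model accepts a given max_tokens reservation by
// sending a tiny prompt. The reply to "hi" is short, so a successful
// probe stays cheap even with a large reservation.
async function acceptsReservation(
  model: string,
  reserve: number,
  apiKey: string,
): Promise<boolean> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: "hi" }],
      max_tokens: reserve, // the reservation under test
    }),
  });
  if (res.ok) return true;
  const body = await res.json();
  // Assumed error shape: { error: { code: "context_length_exceeded" } }
  if (body?.error?.code === "context_length_exceeded") return false;
  throw new Error(`unexpected error: ${JSON.stringify(body)}`);
}

// e.g. for a 4k model: acceptsReservation(m, 4090, key) -> true,
//      acceptsReservation(m, 4100, key) -> false.
```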

Maybe move it to a setting for now so at least users can modify it?

Thanks @_j, just testing it out might be an idea.

I’d also rely on deriving this from the model’s name, but I don’t think OpenAI can commit to a long-term naming convention yet.

Well, at some point, they will have to do it - I can’t see bigger enterprise customers entertaining workarounds for something so trivial to implement.