When working with the OpenAI API models endpoint, it would be quite nice to be able to directly query a model's maximum number of tokens.
This would let me avoid hard-coding each model's max token value, which I compare against my own tokenized version of a user's input prior to submission. The goal is to stop users from submitting prompts to OpenAI that exceed the model's context length.
I just queried gpt-3.5-turbo with a prompt of 5975 tokens, deliberately trying to produce an error stating that I had exceeded the documented limit of 4096 tokens. Instead, the error informed me that I had apparently been awarded an extra token, since the model's maximum context length is 4097. Thank you, OpenAI.
message: "This model's maximum context length is 4097 tokens. However, your messages resulted in 5975 tokens. Please reduce the length of the messages."
Maybe. But the token limit is fixed for each model, so if you know which model you are using, you know how many tokens it supports. Hard-coding the model size causes no real software offense.
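The hard-coded approach described here might be sketched like this (the limits in the table are commonly cited values, not queried from the API, so treat them as assumptions to verify against OpenAI's documentation):

```python
# Context lengths as commonly documented; verify against OpenAI's current docs
MODEL_MAX_TOKENS = {
    "gpt-3.5-turbo": 4097,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}

def max_tokens_for(model: str) -> int:
    """Look up a model's context limit from the hard-coded table."""
    try:
        return MODEL_MAX_TOKENS[model]
    except KeyError:
        raise ValueError(f"Unknown model {model!r}: no hard-coded limit") from None
```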
There is no offense, but there are nuisances with a purely hard-coding-based approach.
It would be nice to be able to query an endpoint after selecting a model from /models, so you know ahead of time what that model's input limits are. Before sending off any queries, the application could then tokenize the user's input and determine the number of tokens it consumes, leading to an overall reduction in bandwidth.
Dynamic model selection. Let's say a user inputs text whose token count is larger than the default model's limit, but you know other models support the same features with longer context lengths. Your application could select another compatible model on the fly from the list of available models, even if that model isn't in your hard-coded list.
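If limits were queryable, that fallback could be sketched as follows (the `MODEL_LIMITS` dict stands in for the hypothetical per-model query, and the model names are illustrative):

```python
# Hypothetical per-model limits, as a queryable endpoint might return them
MODEL_LIMITS = {
    "gpt-3.5-turbo": 4097,
    "gpt-3.5-turbo-16k": 16385,
    "gpt-4": 8192,
}

def pick_model(token_count: int, preferred: str, compatible: list[str]) -> str:
    """Fall back to a compatible model whose context window is large enough."""
    if token_count <= MODEL_LIMITS[preferred]:
        return preferred
    # Prefer the smallest model that still fits, to keep costs down
    for model in sorted(compatible, key=MODEL_LIMITS.get):
        if token_count <= MODEL_LIMITS[model]:
            return model
    raise ValueError(f"No compatible model can hold {token_count} tokens")
```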
You can already make a bogus query to an endpoint to get the limit: just send a number of tokens that would overflow the largest model, and the error message will state the number of tokens that model can accept.
Going even deeper, models such as text-davinci-edit-001 do not have their token limit stated anywhere, though it could be assumed to be the same as davinci-3.
If the token limits are fixed per model, I think all three of your scenarios are still satisfied by knowing each model's token length.
If you come across a model that you are not familiar with and don't know its length, then per your second scenario you have bigger problems, like not knowing what features it has. Which, as you say, you would otherwise "know".
We code against fixed APIs and query for variability. Consider the token length part of the fixed API, which, for all intents and purposes, I think it is.
Finally, I have been watching feature suggestions come and go in this forum. Very few, if any, are responded to by anyone with the influence to address them, and I can't say that any have resulted in an actual change. So you may also want to consider the pragmatism of what I suggest.
I hope this helps.