STEPS TO REPRODUCE:
*) send an API request to the chat completions endpoint with this payload (a script that reproduces the request is shown after these steps):
{
  "model": "gpt-4-vision-preview",
  "messages": [
    { "role": "user", "content": "hello, tell me about Philipp Lengauer" }
  ]
}
*) get this response (part of the JSON):
{
  "message": {
    "role": "assistant",
    "content": "Philipp Lengauer is not a widely known public figure, so there isn"
  },
  "finish_details": {
    "type": "max_tokens"
  },
  "index": 0
}
*) observe that the sentence is cut off after 16 tokens, and that finish_details reports the output was stopped because of max_tokens, even though max_tokens was never set in the request.
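For reference, here is a minimal script that sends exactly this request; it assumes the requests library and an API key in the OPENAI_API_KEY environment variable (both are my setup, not part of the API contract):

import os
import requests

# Reproduction: send the payload from step 1, deliberately without max_tokens.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4-vision-preview",
        "messages": [
            {"role": "user", "content": "hello, tell me about Philipp Lengauer"}
        ],
    },
)
# Print the first choice to inspect content and finish_details.
print(resp.json()["choices"][0])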
PROBLEM: the documentation at https://platform.openai.com/docs/api-reference/chat/create clearly states that the max_tokens field defaults to infinite when not set. I only get a full response if I explicitly set it to something bigger than 16 (see the example below). So that's either a bug in the documentation OR a bug in the default limit.
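For example, adding an explicit limit like this (256 is an arbitrary value comfortably above the observed 16-token cutoff) produces a complete answer:

{
  "model": "gpt-4-vision-preview",
  "messages": [
    { "role": "user", "content": "hello, tell me about Philipp Lengauer" }
  ],
  "max_tokens": 256
}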
I would prefer the fix to be in the default limit: after all, the new gpt-4 turbo model with the same context length actually defaults to infinite, and it relieves API users from having to estimate or count tokens manually just to set max_tokens to a value that is not too high and avoid API errors (max_tokens plus the prompt tokens must not exceed the total context length). With that limit in place, the calculation suddenly becomes necessary. It is, however, very tedious to do, and as far as I understand currently only possible in Python; see the sketch below.
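A minimal sketch of the calculation the current default forces on every caller, assuming tiktoken with the cl100k_base encoding used by the gpt-4 family; the per-message overhead constants follow the OpenAI cookbook and are approximate:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_prompt_tokens(messages):
    tokens = 3  # every reply is primed with <|start|>assistant<|message|>
    for message in messages:
        tokens += 3  # approximate per-message framing overhead
        for value in message.values():
            tokens += len(enc.encode(value))
    return tokens

CONTEXT_LENGTH = 128_000  # context window of gpt-4-vision-preview

messages = [{"role": "user", "content": "hello, tell me about Philipp Lengauer"}]
# Largest max_tokens that should not exceed the total context length.
max_tokens = CONTEXT_LENGTH - count_prompt_tokens(messages)
print(max_tokens)

Even this sketch has to be kept in sync with the model's tokenizer and message framing, which is exactly the tedium an infinite default would spare API users.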