You are right that the training examples for gpt-3.5-turbo-0125 should allow for up to 16k tokens. What led you to conclude that it is limited to a 4k context?
When I try the base gpt-3.5-turbo-0125 model and my fine-tuned version in the Playground, the maximum length is 4095, while gpt-3.5-turbo-16k lets you work with the promised 16k.
I haven’t been able to do a proper test with the Completions API because the outputs are incredibly buggy: some answers are in Korean (my dataset and prompt were in English), some are superimposed incomplete tokens, some are a few words with symbol gibberish in between, and so on.
The maximum OUTPUT of the newer models is 4k. That is the max_tokens value, and it is what the slider controls: response tokens. In practice the model’s training will constrain it to even shorter outputs, unless you fine-tune it specifically to write very long form. And you cannot fine-tune gpt-3.5-turbo-16k-0613, the one model without a hard cap on output.
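To make the arithmetic concrete, here is a minimal sketch of the budget split, assuming the documented 16,385-token context window and 4,096-token output cap for gpt-3.5-turbo-0125 (and that you have the tiktoken package installed):

```python
import tiktoken

CONTEXT_WINDOW = 16385  # documented context window for gpt-3.5-turbo-0125
MAX_OUTPUT = 4096       # documented hard cap on completion tokens

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "..."  # your actual prompt here
prompt_tokens = len(enc.encode(prompt))

# The context window covers prompt + completion, but the completion
# alone can never exceed the 4k output cap:
output_budget = min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens)
print(f"prompt: {prompt_tokens} tokens, max completion: {output_budget} tokens")
```

So a 12k-token prompt still leaves the full 4k of output available, but no prompt size ever unlocks more than 4k of output.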
That’s strange; the description of the slider says:
And now that I’m checking again, the fine-tuned version is limited to 2k, not the 4k of the base gpt-3.5-turbo-0125. I guess that’s a bug, though, same as the weird API completions.
That aside, the dataset used for fine-tuning was long form, so it should be giving long answers too.
Good observation! The same applies to me when I check my fine-tuned models in the Playground; since I normally don’t consume them there, I never realized it until now.
Have you tried using them via a regular API call outside of the Playground, with the max_tokens parameter set to 4k?
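Something like this minimal sketch, assuming the openai Python SDK v1.x; the fine-tuned model ID here is a placeholder, so swap in your own:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "ft:gpt-3.5-turbo-0125:my-org::abc123" is a hypothetical model ID;
# replace it with the ID of your own fine-tuned model.
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0125:my-org::abc123",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me a long, detailed answer."},
    ],
    max_tokens=4096,  # explicitly request the full 4k output budget
)

print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens / completion_tokens / total_tokens
```

The usage object would at least tell you whether the 2k limit is just a Playground slider quirk or an actual cap enforced on the completion.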