I’m testing gpt-3.5-turbo-16k on a dev server with max_tokens set to the maximum, but it can’t seem to output more than about 2,000 tokens. I’ve raised the presence penalty, lowered and raised the temperature, lowered top_p, and reworded my prompts, but responses always top out around 1,000–2,000 tokens.
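For reference, this is roughly the shape of the request I’m sending — a minimal sketch, assuming the `openai` Python library; the exact max_tokens value and prompt are just placeholders from my setup:

```python
# Parameters for the chat completion request (sketch, not my exact code).
# The actual call passes these as kwargs to the chat completions endpoint.
params = {
    "model": "gpt-3.5-turbo-16k",   # lowercase "k" — the API is case-sensitive here
    "max_tokens": 14000,            # leaves room for the prompt in the 16,384-token context
    "temperature": 0.7,             # I've tried values both higher and lower than this
    "top_p": 1.0,
    "presence_penalty": 0.5,        # raised from the default 0 while testing
    "messages": [
        {"role": "user", "content": "Write a very long, detailed report on ..."}
    ],
}
print(params["model"], params["max_tokens"])
```

Even with max_tokens this high, the completions stop well short of the limit.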
Also, gpt-4-0613 feels more like a “gpt-3.5-turbo-pro” than like gpt-4.
Is anybody else having problems with the 16k model not outputting more than 1K–2K tokens?
Also, can we get playground settings that actually match what the API supports? The penalties can go negative via the API but not in the playground, and max_tokens now goes much higher than the playground slider allows on all models. Matching them would make testing parameters a lot easier.
I have the 16k model available in the playground and it works great there, but not through the API, where I get this response: { "error": { "message": "The model gpt-3.5-turbo-16K does not exist",