Gpt-3.5-turbo-16k, but not for responses?

I’m testing out gpt-3.5-turbo-16k on a dev server and I’m giving it the max tokens, however, it seems that it can’t output more than maybe 2,000 tokens? I’ve tested raising the presence penalty, I’ve lowered and raised the temperature, I’ve lowered the top p, I’ve changed wording on prompts, but it seems to always top out around 1,000 - 2,000 tokens in response.

Also, gpt-4-0613 seems more like gpt-3.5-turbo-pro than it does gpt-4.

Is anybody else having problems with the 16k not outputting more then 1K-2K tokens?

Also, can we get some playground settings that actually match what’s available… penalties go negative, but not on playground. Also, max_tokens goes way higher than that on all models now. It would make testing parameters a lot easier.


Yeah,I have the same problem.Even though I set maxresponse to 8k token,But api response token less than 3k.

Is the response cut off or just not as long/detailed as you want?

Please note that the date for this transition on 27 June.

You may find the link as below, it’s for your information -


I have a similar problem. I enter a long context to answer the question but it’s not all used and I get a shorter answer. (Compared to gpt-3.5-turbo)

Hi, maybe someone can help :slight_smile:

I have the 16K available on playground and working great but not on the API, I get a response response:{ “error”: { “message”: “The model gpt-3.5-turbo-16K does not exist”,

Are you guys able to access it via API?


It’s lowercase k

1 Like

Great thanks that worked :cowboy_hat_face::+1: