Token Limit Error when prompting

Hello everyone,
I’m working with the GPT-3.5 API, which, as you all know, has a 4096-token limit.
The task the LLM is performing is code migration from one language to another, and when I tried some chunking techniques, the context of the written code was lost. (I tried Meta’s FAISS library.)
Any suggestions? I’ve already tried the 16k and 32k versions.
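For readers hitting the same wall: one common workaround is to chunk the source with some overlap, so each chunk carries a bit of the surrounding code for context. A minimal sketch (the token count here is a rough chars/4 heuristic, not the model’s real tokenizer, and the parameter values are illustrative):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for code/English.
    # For exact counts you would use the model's tokenizer instead.
    return len(text) // 4

def chunk_code(source: str, max_tokens: int = 3000, overlap_lines: int = 20):
    """Split source code into chunks under max_tokens, repeating the
    last `overlap_lines` lines of each chunk at the start of the next
    so the model keeps some surrounding context."""
    lines = source.splitlines()
    chunks, current = [], []
    for line in lines:
        current.append(line)
        if estimate_tokens("\n".join(current)) >= max_tokens:
            chunks.append("\n".join(current))
            current = current[-overlap_lines:]  # carry overlap forward
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Overlap alone won’t preserve cross-file context (imports, type definitions used far away), which is likely why naive chunking loses meaning during migration.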


gpt-3.5-turbo-16k: 16384 tokens.



Not sure if it’s applicable in your scenario, but have you considered using gpt-3.5-turbo-1106?
It offers a context window of around 16K tokens. While the output is still limited to 4K tokens, this larger context could allow for more effective slicing and handling of larger codebases, potentially addressing your issue.

Would this help?


Thanks for answering, guys. I’ve already tried the 16k and 32k models.


I am also getting these errors; it was working perfectly a few days ago:

message: "This model’s maximum context length is 16385 tokens. However, you requested 17001 tokens (617 in the messages, 16384 in the completion). Please reduce the length of the messages or completion."

message: "This model’s maximum context length is 8192 tokens. However, you requested 8808 tokens (617 in the messages, 8191 in the completion). Please reduce the length of the messages or completion."

I have tried the different models and nothing is working. Both of the above are for the same prompt, using gpt-4 and gpt-3.5-turbo-16k.


No problem and thanks for the update.
Have you considered using the gpt-4-1106 model, or do you prefer to stick with the 3.5 version?
If you’re inclined towards 3.5, have you tried refactoring the code (sequentially via 3.5) or, if possible, making it more modular? Reducing the length might help keep the process within the token limits.


I was using gpt-4-1106 and it was working perfectly; then I started to get errors. It won’t work with GPT-4, gpt-4-1106, or gpt-3.5-turbo-16k. It just keeps trying to create something greater than the max tokens. Again, this is new; the code worked perfectly a few days ago.



It seems your issue might be related to the completion side rather than the input. Have you checked whether the model is stopping as expected?


Completions have a limit that’s separate from the context window. In general, gpt-3.5, including the 16k models, will only output a maximum of about 2048 tokens (roughly 1500 words) in a single response. At least that’s what we’ve seen in fiction prompting, again and again.

GPT-4 can output 4096.

The way people get longer responses is to chain the prompts inside of a single Playground “chat” or with API calls in applications. But, it’s tricky to slide the context window down the length of large pieces of information.

On top of that, in my experience, these models were so heavily trained to be good summarizers that any kind of translation or rewrite you ask for routinely loses content. You can give it a 300-word section of writing, ask it to make it longer, and it writes back 287 words.

As for the errors you’re getting: "maximum context length is 16385" means you have to leave room for the response and cannot set the completion length right up to the context limit. Your prompt is 617 tokens, so you need to reduce the requested completion length, which in practice doesn’t matter much anyway, because the models will only write back 2048 or 4096 tokens in a single response.
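To make that arithmetic concrete, a small sketch of budgeting the completion from the context limit (assuming you control the request’s `max_tokens` parameter; the per-response output cap is passed in explicitly since it varies by model):

```python
def safe_max_tokens(prompt_tokens: int, context_limit: int, cap: int = 4096) -> int:
    """Completion budget = context limit minus prompt tokens,
    further capped at the model's per-response output limit."""
    remaining = context_limit - prompt_tokens
    return max(0, min(remaining, cap))

# The failing request from this thread: a 617-token prompt against a
# 16385-token context, but max_tokens was set to 16384 (too high).
print(safe_max_tokens(617, 16385))  # → 4096, well within the window
```

With this, the 617-token prompt would request at most 4096 completion tokens instead of 16384, and the request stays under the 16385-token context limit.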