Chat Completions output cutting off without hitting max_tokens limit

I’m having trouble generating long responses. My input is about 2,286 prompt_tokens, but when I ask the model to generate 10 examples, the output cuts off well before the 4,096 output limit (or the total limit), sometimes without even reaching 3,000 total_tokens. For example, after generating examples 1-5 correctly, it might just stop in the middle of example 6, with a finish_reason of “stop” instead of “length”.
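
For reference, this is roughly how I’m calling the API and checking the stop reason (simplified sketch; the model name and prompt string are just placeholders for my actual values, and I’m assuming the current openai Python package with the v1-style client):

from openai import OpenAI

client = OpenAI()

long_prompt = "...(roughly 2,286 tokens of instructions and context)..."  # placeholder

response = client.chat.completions.create(
    model="gpt-4",  # placeholder for the model I'm actually using
    messages=[{"role": "user", "content": long_prompt}],
    max_tokens=4096,
)

print(response.choices[0].finish_reason)   # comes back as "stop", not "length"
print(response.usage.total_tokens)         # often under 3,000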

Because the input is fairly long, I want to limit the number of calls I make, so ideally I’d rather not generate only 5 at a time, even if that is possible.

Is this intended behavior? Does anyone have any suggestions?

Welcome to the Forum!

It is not unusual for the model to return significantly fewer tokens than the defined output token limit of 4,096. Typically it’s difficult to get the model to consistently return more than 3,000 tokens. This is why it is often recommended to break the task up into smaller pieces.
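
If you do decide to split the request, here is a minimal sketch of what that could look like (assuming the openai Python package with the v1-style client; the model name and batch wording are just illustrative):

from openai import OpenAI

client = OpenAI()

task = "...(your instructions and context)..."  # placeholder for the long prompt

outputs = []
for batch in ("examples 1-5", "examples 6-10"):
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder
        messages=[{"role": "user", "content": f"{task}\n\nGenerate only {batch}."}],
        max_tokens=4096,
    )
    outputs.append(response.choices[0].message.content)

full_output = "\n".join(outputs)

The downside is that the prompt tokens are sent (and billed) once per call, but each call then only has to produce an amount of text the model will reliably return.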

Overall, when it comes to the length of the output, much depends on how you phrase your prompt. If you have not already done so, you could provide an example output as part of your prompt or be more explicit about the expected format. For example, a phrase like the following can help push the model to adhere to the requested number of examples:

Your output should be returned in the form of a numbered list as follows:

1. Example 1: <Description of example>
2. Example 2: <Description of example>
...
10. Example 10: <Description of example>

That said, if what you are asking of the model is highly complex, then even that might fail.
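
To make that concrete, here is a rough sketch of attaching such an instruction to the request (again assuming the openai Python package with the v1-style client; the model name and user content are placeholders):

from openai import OpenAI

client = OpenAI()

format_instruction = (
    "Your output should be returned in the form of a numbered list as follows:\n"
    "1. Example 1: <Description of example>\n"
    "2. Example 2: <Description of example>\n"
    "...\n"
    "10. Example 10: <Description of example>"
)

response = client.chat.completions.create(
    model="gpt-4",  # placeholder
    messages=[
        {"role": "system", "content": format_instruction},
        {"role": "user", "content": "...(your task and context)..."},  # placeholder
    ],
    max_tokens=4096,
)

print(response.choices[0].message.content)

You can then count how many numbered items actually came back and decide whether a follow-up call is needed.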
