Why does the chat completions API sometimes stop during generation? Max tokens is not the issue

Hi all,
I have an app that uses the OpenAI Node SDK with the OpenAI API and the GPT-4 model to generate short stories. The cumulative total of input plus output tokens averages about 1,800-2,000. The problem is that the chat completions API tends to cut off towards the end of the story while streaming, even though the maximum output is 4,000+ tokens. I am unsure why the completion stops mid-generation when the output is only about 1,800 tokens.

Does anyone have advice or guidance on how to debug this? Is it possible that a keyword or phrase at the end of the story (e.g., “THE END”) is causing the API to stop (i.e., the stop sequence parameter)?
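One way to debug this is to log the `finish_reason` the API returns: `"stop"` means a natural stop (end of text or a stop sequence), while `"length"` means `max_tokens` was hit. Here is a minimal sketch that accumulates a streamed completion and captures the final `finish_reason`, assuming the Node SDK v4 streaming chunk shape (`chunk.choices[0].delta.content` and `chunk.choices[0].finish_reason`); the helper name `collectStream` is my own:

```javascript
// Accumulate a streamed chat completion and capture why it stopped.
// Works on any async iterable of chunks shaped like the Node SDK v4 stream.
async function collectStream(stream) {
  let text = "";
  let finishReason = null;
  for await (const chunk of stream) {
    const choice = chunk.choices?.[0];
    if (!choice) continue;
    text += choice.delta?.content ?? "";
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { text, finishReason };
}

// Usage against the real SDK (not run here):
//   const stream = await openai.chat.completions.create({
//     model: "gpt-4",
//     messages,
//     stream: true,
//   });
//   const { text, finishReason } = await collectStream(stream);
//   console.log(finishReason); // "stop" = natural end / stop sequence,
//                              // "length" = max_tokens was reached
```

If `finishReason` comes back `"length"`, the cutoff is a token limit; if it is `"stop"` but the story looks truncated, a stop sequence (or the model genuinely ending early) is the likelier cause.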

GPT-4 has a limit of 4,000 output tokens. Try gpt-4 16k or 32k, if available to you.

Yes, but as I mentioned, the total I am generating is no more than 1,800-2,000 tokens (system message + input + generated output from the LLM). It’s a one-time generation, not an ongoing chat.
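For a one-shot generation like this, it is worth ruling out an accidental `stop` sequence or a low `max_tokens` in the request itself. A hedged sketch of the request parameters (field names are from the Chat Completions API; the prompt text and token value are illustrative assumptions):

```javascript
// Sketch of the request, making max_tokens explicit so the token budget
// cannot silently truncate the story. Values here are illustrative.
const params = {
  model: "gpt-4",
  messages: [
    { role: "system", content: "You write short stories." },          // illustrative
    { role: "user", content: "Write a short story about a lighthouse." },
  ],
  max_tokens: 3000, // well above the ~2,000 tokens the stories use
  // stop is deliberately omitted — confirm no stop sequences (e.g. "THE END")
  // are being set anywhere else in the codebase.
  stream: true,
};
```

Passing `params` to `openai.chat.completions.create(params)` and then checking the stream’s `finish_reason` should narrow down whether the cutoff comes from a limit, a stop sequence, or the model simply ending the story early.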