Why does the chat completions API sometimes stop during generation? Max tokens is not the issue

Hi all,
I have an app that uses the OpenAI Node SDK with the OpenAI API and the GPT-4 model to generate short stories. The cumulative total of input plus output tokens averages about 1,800-2,000. The problem is that the chat completions API tends to cut off towards the end of the story while streaming, even though the maximum output is 4,000+ tokens. I am unsure why the completion stops mid-generation when the output is only about 1,800 tokens.

Does anyone have advice or guidance on how to debug this? Is it possible that a keyword or phrase at the end of the story (e.g., “THE END”) is causing the API to stop (i.e., the stop sequence parameter)?
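One way to debug this is to log the `finish_reason` the API returns: `"stop"` means a natural stop (end of text or a stop sequence), while `"length"` means `max_tokens` was hit. Here is a minimal sketch that accumulates a streamed completion and captures the final `finish_reason`, assuming the Node SDK v4 streaming chunk shape (`chunk.choices[0].delta.content` and `chunk.choices[0].finish_reason`); the helper name `collectStream` is my own:

```javascript
// Accumulate a streamed chat completion and capture why it stopped.
// Works on any async iterable of chunks shaped like the Node SDK v4 stream.
async function collectStream(stream) {
  let text = "";
  let finishReason = null;
  for await (const chunk of stream) {
    const choice = chunk.choices?.[0];
    if (!choice) continue;
    text += choice.delta?.content ?? "";
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { text, finishReason };
}

// Usage against the real SDK (not run here):
//   const stream = await openai.chat.completions.create({
//     model: "gpt-4",
//     messages,
//     stream: true,
//   });
//   const { text, finishReason } = await collectStream(stream);
//   console.log(finishReason); // "stop" = natural end / stop sequence,
//                              // "length" = max_tokens was reached
```

If `finishReason` comes back `"length"`, the cutoff is a token limit; if it is `"stop"` but the story looks truncated, a stop sequence (or the model genuinely ending early) is the likelier cause.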

GPT-4 has a limit of 4,000 output tokens. Try gpt-4 16k or 32k, if available to you.

Yes, but as I mentioned, the total I am generating is no more than 1,800-2,000 tokens (system message + input + generated output from the LLM). It’s a one-time generation, not an ongoing chat.
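For a one-shot generation like this, it is worth ruling out an accidental `stop` sequence or a low `max_tokens` in the request itself. A hedged sketch of the request parameters (field names are from the Chat Completions API; the prompt text and token value are illustrative assumptions):

```javascript
// Sketch of the request, making max_tokens explicit so the token budget
// cannot silently truncate the story. Values here are illustrative.
const params = {
  model: "gpt-4",
  messages: [
    { role: "system", content: "You write short stories." },          // illustrative
    { role: "user", content: "Write a short story about a lighthouse." },
  ],
  max_tokens: 3000, // well above the ~2,000 tokens the stories use
  // stop is deliberately omitted — confirm no stop sequences (e.g. "THE END")
  // are being set anywhere else in the codebase.
  stream: true,
};
```

Passing `params` to `openai.chat.completions.create(params)` and then checking the stream’s `finish_reason` should narrow down whether the cutoff comes from a limit, a stop sequence, or the model simply ending the story early.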