I have an app that uses the OpenAI Node SDK with the OpenAI API and the GPT-4 model to generate short stories. The cumulative total of input plus output tokens averages about 1800-2000 tokens. The problem is that the chat completions API tends to terminate/cut off toward the end of the story during the text stream, even though the maximum output tokens is set to 4000+. I am unsure why the completion stops mid-generation if the output is only about 1800 tokens.
Does anyone have advice or guidance on how to debug this? Is it possible that a keyword or phrase at the end of the story (e.g., "THE END") is triggering the API's stop behavior (i.e., the STOP SEQUENCE / `stop` parameter)?
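For anyone debugging something similar: a good first step is to log the `finish_reason` on the final stream chunk, since the API tells you why it stopped. Here is a minimal sketch; the helper function name is my own invention, and the commented-out streaming usage assumes the v4 `openai` npm package:

```javascript
// Hypothetical helper: translate a chat-completion finish_reason
// into a human-readable explanation of why generation ended.
function explainFinishReason(reason) {
  switch (reason) {
    case "stop":
      // Natural end of message, OR one of your `stop` sequences matched.
      return "stopped: natural end or a `stop` sequence matched";
    case "length":
      // Hit `max_tokens` or the model's context-window limit.
      return "truncated: hit max_tokens or the context limit";
    case "content_filter":
      return "truncated: content filter";
    default:
      return `unexpected finish_reason: ${reason}`;
  }
}

// Sketch of where to log it in a stream (assumes the openai v4 Node SDK):
//
// import OpenAI from "openai";
// const client = new OpenAI();
// const stream = await client.chat.completions.create({
//   model: "gpt-4",
//   stream: true,
//   max_tokens: 4000,
//   messages: [{ role: "user", content: "Write a short story." }],
// });
// for await (const chunk of stream) {
//   const choice = chunk.choices[0];
//   if (choice.finish_reason) {
//     console.log(explainFinishReason(choice.finish_reason));
//   }
// }
```

If the final chunk reports `length`, the request's `max_tokens` (or the context window) is the culprit; if it reports `stop` while the story looks unfinished, check whether a `stop` sequence in the request matches text the model emits near the end.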