Is there also a situation where GPT stops generating so we need to make it continue manully

Sometimes when I am generating a content GPT automatically stops. If user wants to continue, continue generating button has to be pressed. I would like to know if it happens during API usage as well or only because of user interface limitations on the web browser?

Generation will also stop with the API if the request produces more tokens than there is room for in the reply. you can always prompt the model again to continue from where it left off if this happens and you will get a finish_reason of “length” so you can detect when this happens.



thanks for the quick response. What is the limit of words that can be returned in the response without need to choose “continue generating”?

Kind regards,

That is very much dependant on your prompts, gpt-3.5 is tuned for shorter answers 3.5-16k will typically produce longer replies to the same question, you can use words like “verbose” and “expansive” to request longer replies. AI’s do not work like traditional software, there are very few hard edges to anything.


I have one more question. I am actively using GPT API as part of my android app for certain processes which require longer anwers (around 1600 words). In 1 out of 10 requests I do not get complete answer. How should a logic within the code look like to handle this effectively? For example should I send “please continue” request again if certain condition is fulfilled? I would be grateful if you had a suggestion.

Kind regards,

There’s a few reasons why you could be getting truncated output:

  1. You are setting a max_token value with your API call, and the generated output has exceeded the limit;
  2. You are sending a very large input, so an adaptive or unset max_token value doesn’t leave enough context length after the input for creating the desired response;
  3. Your streaming generation is taking too long, and either your platform times out after a short period (like 60 seconds of open connection) or you successfully made the AI produce very long output against its own 5-minute server timeout (about 10,000 tokens of -16k generations starts to get you into the five minute range at “normal speed”).
  4. The AI was done writing, and generated a stop token (or rather a stop token was selected from the sampling of likely output tokens).

Most above have obvious solution if you do the logging and troubleshooting. #4 would require lowering the temperature or top_p so there is less selection of unlikely tokens, and the AI output seen follows its production intent.

1 Like