Is there also a situation where GPT stops generating, so we need to make it continue manually?

There are a few reasons why you could be getting truncated output (see the sketch after the list for how to tell the cases apart):

  1. You are setting a `max_tokens` value with your API call, and the generated output has exceeded that limit;
  2. You are sending a very large input, so an adaptive or unset `max_tokens` value doesn’t leave enough context length after the input to produce the desired response;
  3. Your streaming generation is taking too long: either your platform times out after a short period (like 60 seconds of open connection), or you successfully made the AI produce output so long that it hits the server’s own five-minute timeout (around 10,000 tokens of a -16k model’s generation at “normal speed” starts to approach the five-minute range);
  4. The AI was done writing and generated a stop token (or rather, a stop token was selected from the sampling of likely output tokens).

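To distinguish these cases in code, the `finish_reason` field on each choice tells you whether the output was cut off by `max_tokens` (`"length"`, cases 1–2) or ended on a stop token (`"stop"`, case 4), and you can resume automatically by feeding the partial answer back. A minimal sketch, assuming the OpenAI Python SDK (v1.x) and the Chat Completions endpoint; the model name, prompt, limits, and the “continue” wording are all placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{"role": "user", "content": "Write a long essay about tokenizers."}]
reply = ""

for _ in range(5):  # cap the number of continuation rounds
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=messages,
        max_tokens=1024,
    )
    choice = response.choices[0]
    reply += choice.message.content

    # finish_reason tells you which case you hit:
    #   "length" -> truncated by max_tokens (case 1 or 2)
    #   "stop"   -> the model emitted a stop token on its own (case 4)
    if choice.finish_reason != "length":
        break

    # Feed the partial answer back and ask the model to keep going.
    messages.append({"role": "assistant", "content": choice.message.content})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

print(reply)
```
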
Most of the above have obvious solutions once you do the logging and troubleshooting. #4 would require lowering the temperature or top_p, so that fewer unlikely tokens are selected and the output you see follows the AI’s production intent.
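
For #4, a sketch of what tightening the sampling parameters could look like; the exact values here are illustrative, not recommendations:

```python
from openai import OpenAI

client = OpenAI()

# Tighter sampling makes an unlikely early stop token less likely to be
# picked from the distribution. Values are placeholders for experimentation.
response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": "Write a long essay about tokenizers."}],
    max_tokens=1024,
    temperature=0.5,  # below the default 1.0: sharpens the token distribution
    top_p=0.9,        # trims the low-probability tail before sampling
)
print(response.choices[0].finish_reason)  # expect "stop" when it finishes naturally
```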
