A way to reduce response length while still getting a finish_reason of "stop"

I am developing an application for a very small display (the size of a wristwatch), and I want responses to be terse. I thought max_tokens would tell the model to fit its answer within that many tokens, but it just truncates the response.

I am exploring some prompt engineering, like “Please respond briefly”, but I’d much rather have a way to limit response length without ending up with a truncated response (a finish_reason of "length" instead of "stop").

Maybe what I’ll do is: if the response exceeds my limit, send it back with the prompt “say that more tersely”, try that a couple of times, and only then fall back to delivering a truncated response, which is an undesirable user experience.
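A minimal sketch of that retry loop. The function name `shorten_response`, the word-based length check, and the retry count are all illustrative assumptions; `complete` stands in for whatever completion API call you are wrapping:

```python
def shorten_response(complete, prompt, max_words=20, retries=2):
    """Get a reply from `complete`; if it's too long, re-prompt with
    "say that more tersely" up to `retries` times, then truncate as
    a last resort. `complete` is any callable prompt -> reply string."""
    reply = complete(prompt)
    for _ in range(retries):
        if len(reply.split()) <= max_words:
            return reply
        # Feed the long reply back and ask for a terser version.
        reply = complete(f'Say this more tersely: "{reply}"')
    # Last resort: deliver a truncated reply (undesirable, but bounded).
    words = reply.split()
    return " ".join(words[:max_words]) if len(words) > max_words else reply
```

In practice you would pass a small closure around your chat-completion call as `complete`, so the retry logic stays independent of any particular SDK.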

Did you find a solution for this? I’m facing the same problem.


That’s a great idea. From what the GPT-4 demo looked like, it sounds like the GPT-4 system message will make this a moot issue. At least I hope so :slight_smile:
