Limiting Answer Length with Tokens / Prompting

Without fine-tuning on more concise answer shapes, what are the most effective ways today to limit completions so they are more concise and natural? For example, when asking GPT-4 a broader question, there seems to be a structured template for the response shape: a brief abstract, a detailed list of steps or options, and then a conclusion.

If I just want an options list written more naturally in a paragraph, or a generally more concise response overall, is the most effective approach prompting for those specific types of responses, or are there guardrail parameters or phrases that tend to work better?

Another option I see would be using another model to summarize the output, but that seems like overkill.

Hi - you can achieve a lot with specific prompting. I use GPT-4 for summarization a lot, and over time I have shifted to a prompting strategy that gives the model a general instruction for preparing a summary and then appends a list of specific principles it should follow when creating that summary. In my case that's over 10 specific principles, covering everything from style, structure, and granularity to other very specific points unique to my context.
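As a rough sketch of that strategy (the instruction and principles below are illustrative placeholders, not my actual list), the idea is to assemble one system message from a general instruction plus a numbered list of principles:

```python
# Sketch of the "general instruction + specific principles" prompt strategy.
# Instruction text and principles are illustrative, not the actual prompt.

GENERAL_INSTRUCTION = "Summarize the following text for a busy reader."

PRINCIPLES = [
    "Write a single flowing paragraph; do not use headings or bullet lists.",
    "Keep the summary under 120 words.",
    "Use plain, conversational language; avoid boilerplate openers and closers.",
    "Mention options inline in prose rather than as an enumerated list.",
]

def build_summary_prompt(text: str) -> list[dict]:
    """Assemble chat messages: one system message carrying the general
    instruction plus the numbered principles, then the user's text."""
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(PRINCIPLES, start=1))
    system = f"{GENERAL_INSTRUCTION}\n\nFollow these principles:\n{numbered}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]
```

The nice part of keeping the principles in a plain list is that the trial-and-error loop described below becomes just editing that list.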

Getting your output in the desired shape will involve some trial and error. Review the outputs for characteristics that you like/dislike, then incorporate these findings into your prompt.

But in a nutshell, if length is your concern, then that can definitely be addressed in your prompt.
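For length specifically, there are two levers you can combine: a prompt-level word limit the model can plan around, and the API's `max_tokens` parameter as a hard cap (which simply truncates mid-sentence, so it should sit well above the prompted length). A minimal sketch, assuming the OpenAI chat-completions request shape, with the model name and word-to-token ratio as rough illustrative choices:

```python
# Two complementary length controls (sketch; model name is illustrative):
# 1) a prompt instruction the model can actually plan a short answer around,
# 2) max_tokens as a hard safety cap -- it truncates mid-sentence, so it is
#    set with generous headroom above the prompted limit.

def concise_request(question: str, word_limit: int = 100) -> dict:
    """Build kwargs for a chat-completion call with both length controls."""
    return {
        "model": "gpt-4",  # illustrative
        "messages": [
            {"role": "system",
             "content": (f"Answer in one natural paragraph of at most "
                         f"{word_limit} words. No headings, no bullet "
                         f"lists, no concluding summary.")},
            {"role": "user", "content": question},
        ],
        # Rough assumption: ~1.5 tokens per English word, plus headroom.
        "max_tokens": word_limit * 3,
    }
```

Relying on the prompt for the target length and on `max_tokens` only as a backstop avoids answers that end mid-sentence.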

Makes sense. How reliable is that kind of prompting at corralling the style and output into the format you are looking for? Or do you just validate after the fact to ensure it meets your guidelines?

Have you also been able to achieve tone / style changes with pure prompting?

I have found it to be very reliable. I do check for style regularly through a manual validation process and then just make changes as necessary.

After the summary is created I have additional controls in place for validation, but they are focused not on style but on other aspects, such as cases where a summary could not be properly generated. Technically, though, you could deploy the strategy of having a second model review the summary against certain criteria.
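A cheap version of that post-generation check doesn't even need a second model for the mechanical criteria. A sketch, with the heuristics purely illustrative (a second-model review would extend or replace them):

```python
# Sketch of a post-generation validation pass. The specific checks are
# illustrative heuristics; a second-model review could cover style/tone.

def validate_summary(summary: str, max_words: int = 150) -> list[str]:
    """Return a list of problems found; an empty list means it passes."""
    problems = []
    words = summary.split()
    if not words:
        problems.append("empty output: summary could not be generated")
    if len(words) > max_words:
        problems.append(f"too long: {len(words)} words (limit {max_words})")
    if any(line.lstrip().startswith(("-", "*", "1."))
           for line in summary.splitlines()):
        problems.append("contains a bullet/numbered list instead of prose")
    return problems
```

Summaries that fail these mechanical checks can be regenerated automatically, leaving only style drift for manual (or second-model) review.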