I’m having a hard time getting my RAG assistant to be concise. I’ve tried emphasizing in the assistant instructions that it should shoot for responses of 1-2 paragraphs and told it to be concise, succinct, direct, to-the-point, etc. - but it still almost always spits out walls of text that are 5-6 paragraphs long, minimum.
I’m providing it characteristics of the user (age group, interests, requested topics, etc.), and it seems compelled to address every single provided interest or topic in its own paragraph. I don’t want to have to reduce the personalization info I’m providing just to get concise output.
Has anyone found any good levers or prompting techniques to control assistant verbosity?
The system message used in the Android app and mobile web versions of ChatGPT included this text,
You are chatting with the user via the ChatGPT Android app. This means most of the time your lines should be a sentence or two, unless the user’s request requires reasoning or long-form outputs. Never use emojis, unless explicitly asked to. Never use LaTeX formatting in your responses, use only basic markdown.
That has been pretty effective at curtailing long responses.
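One way to apply this is to fold a similar brevity clause into your own system prompt at request-build time. Here’s a minimal sketch, assuming the Chat Completions message format; the clause wording and function name are illustrative, not canonical:

```python
# Sketch: append a brevity clause, modeled on the mobile-app system message,
# to the system prompt before sending a chat request.
BREVITY_CLAUSE = (
    "Most of the time your replies should be a sentence or two, "
    "unless the request requires reasoning or long-form output."
)

def build_messages(system_prompt: str, user_text: str) -> list[dict]:
    """Return a messages list with the brevity clause folded into the system prompt."""
    return [
        {"role": "system", "content": f"{system_prompt}\n\n{BREVITY_CLAUSE}"},
        {"role": "user", "content": user_text},
    ]

messages = build_messages("You are a helpful RAG assistant.", "Summarize the doc.")
```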
Is there any possibility of confusion in your prompt? I.e., you ask for 1-2 paragraphs but give it 4-5 characteristics - could your prompt be interpreted to mean that you want 1-2 paragraphs per characteristic?
We rather have the opposite issue: sometimes responses are unexpectedly short (p ~0.05). But those are easy to detect, and we regenerate.
Overall, it’s generally a matter of prompting, but it’s always possible to get outliers. Do you wanna share your prompt?
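The detect-and-regenerate approach mentioned above can be sketched as a small retry wrapper. This is an assumption about how one might implement it; `generate` stands in for whatever callable wraps your API client, and the word threshold is arbitrary:

```python
from typing import Callable

def generate_with_retry(generate: Callable[[], str],
                        min_words: int = 30,
                        max_attempts: int = 3) -> str:
    """Regenerate when a response comes back shorter than min_words."""
    text = ""
    for _ in range(max_attempts):
        text = generate()
        if len(text.split()) >= min_words:
            return text
    return text  # give up and return the last attempt
```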
If you have control over the chunk size in your RAG implementation, it’s worth trying a smaller value, or possibly going for sentence embeddings, and seeing if that reduces the verbosity. I’ve noticed that with large chunk sizes as context, the response is often longer.
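For comparison, here’s a rough sketch of fixed-size vs. sentence-level chunking at indexing time. Both functions are simplified heuristics (real splitters handle abbreviations, overlap, etc.), meant only to show the two granularities being contrasted:

```python
import re

def fixed_chunks(text: str, size: int = 500) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text: str) -> list[str]:
    """Split text into individual sentences (very rough heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```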
Blazon-style concise & condensed english language. You may use even expert terminology if it helps to condense. Don’t explain terminology! Be sparse. Don’t give information that needs more context to be useful! Quantify!!
Topic: Debian GNU/Linux
Originally I think it was “extremely concise” instead of just “concise”, and then the answers were ridiculously short - you pretty much always had to probe several times to get what you wanted, though the answers certainly were extremely sparse.
With a RAG application, in addition to instructing it to be concise, I also told it that users have access to the source materials themselves, so there is no need to repeat them at length. This immediately turned verbose answers into more succinct ones.
In this ‘extreme’ case, it helped to state that “it is acceptable for answers to be brief”. And using adjectives like concise and brief throughout the prompt, not just in one instruction.
Overall, the most helpful principle for system messages is to avoid telling the model what it shouldn’t do; instead, state what it should do clearly enough that the opposite is implied.
Here’s an example discussion about a problem I just had using this system message. No way everything from the beginning to the solution would have fit into a single screenshot by default.
Lots of great ideas given here. In some cases, where I’ve had to deal with stubborn bots that like to ramble, including a limitation based on token count (1 token is roughly 4 characters, or about three-quarters of a word) has worked for me.
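A minimal sketch of combining a prompt-side token budget with a hard server-side cap. The 4-characters-per-token figure is only a heuristic (use a real tokenizer such as tiktoken when precision matters), and the function names and budget value are illustrative:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def build_request(messages: list[dict], token_budget: int = 150) -> dict:
    """Ask the model to stay under the budget, and enforce it with max_tokens too."""
    capped = messages + [{
        "role": "system",
        "content": f"Limit your answer to about {token_budget} tokens.",
    }]
    return {"messages": capped, "max_tokens": token_budget}
```

Pairing the instruction with the `max_tokens` request parameter gives you a hard ceiling even when the prompt-side hint is ignored.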
“Verbosity level [V]: choose a value between 0 and 5 to set verbosity level. The default setting is X. A lower value results in less detailed output while a higher value increases detail”
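That verbosity dial can be rendered into the system prompt programmatically. A small sketch, keeping the quoted 0-5 scale; the function name and default are assumptions:

```python
def verbosity_instruction(level: int, default: int = 2) -> str:
    """Build a verbosity-level instruction, clamping the value to the 0-5 range."""
    level = min(5, max(0, level))
    return (
        f"Verbosity level [V]: {level} on a 0-5 scale (default {default}). "
        "A lower value results in less detailed output; "
        "a higher value increases detail."
    )
```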
Marking this the solution because I think that was the most unintuitive thing for me - the model doesn’t seem to understand length in terms of paragraphs or sentences, but telling it to restrict itself to a certain number of tokens works very well.