Any way to control assistant verbosity?

I’m having a hard time getting my RAG assistant to be concise. I’ve tried emphasizing in the assistant instructions that it should shoot for responses of 1-2 paragraphs and told it to be concise, succinct, direct, to-the-point, etc. - but it still almost always spits out walls of text that are 5-6 paragraphs long, minimum.

I’m providing it characteristics of the user (age group, interests, requested topics, etc.) and it seems like it feels the need to enumerate answers for every single provided interest or topic in a separate paragraph, but I don’t want to have to reduce the personalization info I’m providing just for it to be concise.

Has anyone found any good levers or prompting techniques to control assistant verbosity?


The system message used in the Android app and mobile web versions of ChatGPT included this text:

You are chatting with the user via the ChatGPT Android app. This means most of the time your lines should be a sentence or two, unless the user’s request requires reasoning or long-form outputs. Never use emojis, unless explicitly asked to. Never use LaTeX formatting in your responses, use only basic markdown.

That has been pretty effective at curtailing long responses.


Some thoughts:

  1. Is there any possibility of confusion in your prompt? I.e., you ask for 1-2 paragraphs but give it 4-5 characteristics; could your prompt be interpreted to mean that you want 1-2 paragraphs per characteristic?

  2. We rather have the opposite issue: sometimes responses are unexpectedly short (p ≈ 0.05). But those are easy to detect, and we regenerate.

Overall, it’s generally a matter of prompting, but it’s always possible to get outliers. Do you wanna share your prompt?

Asking for an “executive summary” might work too.

It would be helpful, though, to see an example of the actual prompt you are using, to pinpoint where it is going astray.


I limit my Assistants API responses by including ‘reply in no more than 40 words’ in the instructions; that seems to work.


If you have control over the chunk size in your RAG implementation, it’s worth trying a smaller value, or possibly switching to sentence embeddings, and seeing if that reduces the verbosity. I’ve noticed that with large chunk sizes as context, the response is often longer.
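
To illustrate the idea, here’s a minimal sketch of sentence-level chunking. The function name and the character limit are made up for illustration (production pipelines usually measure chunks in tokens, not characters):

```python
import re

def sentence_chunks(text, max_chars=200):
    """Split text into chunks of whole sentences, each at most max_chars.

    A rough sketch: splitting on sentence-ending punctuation followed by
    whitespace, then greedily packing sentences into chunks.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = ("Debian is a free operating system. It uses the Linux kernel. "
       "Releases are named after Toy Story characters. "
       "The stable branch prioritizes reliability over novelty.")
print(sentence_chunks(doc, max_chars=80))
```

Smaller chunks mean the model sees less raw text per retrieved passage, which in my experience correlates with shorter answers.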

I found it immensely successful to encourage the bot to imitate blazon, the highly condensed descriptive language of heraldry (see the Wikipedia article on Blazon).


Exact system message example:

Blazon-style concise & condensed english language. You may use even expert terminology if it helps to condense. Don’t explain terminology! Be sparse. Don’t give information that needs more context to be useful! Quantify!!

Topic: Debian GNU/Linux

Originally I think it was “extremely concise” instead of just “concise”, and then the answers were ridiculously short; you pretty much always had to probe many times to get what you wanted, but the answers certainly were extremely sparse.


As a heraldry and vexillology nerd, I love this solution!

With a RAG application, in addition to instructing it to be concise, I also told it that users have access to the source materials too, so there is no need to repeat them at length. This immediately turned verbose answers into more succinct ones.


This guy is very brief.

In this ‘extreme’ case, it helped to state that “it is acceptable for answers to be brief”. And using adjectives like concise and brief throughout the prompt, not just in one instruction.

Overall, the most helpful advice for instructing via system messages is to avoid stating what shouldn’t be done; instead, state what should be done clearly enough that it implies the opposite on its own.

Here’s an example discussion about a problem I just had using this system message. No way everything from the beginning to the solution would have fit into a single screenshot by default.

Lots of great ideas given here. In some cases, where I’ve had to deal with stubborn bots that like to ramble, including a limitation based on token count (1 token is roughly 4 characters, or about three-quarters of a word) has worked for me.

V=<0-5>: control verbosity (default is X)

“Verbosity level [V]: choose a value between 0 and 5 to set verbosity level. The default setting is X. A lower value results in less detailed output while a higher value increases detail”
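
A sketch of how such a dial might be wired into a system message. Everything here (the function name, the level descriptions) is invented for illustration; the “knob” only works because the instruction text explains the scale to the model:

```python
def build_system_message(verbosity=2):
    """Compose a system message around a hypothetical 0-5 verbosity dial.

    The dial is not an API parameter; it only has an effect because the
    instruction text spells out what each level means.
    """
    if not 0 <= verbosity <= 5:
        raise ValueError("verbosity must be between 0 and 5")
    return (
        "You are a concise assistant.\n"
        f"Verbosity level [V={verbosity}]: 0 means a single short sentence, "
        "5 means a detailed multi-paragraph answer. "
        "Stay at the stated level unless the user explicitly asks for more detail."
    )

print(build_system_message(1))
```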

Giving it a length range in tokens worked for me (e.g., “Make it 25 to 30 tokens long”).
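
If you want to sanity-check whether responses actually land in a range like that, a rough character-based estimate is often enough. The helper names and thresholds below are made up for illustration, using the common ~4 characters/token rule of thumb; exact counts would need a real tokenizer such as tiktoken:

```python
def rough_token_count(text):
    """Estimate tokens with the common ~4 characters per token heuristic."""
    return max(1, len(text) // 4)

def within_budget(text, lo=25, hi=30):
    """Check whether a response lands inside the requested token range."""
    return lo <= rough_token_count(text) <= hi

# A 120-character reply estimates to ~30 tokens, so it fits a 25-30 budget.
print(within_budget("x" * 120))
```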

@f2618752 is this something you’re proposing to add to your prompt? Or do you see this somewhere in the documentation?

Marking this as the solution because I think that was the most unintuitive thing for me: the model doesn’t seem to understand length in terms of paragraphs or sentences, but telling it to restrict itself to a certain number of tokens works very well.
