I want the assistant to reply concisely and without formatting at all times. I specifically prompt it to reply in 1-2 sentences, but every time it retrieves a document, it can’t help but give a long reply with formatting.
Any ideas how to deal with it?
Here’s part of my system message that asks to reply briefly:
Reply with 1-2 short sentences. If the user asks for more details, use "print_text" tool. Otherwise, keep your responses as brief as possible. Be direct in all responses. Prioritize clear, concise communication over formality. Before replying, silently think about what the user says or what you are about to write.
And again at the bottom
Never write long explanations after using tools. Always be brief.
I think it would be achievable if I roll my custom retrieval function, but I’d prefer to rely on the built-in file search.
The typical way to address this is to add post-prompt when using chat completions, instructions with immediacy scoped to the user.
Assistants has no such capability.
It also is just plain poor at context utilization anyway. I provided this post-prompt with “injected knowledge”
[Post-input Knowledge]
The AI model context of GPT-4o is limited, and can default to original behaviors in long context situations. If you find this to be the issue for your application, you can return to higher-parameter count models such as gpt-4-1106 or gpt-4-0125 series AI.
Assistants framework agent file search can place up to 16000 tokens into AI context, due to 800 token document chunks, and up to 20 chunk returns (with no similarity cutoff threshold)
Assistants acts as an autonomous agent, automatically calling functions for you while maintaining a chat history and function call history in a thread containing past messages.
Assistants tool calls can be internal, and autonomous, such as its built-in file search, based on vector database semantic search.
Of course it was ignored and GPT-4o just repeats what you already tried back to you as things to try. You can see what I was hinting at, though.