That’s the part that basically breaks, so in 2021 I abandoned the idea of managing it through prompts and system instructions and instead orchestrated the whole “is this a valid answer?” process in detail with good old code. Works like a charm for me (www.lawxer.ai), but it depends heavily on the application domain.
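To illustrate what “deciding validity in code instead of in the prompt” can look like, here is a minimal sketch. It assumes each rule can be written as a plain function that takes the question and the model’s answer and returns pass/fail. The specific checks (`non_empty`, `max_words`, `cites_one_of`) and the `[doc-N]` source IDs are hypothetical examples. This is not how www.lawxer.ai works; real rules would come from your own domain.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


# A check receives the question and the model's answer and reports pass/fail.
Check = Callable[[str, str], CheckResult]


def non_empty(question: str, answer: str) -> CheckResult:
    """Fail if the model returned nothing but whitespace."""
    return CheckResult("non_empty", bool(answer.strip()))


def max_words(limit: int) -> Check:
    """Build a check that fails when the answer exceeds `limit` words."""
    def check(question: str, answer: str) -> CheckResult:
        n = len(answer.split())
        return CheckResult("max_words", n <= limit, f"{n} words (limit {limit})")
    return check


def cites_one_of(allowed: List[str]) -> Check:
    """Hypothetical domain rule: the answer must mention at least one allowed source ID."""
    def check(question: str, answer: str) -> CheckResult:
        found = [s for s in allowed if s in answer]
        return CheckResult("cites_one_of", bool(found),
                           ", ".join(found) or "no allowed source cited")
    return check


def validate(question: str, answer: str, checks: List[Check]) -> List[CheckResult]:
    """Run every check; the caller decides what to do with failures (retry, reject, escalate)."""
    return [check(question, answer) for check in checks]


def is_valid(results: List[CheckResult]) -> bool:
    return all(r.passed for r in results)


checks = [non_empty, max_words(50), cites_one_of(["[doc-1]", "[doc-2]"])]
results = validate("Is consent required?", "Yes, consent is required [doc-2].", checks)
print(is_valid(results))  # True
```

Keeping each rule as a separate function means a failed answer comes back with a named reason. You can feed that reason into a retry instead of hoping the prompt alone prevents the error.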
What are you building? What criteria do you use to “know” that a response is “accurate”? How do humans do this today?
Thanks for the insight! I’ve started experimenting with my own workflow to evaluate responses for accuracy while trying to maintain a sharp, witty style.
My current approach involves:
Analyzing the input and context
Applying persona templates / prompt tweaks
Evaluating outputs against expected tone and factual correctness
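The three steps above could be wired together roughly as follows. This is a minimal sketch under several assumptions and is not the poster’s actual workflow. “Analyzing the input” is reduced to keyword overlap for picking context snippets. The persona template is a hypothetical example. `generate` stands in for whatever model call you use. The factual check only tests whether known expected facts appear as substrings, so it is useful only when you know those facts in advance. Tone is approximated by hypothetical heuristics: a list of banned phrases and a maximum sentence length.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set

# Hypothetical persona template; tweak the wording per persona.
PERSONA_TEMPLATE = (
    "You are a sharp, witty assistant. Stay accurate; wit never overrides facts.\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


@dataclass
class Evaluation:
    facts_ok: bool
    tone_ok: bool
    notes: List[str] = field(default_factory=list)


def keywords(text: str) -> Set[str]:
    """Toy input analysis: lowercase words of four or more letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def select_context(question: str, snippets: List[str]) -> List[str]:
    """Step 1: keep only snippets that share at least one keyword with the question."""
    q = keywords(question)
    return [s for s in snippets if q & keywords(s)]


def build_prompt(question: str, context: List[str]) -> str:
    """Step 2: fill the persona template with the selected context."""
    return PERSONA_TEMPLATE.format(context="\n".join(context), question=question)


def evaluate(answer: str, expected_facts: List[str], banned_phrases: List[str],
             max_sentence_words: int = 30) -> Evaluation:
    """Step 3: crude factual check (expected strings present) plus tone heuristics."""
    lower = answer.lower()
    notes = []
    missing = [f for f in expected_facts if f.lower() not in lower]
    if missing:
        notes.append(f"missing facts: {missing}")
    banned = [p for p in banned_phrases if p.lower() in lower]
    if banned:
        notes.append(f"off-tone phrases: {banned}")
    sentences = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    long_ones = [s for s in sentences if len(s.split()) > max_sentence_words]
    if long_ones:
        notes.append(f"{len(long_ones)} sentence(s) over {max_sentence_words} words")
    return Evaluation(not missing, not banned and not long_ones, notes)


def run(question: str, snippets: List[str], generate: Callable[[str], str],
        expected_facts: List[str], banned_phrases: List[str]):
    context = select_context(question, snippets)
    answer = generate(build_prompt(question, context))
    return answer, evaluate(answer, expected_facts, banned_phrases)


# Usage with a fake model call in place of a real LLM.
snippets = ["Water boils at 100 degrees Celsius at sea level.", "Cats sleep a lot."]
fake_llm = lambda prompt: "Water boils at 100 degrees Celsius at sea level. Simple physics, no drama."
answer, ev = run("At what temperature does water boil?", snippets, fake_llm,
                 expected_facts=["100 degrees Celsius"], banned_phrases=["as an AI"])
print(ev.facts_ok, ev.tone_ok)  # True True
```

Facts and tone are reported separately on purpose. A witty answer that drops a fact and a correct answer that sounds off-persona call for different fixes.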
I’m curious: do you have suggestions for additional checks or structures that help ensure accuracy without losing personality?