Often I’ll start writing a prompt (or generate one), and the result ends up being a mix of problems: a few inaccurate tool calls, slightly too much wordiness in places, an ignored instruction to bold certain text. Since I don’t know why any of that happens, I’m mostly making uncalculated guesses at which parts to change.
Has anyone come up with a structured method for diagnosing prompt failures into broad categories?
I was wondering if there are telltale signs of certain mistakes, e.g. not specifying the full “tool_name” might cause a model to consistently make incorrect choices.
By knowing what’s wrong, we could narrow down which prompting techniques to try and see whether one solves the problem. E.g. for a category like “instruction ignoring”, one could either move the rules into the system prompt for higher priority, or apply markdown to make them stand out.
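To make the idea concrete, here’s a rough sketch of the kind of rubric I’m imagining: map an observed symptom to a broad failure category and a list of techniques worth trying first. All the symptom names, categories, and fixes below are made up for illustration, not an established taxonomy:

```python
# Hypothetical rubric mapping observed symptoms to failure categories
# and candidate prompting fixes. Every name here is illustrative.
DIAGNOSIS_RUBRIC = {
    "wrong_tool_call": {
        "category": "tool-use confusion",
        "fixes": [
            "spell out the full tool_name and argument schema in the prompt",
            "add one worked example of a correct tool call (few-shot)",
        ],
    },
    "too_wordy": {
        "category": "verbosity",
        "fixes": [
            "add an explicit length limit (e.g. 'answer in at most 3 sentences')",
            "show a short example answer in the desired style",
        ],
    },
    "ignored_instruction": {
        "category": "instruction ignoring",
        "fixes": [
            "move the rule into the system prompt for higher priority",
            "set the rule apart with markdown (bold text, a bullet list of rules)",
        ],
    },
}

def suggest_fixes(symptom: str) -> list[str]:
    """Return candidate fixes for an observed symptom, or [] if unrecognized."""
    entry = DIAGNOSIS_RUBRIC.get(symptom)
    return entry["fixes"] if entry else []

for fix in suggest_fixes("ignored_instruction"):
    print("-", fix)
```

Something like this would at least turn “uncalculated guesses” into a checklist you can work through per failure.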
-A junior dev