Framework/Thought Process of Diagnosing Prompt Issues

Often I start writing a prompt (or generate one), and the result ends up being a mix of problems: some inaccurate tool calls, a response that's a bit too wordy, an ignored instruction to bold certain words. Since I don’t know why any of that happens, I’m mostly making uncalculated guesses at which parts to change.

Has anyone come up with a structured method of diagnosing prompt failures into broad categories?

I was wondering if there are telltale signs of certain mistakes, e.g. not specifying a full “tool_name” may cause a model to consistently make incorrect choices.

By knowing what’s wrong, we could narrow down which prompting techniques to try and see if one solves the problem. E.g. for a category like “instruction ignoring”, one could either move the rule into the system prompt for higher priority, or apply markdown to make the rules visually distinct.
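To make the idea concrete, here's a minimal sketch of such a diagnosis table in Python. The category names and fixes are my own guesses drawn from the failures described above, not an established taxonomy:

```python
# Hypothetical failure-category -> candidate-fix lookup.
# Categories and fixes are illustrative, not an official taxonomy.
FIXES = {
    "wrong_tool_call": [
        "spell out the exact tool_name in the instructions",
        "add a one-line description of when each tool applies",
    ],
    "instruction_ignored": [
        "move the rule into the system prompt for higher priority",
        "set rules apart with markdown headings or bold",
    ],
    "too_verbose": [
        "state a hard length limit",
        "show a short example of the desired output",
    ],
}

def candidate_fixes(category: str) -> list[str]:
    """Return techniques to A/B test for a diagnosed failure category."""
    return FIXES.get(category, ["unknown category; isolate it with A/B tests"])
```

Even a rough table like this turns "uncalculated guesses" into a short, ordered list of things to try per symptom.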

-A junior dev


Hey @0xV4L3NT1N3, yeah, this is a very common wall to hit. What you’re describing (wrong tool calls, ignored formatting, extra verbosity) usually comes down to prompt clarity and structure.

I’d recommend starting with this official guide on how to create a good prompt. It covers being explicit, separating constraints clearly, and defining output formats, which directly helps with the issues you mentioned.
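As an illustration of what "separating constraints and defining output formats" can look like in practice, here's a hedged sketch (the wording and section names are my own, not taken from the guide):

```python
# Illustrative prompt template with task, constraints, and output
# format split into clearly labeled markdown sections.
PROMPT_TEMPLATE = """\
## Task
Summarize the article below in 3 bullet points.

## Constraints
- **Bold** each key term.
- Keep each bullet under 20 words.

## Output format
A markdown bullet list, nothing else.

## Article
{article}
"""

def build_prompt(article: str) -> str:
    """Fill the template with the text to summarize."""
    return PROMPT_TEMPLATE.format(article=article)
```

Keeping each rule in its own labeled section makes it much easier to point at the one the model ignored.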

A good rule of thumb: change one variable at a time and test like you’re debugging code. That makes patterns much easier to spot.
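A tiny harness makes that one-variable-at-a-time discipline mechanical. This is a runnable sketch; `call_model` is a stand-in for whatever API call you actually use, and the variant prompts are hypothetical:

```python
# One-variable-at-a-time prompt testing: each variant differs from
# the baseline by exactly one change, so output diffs are attributable.
from typing import Callable

BASELINE = "Summarize the text. Bold key terms."

VARIANTS = {
    "baseline": BASELINE,
    # Single change: rules set apart with markdown.
    "markdown_rules": "Summarize the text.\n\n## Rules\n- **Bold** key terms.",
    # Single change: explicit length cap added.
    "length_cap": BASELINE + " Keep it under 50 words.",
}

def run_experiments(call_model: Callable[[str], str]) -> dict[str, str]:
    """Run each variant once so outputs can be diffed side by side."""
    return {name: call_model(prompt) for name, prompt in VARIANTS.items()}

# Runnable without an API key by plugging in a stub model:
outputs = run_experiments(lambda p: f"<response to {len(p)}-char prompt>")
```

Swap the stub for a real client and diff `outputs["baseline"]` against each variant to see which single change actually moved the needle.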
