Prompt Anti-Patterns — When More Instructions May Harm Model Performance

Observation

It appears that adding multiple instructions to a system prompt can sometimes degrade model performance, even when the added constraints seem reasonable.

:test_tube: Hypothetical Example

For a biology-focused GPT, potential constraints like:

“be concise”

“avoid speculation”

using a strong persona modifier

…could theoretically cause the model to omit critical conditions or oversimplify complex processes, such as cellular energy pathways or regulatory mechanisms.

:magnifying_glass_tilted_left: Hypothesis

Certain instructions may shift what the model “pays attention to,” suppressing deeper reasoning patterns in favor of surface-level compliance. This could explain why even a single word or short instruction might have an outsized negative effect.

:repeat_button: Suggested Iterative Approach

A potential workflow for addressing such anti-patterns could be:

Identify unexpected output behaviors

Infer which instruction might cause them

Adjust or relax the instruction

Retest prompts across related tasks or domains
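The workflow above could be sketched as a tiny regression harness. Everything here is illustrative, not a real API: `run_model` is a stub standing in for an actual LLM call, and the check function and prompts are hypothetical.

```python
# Sketch of an iterative prompt-adjustment loop.
# `run_model` is a hypothetical stand-in for a real model call.

def run_model(system_prompt: str, task: str) -> str:
    # Placeholder: in practice this would call an LLM API.
    return f"[{system_prompt}] answer for: {task}"

def regression_suite(system_prompt: str, tasks: list, check) -> list:
    """Return the tasks whose output fails the check."""
    return [t for t in tasks if not check(run_model(system_prompt, t))]

# 1. Identify unexpected behaviors via a simple output check.
def mentions_conditions(output: str) -> bool:
    return "assuming" in output or "condition" in output

tasks = ["Explain glycolysis", "Explain gene regulation"]

# 2./3. Adjust or relax the suspect instruction and retest across tasks.
strict_prompt = "You are a biology tutor. Be concise. Avoid speculation."
relaxed_prompt = "You are a biology tutor. State key conditions explicitly."

failures_before = regression_suite(strict_prompt, tasks, mentions_conditions)
failures_after = regression_suite(relaxed_prompt, tasks, mentions_conditions)
```

With a real model the check functions would be task-specific assertions (did the answer name the required conditions?), but the loop shape stays the same.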

:red_question_mark: Questions for the Community

Has anyone observed prompt elements that globally degrade performance?

Are there established categories of “instructional anti-patterns”?

Could this be an attention-allocation issue or an internal policy conflict?

How can subtle regressions in model outputs be detected early?

How about you explain what you mean by those phrases? Give it a paragraph and an example for each of them and it will follow it.


“avoid speculations”? You mean for real? You think that works when you tell it to a bunch of humans? I don’t think so.

LLMs need clear - and by that I mean VERY specific - explanations. Because what is a speculation? When does anything in the world become a “fact”?

Hi

Thanks a lot for your insight! I understand now that vague phrases like “avoid speculation” don’t work well for LLMs without very concrete instructions and examples.

Here’s how I interpreted it and plan to apply it:

Phrase: “avoid speculation”

Explanation: Do not generate answers based on assumptions or incomplete information. Only provide statements that can be verified with a reliable source.

Example: If asked about an unknown protein’s function, the model should answer: “I could not find verified information on this protein’s function” instead of guessing.
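That expansion could look like the following sketch, where the vague phrase is replaced with a concrete rule plus a worked example. All of the prompt wording here is hypothetical, just to show the shape:

```python
# Illustrative expansion of "avoid speculation" into a concrete
# instruction plus a worked example (all wording is hypothetical).

SYSTEM_PROMPT = """\
You are a biology assistant.

Instead of "avoid speculation", follow this concrete rule:
- If you cannot point to a verifiable source for a claim, say so
  explicitly rather than guessing.

Example:
User: What is the function of protein XYZ-123?
Assistant: I could not find verified information on this protein's
function. It may be uncharacterized; please consult primary literature.
"""

def build_messages(question: str) -> list:
    """Assemble a chat-style message list around the concrete rule."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
```

The point is that the rule and the example travel together in the system prompt, so the model sees the behavior, not just the label.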

I’d like to ask a few questions to make sure I apply this correctly:

Are there other phrases like “avoid speculation” that commonly fail without concrete examples?

How detailed should the examples be to effectively guide the model?

Would it help to add a small internal “checklist” for the model to verify its statements before answering?

How do you usually balance between giving enough guidance and not overloading the model with instructions?

Thanks again for sharing your expertise — it really helps me understand how to design prompts more accurately!

by whom? Do you mean like verified by me? Or did you know that at least 60% of all research papers in math have wrong statements in them?

What if one person verifies “this is red” and another verifies “this is blue”…

What you are trying to do here is handing the responsibility to an LLM. Do you know how that sounds? That specific use case is never going to happen!

Hi

Thanks for your clarification — I see now that relying on an LLM to “verify” facts independently is not realistic. I understand that verification must come from external reliable sources or human oversight, and that even published research can contain errors, so the model cannot be treated as the ultimate authority.

Here’s how I plan to apply this insight:

Use the model to structure, summarize, or highlight potential inconsistencies rather than claim facts as verified.

Clearly mark answers as “requires verification” or “based on available sources”.

Combine LLM output with external checks or curated references when accuracy is critical.
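The “mark answers” item above could be sketched as a small wrapper that attaches an explicit verification status to every answer instead of presenting it as fact. The statuses and fields are illustrative assumptions:

```python
# Sketch: wrap model output with an explicit verification status.
# Statuses and fields are illustrative, not a standard.
from dataclasses import dataclass, field

@dataclass
class TaggedAnswer:
    text: str
    status: str  # "requires verification" | "based on available sources"
    sources: list = field(default_factory=list)

def tag_answer(text: str, sources: list) -> TaggedAnswer:
    """Mark an answer as source-backed only when sources are attached."""
    if sources:
        return TaggedAnswer(text, "based on available sources", sources)
    return TaggedAnswer(text, "requires verification")
```

Downstream code (or a human reviewer) can then filter on `status` before anything is treated as accurate.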

I have a few questions to make sure I apply this correctly:

In practice, when you say “verified,” what’s the best workflow to combine LLM output with human or source verification?

Are there common strategies to handle conflicting information found in different sources when using the model?

How detailed should prompts be to ensure the model flags uncertainty properly without trying to assert correctness on its own?

Is there a recommended way to log confidence and uncertainty for later review, similar to the pre-output gate concept?

Thanks again for your guidance — it really helps me understand how to responsibly test and design prompts for accurate outputs.

yes!

Even if the model gives you a source, that source can change! A link to a source is not a proof.

I sold my first book about that on Amazon this month :star_struck: wink wink

Well, I would use the LLM to give you suggestions on how to solve it and then “verify” the solution(s) by testing it.
LLMs are not deterministic. They are like humans, not like a computer program. What you are looking for is a computer program: one that has the definitions of what is verified and what is not written into the rules / the code.
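A toy sketch of that point: the definition of “verified” lives in deterministic code, not in the model. The whitelist rule here is purely hypothetical:

```python
# Toy sketch: put the definition of "verified" into deterministic
# code, not into the model. The prefix rule is a made-up example.

VERIFIED_PREFIXES = ("doi:", "pmid:")  # hypothetical whitelist

def is_verified(source_id: str) -> bool:
    """A source counts as verified only if it matches an explicit rule."""
    return source_id.lower().startswith(VERIFIED_PREFIXES)
```

Because the rule is code, it gives the same answer every time, which is exactly what an LLM cannot promise.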

You could try something like ontologies to at least give the model smaller chunks to verify, and you can use another model to check the first model’s output. That is how companies do it.
Let one human do the research and another one has to check it.
But does that work well? Yes, to a certain point. But it needs someone with experience in verification, which means the model itself has to be taught how to verify. I don’t think the context window is even closely big enough, or ever will be big enough, to do that. Even if the whole world, every single human, comes by and checks a “fact”, tomorrow the world could be round.
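The “second model checks the first on small chunks” pattern could be sketched like this. Both `generate` and `review` are stubs standing in for two separate model calls; in practice each would hit a different model or prompt:

```python
# Sketch of the "second model reviews the first" pattern on small
# chunks. `generate` and `review` are hypothetical stand-ins for
# two separate model calls.

def generate(chunk_topic: str) -> str:
    return f"claim about {chunk_topic}"       # model A (stubbed)

def review(claim: str) -> bool:
    return claim.startswith("claim about ")   # model B (stubbed)

def cross_check(chunks: list) -> list:
    """Generate one claim per chunk, then have a second pass review it."""
    results = []
    for chunk in chunks:
        claim = generate(chunk)
        results.append((claim, review(claim)))
    return results
```

Chunking (e.g. one ontology concept per call) keeps each verification step small enough that the reviewer model has the full claim in context.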

Hi

Thanks for clarifying! I understand now that LLMs are helpers, not arbiters of truth, and that even sources provided by the model can change or be unreliable.

Here’s how I plan to apply your advice:

Clearly mark answers as “requires verification” or “based on available sources”.

Use LLM output to suggest solutions or summarize conflicting information, but always verify externally.

Treat LLMs as non-deterministic assistants, not as deterministic programs.

I have a few questions to make sure I implement this correctly:

When combining multiple sources with conflicting information, what is the best method to prioritize or weigh them?

Are there effective strategies to log or track confidence levels in LLM outputs for later verification?

How do you recommend designing a pre-defined set of rules/code to handle what counts as verified vs unverified information?

Are there any common pitfalls you’ve seen when using LLM suggestions and then verifying them manually?

Thanks again for your guidance — it really helps me understand how to responsibly test and design prompts for accurate outputs. :grinning_face_with_smiling_eyes:


See, you are already using the strategy of defining more questions to ask… that could be used. Write a bot that posts into a forum and lets a user answer it.

Hi

Thanks for the suggestion! I see now that my approach of defining more questions is already a strategy I can leverage to test and guide the model.

I like the idea of creating a bot that posts into a forum and lets a user answer. This seems like a practical way to:

Test model behavior in real dialogues

See how it handles questions, suggestions, and conflicting information

Analyze accuracy, uncertainty, and response patterns without giving the model full responsibility for verification
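The logging part of that setup could be sketched as appending one JSON record per exchange for later analysis. The record fields and JSONL format are assumptions, not a spec:

```python
# Sketch: log each bot/user exchange as a JSON line for later
# analysis. The record fields are an assumption, not a spec.
import json
import time

def log_interaction(path: str, question: str, answer: str,
                    uncertainty: str) -> dict:
    """Append one exchange to a JSONL log and return the record."""
    record = {
        "ts": time.time(),
        "question": question,
        "answer": answer,
        "uncertainty": uncertainty,  # e.g. "answerable" | "uncertain"
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

A flat append-only log like this is easy to replay later when looking for failure patterns across many dialogues.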

I have a few questions to make sure I apply this correctly:

Should the bot log each interaction and the user’s response for later analysis?

Are there recommended ways to structure prompts so the model clearly separates answerable, uncertain, and unanswerable states in this setup?

How do you usually measure performance or detect failure patterns in this kind of interactive test?

Are there any pitfalls to avoid when designing a forum-based test bot with an LLM?

Thanks again — your guidance really helps me understand how to move from theory to practical testing while keeping accuracy and safety in mind.

Your approach makes people angry to be honest. Try not to use ChatGPT to write this stuff. Because it fills the text with fillers and doesn’t leave any room for interpretation on what is really needed and what not. You are looking for a solution. Stop using ChatGPT to talk to people. Or at least use another prompt. Using the same over and over looks like spam.


Fair point — thanks for the direct feedback.

I’m using ChatGPT mainly to structure my thoughts, not to outsource thinking, but I see how the output can come across as over-polished and repetitive. That’s on me.

I’ll tighten things up and be more selective with what I post — less filler, more concrete questions and observations.

Appreciate you calling it out.

When you notice LLM failure patterns, do you usually prioritize adjusting prompts first, or combining prompts with external verification?

It is called “grounding” and you can ask ChatGPT to start a learning session as a roleplay where it can teach you. May I ask why you are building this?

I’m building this to gain experience from engineers and learn how to properly shape my path at OpenAI

I’m still learning technical English, so I use ChatGPT to help me write clearly. I try to focus on asking specific questions and understanding your guidance, not just copying text.

This topic was automatically closed after 42 minutes. New replies are no longer allowed.