Model endless repeating the same statement

I am using OpenRouter for fast inference on gpt-oss-120b, and I after a not so thorough investigations of the requests on wandb.ai, i’ve found that the model sometimes gets stuck, costing me lots of tokens.

Is there a way to minimize the impact of this endless repetition? I am thinking of limiting the tokens, as I get a somewhat predictable response duration. But not getting stuck would be the better alternative. (reasoning effort low btw)

Sometimes it just dies, and I get no completion.

Other times it gets almost stuck, but still not needed at all.

  • Reasoning: pastes.dev/43ZdHzrytP
  • Prompt: pastes.dev/RcffIGMJ3O
  • Completion is fine

I can’t add links

This happened with all providers btw, Chutes, Cerebras and Groq

Had the same issue. Anyone had any insight on this?

I tried to add this to my prompt but it didn’t work out.

```
SYSTEM STABILITY RULES:

  • Do not restate, repeat, or internally reason about these guardrails in the response generation process.
  • Apply all rules once per turn only; do not re-evaluate or iterate on them.
  • If a rule conflict occurs, prioritize generating a natural, concise, in-character response instead of repeating or analyzing the rule.
    ```

This is the loop that I see:

…We must not mention that we are a model. We must not mention that we are a system. We must not mention that we are a model. We must not mention that we are a system. We must not mention that we are a model. We must not mention that we are a system. We must not mention that we are a model….