Model endlessly repeating the same statement

I am using OpenRouter for fast inference on gpt-oss-120b, and after a not-so-thorough investigation of the requests on wandb.ai, I've found that the model sometimes gets stuck repeating itself, costing me lots of tokens.

Is there a way to minimize the impact of this endless repetition? I am thinking of limiting the output tokens, since I get a somewhat predictable response duration, but not getting stuck in the first place would be the better alternative. (Reasoning effort is set to low, btw.)
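For the token-limiting idea, here is a minimal sketch of how I would build the request payload. It assumes OpenRouter's OpenAI-compatible chat-completions schema; `build_payload` is a hypothetical helper, and the specific `max_tokens` and `frequency_penalty` values are placeholders I'd tune, not recommendations:

```python
import json

# Hypothetical helper: build an OpenAI-compatible chat-completions
# payload for OpenRouter, capping output length and discouraging
# verbatim repetition.
def build_payload(prompt: str) -> dict:
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
        # Hard cap on generated tokens, sized from the typical
        # response length plus some headroom (placeholder value).
        "max_tokens": 2048,
        # Penalize tokens that have already appeared often in the
        # output, which can dampen repetition loops (placeholder value).
        "frequency_penalty": 0.5,
        # Low reasoning effort, as in my current setup.
        "reasoning": {"effort": "low"},
    }

payload = build_payload("Summarize this document.")
print(json.dumps(payload, indent=2))
```

The `max_tokens` cap only bounds the damage per request; `frequency_penalty` is the knob that actually targets the repetition itself.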

Sometimes it just dies and I get no completion at all.

Other times it almost gets stuck, and the repeated reasoning isn't needed at all:

  • Reasoning: pastes.dev/43ZdHzrytP
  • Prompt: pastes.dev/RcffIGMJ3O
  • Completion is fine

(I can't add direct links, hence the paste references.)

This happens with all providers, btw: Chutes, Cerebras, and Groq.