I am using OpenRouter for fast inference on gpt-oss-120b, and after a not-so-thorough investigation of the requests on wandb.ai, I've found that the model sometimes gets stuck, costing me lots of tokens.
Is there a way to minimize the impact of this endless repetition? I'm thinking of limiting the output tokens, since response durations are fairly predictable, but not getting stuck at all would be the better option. (Reasoning effort is set to low, btw.)
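In case it helps frame the question: a minimal sketch of the token-capping idea, assuming OpenRouter's OpenAI-compatible chat completions payload. `max_tokens` and `repetition_penalty` are documented OpenRouter parameters; the specific values below are guesses to tune against your observed response durations, not recommendations.

```python
import json

# Sketch of a request payload that caps a stuck run's cost.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "..."}],
    "max_tokens": 1024,         # hard cap so a looping run can't burn unbounded tokens
    "repetition_penalty": 1.1,  # mild penalty against repeated spans (value is a guess)
    "reasoning": {"effort": "low"},
}

# POST this to https://openrouter.ai/api/v1/chat/completions with your API key, e.g.:
# requests.post(url, headers={"Authorization": f"Bearer {key}"}, json=payload)
print(json.dumps(payload, indent=2))
```

The cap doesn't stop the model from looping, it just bounds the bill when it does; the penalty is the part that might actually reduce the loops.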
Sometimes it just dies, and I get no completion.
Other times it almost gets stuck, producing reasoning that isn't needed at all:
- Reasoning: pastes.dev/43ZdHzrytP
- Prompt: pastes.dev/RcffIGMJ3O
- Completion is fine
I can’t add links
This happened with all providers btw: Chutes, Cerebras, and Groq.
