I’ve noticed some inconsistencies too, especially with multi-step reasoning and tool calls that used to work smoothly a few weeks ago. It feels like the model sometimes “forgets” earlier context or simplifies responses more than before.
It could be related to temporary backend changes, load balancing, or quiet updates to how GPT-4.1 handles context windows. OpenAI hasn't announced any model switch, so it's worth posting a few concrete, reproducible examples in the API feedback section; that usually helps them pinpoint whether it's a real regression or just behavior drift. Pinning your request parameters makes those examples far more useful.
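For anyone who wants to capture that kind of evidence, here's a minimal sketch using the official `openai` Python SDK. The model name, seed, and prompt are placeholders for whatever regressed on your end; the idea is just to fix `temperature` and `seed` so reruns are comparable, and to log the `system_fingerprint` the API returns, since that value changes when the backend configuration changes.

```python
# Minimal repro harness: pins temperature and seed so reruns are comparable,
# and records the system_fingerprint returned by the API, which shifts when
# OpenAI changes the backend configuration serving the model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4.1",   # pin the exact model you were using
    temperature=0,     # remove sampling noise from the comparison
    seed=1234,         # best-effort determinism across reruns
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # placeholder: paste the exact prompt that used to work
        {"role": "user", "content": "<the exact prompt that regressed>"},
    ],
)

print(resp.system_fingerprint)          # backend configuration identifier
print(resp.choices[0].message.content)  # save with a timestamp for comparison
```

If two runs weeks apart show different `system_fingerprint` values alongside the changed behavior, that points to a backend change rather than sampling noise, which is exactly the kind of detail that makes a feedback report actionable.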