Since this morning (11 July 2025) we’ve noticed that many streaming requests to the Responses API with gpt-4.1-2025-04-14 freeze once the running conversation—including JSON returned by multiple function calls—hits roughly 300–400 k tokens. Yesterday the exact same workflow sailed past 600 k tokens without any issues.
We’ve found a temporary workaround: trim each tool response. For instance, returning the top 25 records instead of 100 keeps the total context under ~330 k tokens, and the stall never occurs.
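In case it helps anyone reproduce the workaround, here's a minimal sketch of the trimming we do before handing the function-call result back to the model. The helper name `trim_tool_output`, the 25-record cap, and the payload shape are just our own choices for illustration, not anything mandated by the API:

```python
import json

# Cap chosen empirically; 25 records per tool call keeps our running
# context under ~330k tokens, where the stall no longer occurs.
MAX_RECORDS = 25

def trim_tool_output(records: list[dict], max_records: int = MAX_RECORDS) -> str:
    """Serialize a truncated function-call result.

    Keeps only the first `max_records` entries and adds a small marker so
    the model knows the list is incomplete.
    """
    kept = records[:max_records]
    payload = {
        "records": kept,
        "truncated": len(records) > max_records,
        "omitted_count": max(0, len(records) - max_records),
    }
    return json.dumps(payload)

# Instead of returning all 100 rows as the tool result, we now return
# only the top 25 (query_results is the full result set from our tool):
# tool_result = trim_tool_output(query_results)
```

Obviously this loses information the model might have wanted, so it's a stopgap rather than a fix.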
Is anyone else seeing this behavior? It seems directly tied to longer-context interactions with gpt-4.1 and the Responses API.
Thanks!