Since this morning (11 July 2025) we’ve noticed that many streaming requests to the Responses API with gpt-4.1-2025-04-14 freeze once the running conversation—including JSON returned by multiple function calls—hits roughly 300–400 k tokens. Yesterday the exact same workflow sailed past 600 k tokens without any issues.
We’ve found a temporary workaround: trim each tool response. For instance, returning the top 25 records instead of 100 keeps the total context under ~330 k tokens, and the stall never occurs.
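In case it helps anyone reproduce the workaround, here's a minimal sketch of the trimming we do before handing the function-call result back to the model. The helper name `trim_tool_output`, the 25-record cap, and the payload shape are just our own choices for illustration, not anything mandated by the API:

```python
import json

# Cap chosen empirically; 25 records per tool call keeps our running
# context under ~330k tokens, where the stall no longer occurs.
MAX_RECORDS = 25

def trim_tool_output(records: list[dict], max_records: int = MAX_RECORDS) -> str:
    """Serialize a truncated function-call result.

    Keeps only the first `max_records` entries and adds a small marker so
    the model knows the list is incomplete.
    """
    kept = records[:max_records]
    payload = {
        "records": kept,
        "truncated": len(records) > max_records,
        "omitted_count": max(0, len(records) - max_records),
    }
    return json.dumps(payload)

# Instead of returning all 100 rows as the tool result, we now return
# only the top 25 (query_results is the full result set from our tool):
# tool_result = trim_tool_output(query_results)
```

Obviously this loses information the model might have wanted, so it's a stopgap rather than a fix.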
Is anyone else seeing this behavior? It seems directly tied to longer-context interactions with gpt-4.1 and the Responses API.
Thanks!