Strange Issue with Missing Large Token Responses

I used GPT-4.1 for system testing, and everything worked normally across 2,000 interactions. Then I switched to GPT-4o-mini for testing. All responses with fewer than 10,000 tokens were returned correctly. However, any response larger than 10,000 tokens was never received — even though the billing records show the interactions were successful.

My network environment is admittedly unstable, as I was using unlimited mobile data for testing.
Has anyone encountered a similar issue?
Is there any way to investigate the root cause of this?
Would it be possible to split large responses to ensure successful reception?

I’d greatly appreciate any help analyzing this issue.
Thank you!

If you use the Responses API, there is a new parameter, background, that allows async processing for stateful requests (store=True).

You immediately get a response ID, and you poll it until the status changes to completed. It’s like a “mini batch”.

This way, even if your connection is unstable, you can make multiple attempts to retrieve it later.
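
As a rough sketch of that flow (assuming the official openai Python SDK with Responses API background mode; the model name and prompt here are just placeholders):

```python
# Minimal sketch: background (async) Responses API request plus polling.
# Assumes the official openai Python SDK; model and prompt are placeholders.
import time

from openai import OpenAI

client = OpenAI()

# Start a background request; store=True keeps the response retrievable later.
resp = client.responses.create(
    model="gpt-4o-mini",
    input="Generate the full report...",  # placeholder prompt
    background=True,
    store=True,
)
response_id = resp.id  # persist this so you can retry retrieval later

# Each poll is a small, independent HTTP call, so a dropped connection
# only costs you one retry instead of losing the whole large output.
while True:
    resp = client.responses.retrieve(response_id)
    if resp.status in ("completed", "failed", "cancelled", "incomplete"):
        break
    time.sleep(2)

if resp.status == "completed":
    print(resp.output_text)
```

Because the output is stored server-side, retrieval can be repeated as many times as needed, which is exactly what helps on an unstable mobile connection.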

Also, if you have the response ID, you can retrieve the response from the Logs dashboard.
