Gpt-4o-2024-08-06 randomly fails to cache tokens

The attached screenshot shows a request log with the file processed, the token count, execution time, and so on.

You can see that it randomly fails to cache the tokens. Is this a bug, or what is the reason for this?

It would be interesting to grab the fingerprint; at last check, gpt-4o was returning at least six different ones.

The reason: prompt caching is described as working on a best-effort basis, routing your request to the same server, where the cache persists for a period of time, under an hour at best. It is not a master database from which cached prefixes are retrieved. The fingerprint may indicate that difference. You can imagine that if you are pushing a hundred requests a second, your API calls might start getting distributed across different servers.
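One way to check this theory is to group your logged responses by `system_fingerprint` and see whether the cache misses cluster on particular fingerprints. A minimal sketch, assuming you have already extracted `system_fingerprint` and the cached token count (from `usage.prompt_tokens_details.cached_tokens`) out of each response; the sample data below is hypothetical:

```python
from collections import defaultdict

def cache_stats_by_fingerprint(responses):
    """Tally cache hits and misses per system_fingerprint.

    Each entry is a dict with 'system_fingerprint' and 'cached_tokens'.
    A miss is a response where no prompt tokens were served from cache.
    """
    stats = defaultdict(lambda: {"hits": 0, "misses": 0})
    for r in responses:
        bucket = stats[r["system_fingerprint"]]
        if r["cached_tokens"] > 0:
            bucket["hits"] += 1
        else:
            bucket["misses"] += 1
    return dict(stats)

# Hypothetical sample: misses clustered on fingerprints fp_b and fp_c
sample = [
    {"system_fingerprint": "fp_a", "cached_tokens": 1920},
    {"system_fingerprint": "fp_a", "cached_tokens": 1920},
    {"system_fingerprint": "fp_b", "cached_tokens": 0},
    {"system_fingerprint": "fp_c", "cached_tokens": 0},
]
print(cache_stats_by_fingerprint(sample))
```

If the misses line up with fingerprints you rarely see, that would support the routing explanation.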

Hmmm. Interesting, and that makes sense. This does not happen when making one request at a time, or even 10 concurrent requests, but if I start running 30 to 50 concurrent requests at a time, it happens quite a bit.

Is there a limit to how many concurrent requests we can have running at the same time?

For example, if I run more than 10 or 12 concurrent requests against Claude, we get server errors, but that does not seem to be the case with gpt-4o.

I’d check the docs for rate limits…

https://platform.openai.com/docs/guides/rate-limits

It’s tiered, based on your usage…

I have seen the docs on the limits for all the different tiers, and there is absolutely nothing that talks about concurrent requests, which is what my question is about.


Yes. Cloudflare. Your IP block’s trustworthiness when opening hundreds of connections at once, versus that DDoS protection serving up errors. Then, being routed to a single API rate-limit worker that slows down processing also has an impact.
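When edge protection or rate limiting does serve up errors under a burst, retrying with exponential backoff and jitter usually smooths things out instead of failing the batch. A minimal sketch; `ConnectionError` here stands in for whatever transient error your client raises, and the flaky demo callable is hypothetical:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=0.5):
    """Retry a callable on transient errors, doubling the delay each
    attempt and adding jitter so concurrent retries don't re-collide."""
    for attempt in range(max_retries):
        try:
            return call()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a stand-in call that fails twice, then succeeds.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("503")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
print(result)  # → ok
```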

OK, thanks. Then what would the sweet spot for concurrent connections be?