Gpt-4o-2024-08-06 randomly fails to cache tokens

The attached screenshot shows a request log with the file processed, the token count, execution time, and so on.

You can see that it randomly fails to cache the tokens. Is this a bug, or what is the reason for this?

It would be interesting to grab the fingerprint; at last check, gpt-4o was returning at least six different ones.

The reason: prompt caching is described as working on a best-effort basis, routing your request to the same server, where the cache persists for a period of time, under an hour at best. It is not a master database from which cached prefixes are retrieved. The fingerprint may indicate that difference. You can imagine that if you are pushing a hundred requests a second, your API calls might start getting distributed across different servers.
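One way to check this theory is to group your logged responses by `system_fingerprint` and see whether the cache misses cluster on particular fingerprints. A minimal sketch, assuming you have already extracted `system_fingerprint` and the cached token count (from `usage.prompt_tokens_details.cached_tokens`) out of each response; the sample data below is hypothetical:

```python
from collections import defaultdict

def cache_stats_by_fingerprint(responses):
    """Tally cache hits and misses per system_fingerprint.

    Each entry is a dict with 'system_fingerprint' and 'cached_tokens'.
    A miss is a response where no prompt tokens were served from cache.
    """
    stats = defaultdict(lambda: {"hits": 0, "misses": 0})
    for r in responses:
        bucket = stats[r["system_fingerprint"]]
        if r["cached_tokens"] > 0:
            bucket["hits"] += 1
        else:
            bucket["misses"] += 1
    return dict(stats)

# Hypothetical sample: misses clustered on fingerprints fp_b and fp_c
sample = [
    {"system_fingerprint": "fp_a", "cached_tokens": 1920},
    {"system_fingerprint": "fp_a", "cached_tokens": 1920},
    {"system_fingerprint": "fp_b", "cached_tokens": 0},
    {"system_fingerprint": "fp_c", "cached_tokens": 0},
]
print(cache_stats_by_fingerprint(sample))
```

If the misses line up with fingerprints you rarely see, that would support the routing explanation.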

Hmmm. Interesting, and that makes sense. This does not happen when making one request at a time, or even 10 concurrent requests, but if I start running 30 to 50 concurrent requests at a time, it happens quite a bit.

Is there a limit to how many concurrent requests we can have running at the same time?

For example, if I run more than 10 or 12 concurrent requests against Claude, we get server errors, but that does not seem to be the case with gpt-4o.

I’d check the docs for rate limits…

https://platform.openai.com/docs/guides/rate-limits

It’s tiered, based on your usage…

I have seen the docs on the limits for all the different tiers, and there is absolutely nothing that talks about concurrent requests, which is what my question is about.


Yes. Cloudflare. Your IP block’s trustworthiness when opening hundreds of connections at once, versus that DDoS protection serving up errors. Then, being routed to a single API rate-limit worker that slows down processing also has an impact.
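When edge protection or rate limiting does serve up errors under a burst, retrying with exponential backoff and jitter usually smooths things out instead of failing the batch. A minimal sketch; `ConnectionError` here stands in for whatever transient error your client raises, and the flaky demo callable is hypothetical:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=0.5):
    """Retry a callable on transient errors, doubling the delay each
    attempt and adding jitter so concurrent retries don't re-collide."""
    for attempt in range(max_retries):
        try:
            return call()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a stand-in call that fails twice, then succeeds.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("503")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
print(result)  # → ok
```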

OK, thanks. Then what would the sweet spot for concurrent connections be?