What are the default rate limits for the File API?

I’ve found a lot of information about Usage Tiers and rate limits for all the different models, but what are the rate limits for the File API? Strictly speaking, I’m not asking about vectorizing files, just the upload and delete operations. I can’t seem to find any information on that.

https://platform.openai.com/docs/api-reference/files

There is none listed. The practical limitation will more likely be the upload bandwidth a single client gets to the server IP than any imposed rate limit on calls. Once you’re already at maximum throughput, there is no point in opening more parallel connections.

The API can accept hundreds of AI API calls per second from a client, and those require more computation and longer-lived connections than a POST of multipart/form-data.

Calls made to Assistants endpoint methods have far lower API call limits, though.

Getting more technical, if the files are small:

(after 3,000 tokens of what I say to o1-preview being rephrased and spat back at me)…

Maximum upload rate:

Without optimization: limited to tens of thousands of uploads per minute due to port exhaustion.

With persistent connections and HTTP/2: potentially millions of uploads per minute, by minimizing new connections and maximizing the use of each one (see the sketch below). At that point, socket and port limits become negligible compared to bandwidth and processing capability.
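For illustration, a rough sketch (not an official example) of reusing one connection pool for many small uploads with the Python SDK; the file names are made up, and http2=True needs the httpx[http2] extra installed:

```python
# Sketch: share one HTTP/2-capable connection pool across many small uploads,
# so each upload avoids a fresh TCP/TLS handshake. File names are hypothetical.
import httpx
from openai import OpenAI

http_client = httpx.Client(http2=True, timeout=60.0)
client = OpenAI(http_client=http_client)  # API key read from OPENAI_API_KEY

paths = ["doc_001.txt", "doc_002.txt", "doc_003.txt"]  # made-up files
for path in paths:
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")
    print(uploaded.id)

http_client.close()
```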

Okay, I thought something similar; however, I’m still seeing requests fail with a 503 error when I hit it with ~100 requests in parallel. It looks like the File APIs are not scaled that well, which makes sense since File Search is still in beta. I wish the SDKs would handle failures more gracefully… And without any known numbers, it’s hard to implement any batching or rate-limit logic…
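For now I’m just guessing at the numbers myself, something along these lines (a sketch with made-up concurrency and retry values, capping parallelism and backing off on 5xx responses):

```python
# Sketch: bounded concurrency plus exponential backoff on 5xx errors.
# CONCURRENCY and MAX_ATTEMPTS are guesses to tune, not documented limits.
import time
import random
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI, APIStatusError

client = OpenAI()
CONCURRENCY = 8        # well below the ~100 parallel requests that hit 503s
MAX_ATTEMPTS = 5

def upload_with_backoff(path: str):
    for attempt in range(MAX_ATTEMPTS):
        try:
            with open(path, "rb") as f:
                return client.files.create(file=f, purpose="assistants")
        except APIStatusError as e:
            if e.status_code >= 500 and attempt < MAX_ATTEMPTS - 1:
                # Sleep 1s, 2s, 4s, ... plus jitter, then retry.
                time.sleep(2 ** attempt + random.random())
            else:
                raise

paths = [f"chunk_{i}.txt" for i in range(100)]  # hypothetical files
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(upload_with_backoff, paths))
```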

The 503 comes from Cloudflare. Firewall policies set on the service may be limiting the rate you can achieve, since Cloudflare uses connection trust and is meant to prevent DDoS.

The files endpoint has been in use since fine-tuning was first available for GPT-3.

The Python SDK client does retry 2 times by default; that’s an instantiation parameter you can pass, for example if you want to disable the retries and manage your own queue in code.
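For example (assuming the current openai-python v1 client, where the parameter is max_retries):

```python
# Sketch: turn off the SDK's built-in retries so failures surface immediately
# and your own queue/backoff logic decides what to re-attempt.
from openai import OpenAI

client = OpenAI(max_retries=0)  # default is 2 retries on retryable errors

# It can also be overridden per call site:
no_retry_client = client.with_options(max_retries=0)
```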