Is the token rate limit per endpoint or per model?

From the docs:
It is important to note that the rate limit can be hit by either option depending on what occurs first. For example, you might send 20 requests with only 100 tokens to the Edit endpoint and that would fill your limit, even if you did not send 150k tokens within those 20 requests.
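To make the "whichever occurs first" behaviour concrete, here is a minimal sketch of a limiter that tracks both counters. The 20 requests and 150k tokens per minute come from the docs example above; the class name and the simple 60-second sliding window are my own assumptions, and the real service may count differently.

```python
from collections import deque


class DualRateLimit:
    """Hypothetical limiter: trips on requests OR tokens, whichever fills first."""

    def __init__(self, max_requests, max_tokens, window_s=60.0):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.events = deque()  # (timestamp, tokens) for each accepted request

    def try_send(self, tokens, now):
        # Forget requests that have fallen out of the window.
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.max_requests:
            return "request limit"
        if used_tokens + tokens > self.max_tokens:
            return "token limit"
        self.events.append((now, tokens))
        return "ok"


# 21 small requests, one per second: the request limit trips first,
# even though only 2,000 of the 150,000 tokens have been used.
limit = DualRateLimit(max_requests=20, max_tokens=150_000)
results = [limit.try_send(100, now=float(i)) for i in range(21)]
print(results[-1])  # request limit
```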

But different models have different token rate limits (the 16k model has twice the token limit). If the limit is tracked per model, will I effectively be able to use 270k tokens per minute across 3.5 (90k for 3.5 plus 180k for 3.5-16k)?
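Spelled out, the arithmetic I have in mind looks like this. It is only a sketch of the per-model hypothesis, not confirmed behaviour: the dictionary keys are just my shorthand for the two models, and the helper names are made up.

```python
# Per-minute token budgets from the question (assumed to be tracked per model).
tpm_limits = {"3.5": 90_000, "3.5-16k": 180_000}
used = {model: 0 for model in tpm_limits}


def can_send(model, tokens):
    """True if `tokens` still fits in this model's own per-minute budget."""
    return used[model] + tokens <= tpm_limits[model]


def record(model, tokens):
    """Count tokens against one model only; the other budget is untouched."""
    used[model] += tokens


record("3.5", 90_000)                # exhaust the 3.5 budget
print(can_send("3.5", 1))            # False
print(can_send("3.5-16k", 180_000))  # True
print(sum(tpm_limits.values()))      # 270000
```

If the limit turns out to be shared per endpoint instead, the two budgets would not add up like this.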

What about the number of requests? (I assume that limit is per endpoint, and therefore shared across models, but I'd still like to confirm.)



As far as I know, the limit can even be hit seemingly at random. I'm going to run some tests tonight to measure the limits on some of them.
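Roughly what I plan to run, as a sketch: keep sending until the first rejection and record what got through. I am assuming the service signals a rate limit with HTTP status 429; `probe_limit` and `make_fake_endpoint` are hypothetical helpers, and the fake endpoint stands in for a real API call so the snippet runs offline.

```python
def probe_limit(send, tokens_per_request, max_attempts=1000):
    """Call `send()` until it returns 429; report what got through before that."""
    sent_requests = 0
    sent_tokens = 0
    for _ in range(max_attempts):
        status = send()
        if status == 429:  # assumed rate-limit response
            break
        sent_requests += 1
        sent_tokens += tokens_per_request
    return sent_requests, sent_tokens


def make_fake_endpoint(request_cap):
    """Stand-in for a real call: returns 200 until `request_cap` is used up."""
    calls = {"n": 0}

    def send():
        calls["n"] += 1
        return 200 if calls["n"] <= request_cap else 429

    return send


print(probe_limit(make_fake_endpoint(20), tokens_per_request=100))  # (20, 2000)
```

Running the same probe against each model separately, and then against both at once, should show whether the counters are shared.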