As the title says, I can see my org’s limits on the platform, listed individually by model. Can I call each of these models concurrently and get each one’s TPM limit independently? I think I can run GPT-3.5 and GPT-4 concurrently, but I’m not sure about all of them.
Here’s my set-up:
| MODEL | TPM | RPM |
|---|---|---|
| **CHAT** | | |
| gpt-3.5-turbo | 90,000 | 3,500 |
| gpt-3.5-turbo-0301 | 90,000 | 3,500 |
| gpt-3.5-turbo-0613 | 90,000 | 3,500 |
| gpt-3.5-turbo-0613-alpha-shared | 250,000 | 3,000 |
| gpt-3.5-turbo-16k | 180,000 | 3,500 |
| gpt-3.5-turbo-16k-0613 | 180,000 | 3,500 |
| gpt-4 | 40,000 | 200 |
| gpt-4-0314 | 40,000 | 200 |
| gpt-4-0613 | 40,000 | 200 |
Has anyone parallelized calls across all of these models in the API?
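For reference, here's a minimal sketch of how I'd test it, assuming the `openai` Python library (v1.x) with `AsyncOpenAI`; the model list and prompt are just placeholders, and whether each dated snapshot really gets its own quota bucket is exactly what I'm unsure about:

```python
# Sketch: fire one request per model concurrently with asyncio and see
# whether hitting one model's limit affects the others. Assumes
# OPENAI_API_KEY is set in the environment.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Placeholder subset of the models from the limits table above.
MODELS = ["gpt-3.5-turbo", "gpt-3.5-turbo-16k", "gpt-4"]


async def ask(model: str, prompt: str) -> str:
    # If limits are tracked per model, these concurrent calls should
    # each draw from their own TPM/RPM pool.
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return f"{model}: {resp.choices[0].message.content}"


async def main() -> None:
    results = await asyncio.gather(
        *(ask(m, "Say hi in five words.") for m in MODELS)
    )
    for line in results:
        print(line)


asyncio.run(main())
```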