As the title says, I can see my org’s limits on the platform, listed individually by model. Can I call each of these models concurrently and get each one’s TPM limit independently? I think I can run GPT-3.5 and GPT-4 concurrently, but I’m not sure about all of them.
Here’s my set-up:
| MODEL | TPM | RPM |
|---|---|---|
| **CHAT** | | |
| gpt-3.5-turbo | 90,000 | 3,500 |
| gpt-3.5-turbo-0301 | 90,000 | 3,500 |
| gpt-3.5-turbo-0613 | 90,000 | 3,500 |
| gpt-3.5-turbo-0613-alpha-shared | 250,000 | 3,000 |
| gpt-3.5-turbo-16k | 180,000 | 3,500 |
| gpt-3.5-turbo-16k-0613 | 180,000 | 3,500 |
| gpt-4 | 40,000 | 200 |
| gpt-4-0314 | 40,000 | 200 |
| gpt-4-0613 | 40,000 | 200 |
Has anyone parallelized calls across all of these models in the API?
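For reference, here's a minimal sketch of how I'd test it, assuming the `openai` Python library (v1.x) with `AsyncOpenAI`; the model list and prompt are just placeholders, and whether each dated snapshot really gets its own quota bucket is exactly what I'm unsure about:

```python
# Sketch: fire one request per model concurrently with asyncio and see
# whether hitting one model's limit affects the others. Assumes
# OPENAI_API_KEY is set in the environment.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Placeholder subset of the models from the limits table above.
MODELS = ["gpt-3.5-turbo", "gpt-3.5-turbo-16k", "gpt-4"]


async def ask(model: str, prompt: str) -> str:
    # If limits are tracked per model, these concurrent calls should
    # each draw from their own TPM/RPM pool.
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return f"{model}: {resp.choices[0].message.content}"


async def main() -> None:
    results = await asyncio.gather(
        *(ask(m, "Say hi in five words.") for m in MODELS)
    )
    for line in results:
        print(line)


asyncio.run(main())
```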