I have two models deployed in Azure OpenAI namely:
- gpt-4 with rate limit 20k TPM
- gpt-4-32k with rate limit 60k TPM
When my code hits the rate limit on gpt-4, it falls back to using gpt-4-32k.
This way, am I effectively getting an 80k (20k + 60k) TPM limit?
In Azure OpenAI, rate limits are enforced on a per-deployment basis, so each deployment has its own separate TPM quota: your gpt-4 deployment is throttled at 20k TPM and your gpt-4-32k deployment at 60k TPM, independently of each other. With a fallback in place you can therefore consume up to 80k TPM in aggregate across the two deployments. Any single request is still bound by the limit of whichever deployment serves it, and keep in mind the two models differ in context window and pricing, so responses after the fallback may cost more.
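A minimal sketch of that fallback pattern, assuming a `RateLimitError` stand-in for the SDK's 429 exception (the official `openai` Python SDK raises `openai.RateLimitError`) and a hypothetical `send_request` function that wraps your actual client call:

```python
class RateLimitError(Exception):
    """Stand-in for the SDK's 429 rate-limit exception."""

def call_with_fallback(send_request, deployments):
    """Try each deployment in order, moving on when one is rate-limited."""
    last_error = None
    for deployment in deployments:
        try:
            return send_request(deployment)
        except RateLimitError as err:
            # This deployment's TPM quota is exhausted; try the next one.
            last_error = err
    # Every deployment was rate-limited: surface the last error.
    raise last_error

# Demo: the primary 20k-TPM deployment is exhausted, so the call
# falls through to the 60k-TPM deployment.
def send_request(deployment):
    if deployment == "gpt-4":
        raise RateLimitError("429: TPM quota exceeded")
    return f"answer from {deployment}"

print(call_with_fallback(send_request, ["gpt-4", "gpt-4-32k"]))
```

Because each deployment tracks its own quota, the fallback only pays off when the deployments are distinct; two requests routed to the same deployment still share one TPM budget.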