The rate limit is tied to the API deployment, not the model. Falling back to a different model behind the same deployment therefore does not help: those requests still count against the same limit. Because limits are enforced per deployment, each deployment has its own separate quota, independent of the others.
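To make this concrete, here is a minimal sketch of a retry-with-fallback pattern in Python using the `requests` library. The endpoint URLs, deployment names, and API key are hypothetical placeholders, not any particular provider's API. The key point it illustrates is that the fallback targets a *different deployment* (which has its own limit), since retrying the same deployment with a different model would hit the same limit.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder

# Two separate deployments, each with its own rate limit (hypothetical URLs).
DEPLOYMENTS = [
    "https://example-endpoint/deployments/primary/chat/completions",
    "https://example-endpoint/deployments/fallback/chat/completions",
]

def call_with_fallback(payload, max_retries=3):
    """Try each deployment in turn; on repeated HTTP 429s, move to the next.

    Swapping the model name in the payload would not avoid the limit,
    because the limit is enforced per deployment, not per model.
    """
    for url in DEPLOYMENTS:
        for attempt in range(max_retries):
            resp = requests.post(url, json=payload, headers={"api-key": API_KEY})
            if resp.status_code == 429:
                # Honor Retry-After if present, then retry this same deployment.
                wait = int(resp.headers.get("Retry-After", 2 ** attempt))
                time.sleep(wait)
                continue
            resp.raise_for_status()
            return resp.json()
        # This deployment stayed rate limited; fall through to the next one.
    raise RuntimeError("All deployments are currently rate limited")
```

Whether a second deployment is available (or worth provisioning) depends on your quota and cost constraints; the sketch only shows where a fallback can help and where it cannot.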