Situation:
- I have read many threads here on this topic, but the following basic question remains unanswered for me (and, I believe, for many other users too).
- My token usage during December has reached 10 million tokens on gpt-4o-mini, and I am getting the message “rate_limited_exceeded”. Consequently, I can’t send requests anymore, not even requests with a low total token count.
- My limit (I am at Tier 4) is indeed 10,000,000, but according to the documentation that is a tokens-per-minute (TPM) limit, not a monthly one.
- By the way, my total cost in December is only about $13.
Question:
- Why am I exceeding a limit, and which one is it? Again, the documented limit is per minute, yet it appears to be applied to the entire month.
- If I had indeed exceeded a TPM limit, it should be freed up shortly, say within an hour at most, but it is not. Why?
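
For reference, here is a minimal sketch of how I would check which limit is actually being hit. It assumes the API response carries the `x-ratelimit-*` headers described in the rate limit documentation; the helper names (`summarize_rate_limits`, `tokens_exhausted`) and the header values in `example_headers` are made up for illustration and are not taken from my real account.

```python
# Hypothetical helpers for inspecting rate-limit headers from an API response.
# The header names follow the documented x-ratelimit-* convention; the values
# used in the example below are invented for illustration only.

RATE_LIMIT_HEADERS = (
    "x-ratelimit-limit-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-requests",
    "x-ratelimit-reset-tokens",
)


def summarize_rate_limits(headers):
    """Return only the rate-limit headers found in `headers` (case-insensitive)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {name: lowered[name] for name in RATE_LIMIT_HEADERS if name in lowered}


def tokens_exhausted(headers):
    """True if the per-minute token budget is reported as used up."""
    remaining = summarize_rate_limits(headers).get("x-ratelimit-remaining-tokens")
    return remaining is not None and int(remaining) <= 0


# Example with invented values: plenty of per-minute tokens left.
example_headers = {
    "Content-Type": "application/json",
    "X-RateLimit-Limit-Tokens": "10000000",
    "X-RateLimit-Remaining-Tokens": "9999000",
    "X-RateLimit-Reset-Tokens": "6ms",
}

for name, value in summarize_rate_limits(example_headers).items():
    print(f"{name}: {value}")
print("per-minute tokens exhausted:", tokens_exhausted(example_headers))
```

If the remaining-tokens value is well above zero while requests are still rejected, that would suggest the per-minute TPM limit is not the one being exceeded, which is exactly what I am trying to understand.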