You are encountering that error because gpt-4 has rather low tokens-per-minute rate limits.
Tokens are the model’s internal encoding, representing words and parts of words as discrete pieces.
The rate limiter doesn’t actually count the tokens in your input, though: it estimates them from the number of characters you send. However, the value you specify in max_tokens is counted against the rate limit in full.
If you specify a large max_tokens, you may be blocking yourself even though each call only returns a small response. You can reduce the value of that parameter, or, more effectively, remove it entirely so it doesn’t count against your limit before you’ve even used the AI.
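A rough sketch of how that accounting works, assuming the commonly cited heuristic of about four characters per token (the function name and exact ratio here are illustrative, not the API’s actual implementation):

```python
# Hypothetical sketch of how a request might count against a
# tokens-per-minute (TPM) rate limit: the limiter estimates input tokens
# from character count (~4 characters per token is a common heuristic),
# then adds the full max_tokens value as reserved output.

def estimated_rate_limit_cost(prompt: str, max_tokens: int = 0) -> int:
    """Rough estimate of tokens counted against the TPM limit."""
    estimated_input = len(prompt) // 4   # character-based estimate, not real tokenization
    reserved_output = max_tokens         # max_tokens counts in full, even if unused
    return estimated_input + reserved_output

prompt = "Summarize the plot of Hamlet in one sentence."  # 45 characters

# With a large max_tokens, the reservation dominates the cost:
print(estimated_rate_limit_cost(prompt, max_tokens=4000))  # 4011

# Omitting max_tokens, only the small input estimate counts up front:
print(estimated_rate_limit_cost(prompt))  # 11
```

This illustrates why a single call with max_tokens=4000 can exhaust a low TPM budget even when the actual reply is only a few tokens long.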
Improving the quality of the AI solution you’ve written will come down to your instructions and sending well-structured messages to the model.