When the input to the Moderation API is long, it raises a 429 rate limit error even though the actual rate limit has not been reached. This is misleading, because waiting and resending the same request results in the same error.
Code to Reproduce the Error (Edited again: another user is able to reproduce this error, see the replies below)
from openai import OpenAI
import time
import requests

client = OpenAI()
response = requests.get("https://raw.githubusercontent.com/da03/moderation_issue/main/example.txt")

flag_produce_error = True  # when True, produces a 429 error; when False, no error
if flag_produce_error:
    text = response.text[:6000]
else:
    text = response.text[:5999]

print(f'Number of characters: {len(text)}')
try:
    response = client.moderations.create(input=text)
except Exception as e:
    print('error', e)
Result of the Above Code
Using 6000 characters (setting flag_produce_error to True in the code above) fails with a 429 rate limit error, while using 5999 characters (setting flag_produce_error to False) succeeds.
Number of characters: 6000
error Error code: 429 - {'error': {'message': 'Rate limit reached for text-moderation-007 in organization org-WAXHZpHsjoSbCdNbzcNU959e on tokens per min (TPM): Limit 150000, Used 138657, Requested 20493. Please try again in 3.66s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
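As a quick check that waiting does not help, here is a retry sketch (reusing the same input; the 10-second sleep is an arbitrary choice, well beyond the ~4 s suggested in the error message):
import time
import requests
from openai import OpenAI, RateLimitError

client = OpenAI()
text = requests.get("https://raw.githubusercontent.com/da03/moderation_issue/main/example.txt").text[:6000]

for attempt in range(3):
    try:
        client.moderations.create(input=text)
        print(f"attempt {attempt + 1} succeeded")
        break
    except RateLimitError as e:
        print(f"attempt {attempt + 1} hit 429: {e}")
        time.sleep(10)  # arbitrary wait, longer than the retry interval the error suggests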
Expected Behavior
The API should raise an error code/message reflecting that the underlying issue is the input length, not the rate limit.
Follow-Up Question
What's the context length limit of the Moderation API? The example I used is taken from real user-ChatGPT conversations (WildChat), so I thought it should work.
Update on Apr 17, 2024
After a deeper analysis of the WildChat dataset, inspired by pondin6666's observation, I suspect that these errors are linked to inputs containing non-Latin characters (a token-count sketch after the list below illustrates why this matters).
Key Findings:
- Language-specific Error Rates: The errors disproportionately affect texts in certain languages:
- Korean: Accounts for 66.44% of all errors, yet only 0.51% of the dataset.
- Chinese: Makes up 10.96% of errors, 13.54% of the dataset.
- English: Constitutes 6.85% of errors, while making up 54.92% of the dataset. These failing cases mostly contain non-ASCII special characters such as Ï.
- Japanese, Hindi: Also show error rates disproportionate to their presence in the dataset.
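The error message above reports "Requested 20493" tokens for a 6000-character input, which suggests the 429 is driven by a token estimate rather than the character count, and non-Latin scripts consume far more tokens per character. A rough illustration using tiktoken (the tokenizer used by text-moderation-007 is not documented, so cl100k_base and the sample strings below are my own assumptions):
import tiktoken

# Assumption: the moderation endpoint's token accounting behaves roughly like cl100k_base.
enc = tiktoken.get_encoding("cl100k_base")

# Illustrative strings of my own, not taken from WildChat.
samples = {
    "English": "The quick brown fox jumps over the lazy dog. " * 20,
    "Korean": "다람쥐 헌 쳇바퀴에 타고파. " * 20,
    "Chinese": "敏捷的棕色狐狸跳过了懒狗。" * 20,
}

for lang, sample in samples.items():
    n_tokens = len(enc.encode(sample))
    print(f"{lang}: {len(sample)} chars -> {n_tokens} tokens "
          f"({n_tokens / len(sample):.2f} tokens per char)")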
Practical Workaround:
In response, I've written a workaround that segments large text inputs into smaller chunks. The implementation of this workaround is detailed in the repository linked below.
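A minimal sketch of the chunking idea (the 2,000-character chunk size and the OR-combination of per-chunk flags are arbitrary choices for illustration; the full implementation in the repository below may differ):
from openai import OpenAI

client = OpenAI()

def moderate_long_text(text, chunk_size=2000):
    """Moderate arbitrarily long text by splitting it into fixed-size chunks.
    Returns True if the Moderation API flags any chunk."""
    flagged = False
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        result = client.moderations.create(input=chunk)
        flagged = flagged or result.results[0].flagged
    return flagged
Note that a naive fixed-size split can cut a sentence in half, so flagged content spanning a chunk boundary could in principle be missed.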
For those interested in replicating the issue or trying out the workaround, I have documented everything, including code and failing examples in different languages, in this GitHub repository: GitHub Repo Link.