Hello. I am creating a content generator tool with GPT-3.5, and I get several API errors. I wonder if this is due to reaching the maximum number of simultaneous connections? What is the maximum number of calls per second?
Welcome to the community.
Here are the docs on rate limits.
Actually, with GPT-3.5 on a paid plan, we can reach 30-40k tokens a minute in the afternoon (Europe time) with around 60 parallel API calls without problems.
Trying to go up to the 90k limit always brings problems, at least in the last month.
And be sure to monitor and manage your call rate.
Use the tiktoken and ratelimit Python libraries: 2 lines of code and done.
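To make the idea of a per-minute token budget concrete, here is a minimal standard-library sketch. It is not the tiktoken/ratelimit setup mentioned above: `rough_token_count` and `TokenBudget` are hypothetical names, and the 4-characters-per-token rule is only a crude assumption. For real counts you could plug in `len(tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text))` instead.

```python
import time
from collections import deque


def rough_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: assumes ~4 characters per token."""
    return max(1, len(text) // 4)


class TokenBudget:
    """Sliding-window limiter: at most `max_tokens` per `window` seconds."""

    def __init__(self, max_tokens, window=60.0, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.window = window
        self.clock = clock          # injectable so the class can be tested
        self.events = deque()       # (timestamp, tokens) of recent requests
        self.used = 0               # tokens currently counted in the window

    def _expire(self, now):
        # Drop requests that have left the window and free their tokens.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_acquire(self, tokens):
        """Reserve `tokens` if the window has room; return True on success."""
        now = self.clock()
        self._expire(now)
        if self.used + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True


# Example: stay at 40k tokens a minute, well under the 90k limit.
budget = TokenBudget(max_tokens=40_000)
prompt = "Write a short product description for a red bicycle."
if budget.try_acquire(rough_token_count(prompt) + 500):  # + expected answer size
    pass  # safe to send the request here
```

A caller that gets `False` back would simply sleep for a moment and try again before sending the request.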
Thanks, glad to be here. Everything is clear now!
Thanks for the tip mate! I’ll just need some slight adjustments but the limits are reasonable.
With those rates you will still see around 1% errors when doing 10k queries to the API.
And I guess it may vary with the complexity of your prompts and the average length of your answers (around 500 tokens in our case, with user and system prompts of around 100 tokens).
We are using the async acreate Python endpoint. For some queries you will get no answer or a timeout.
So you need to handle those errors afterwards, or re-add the failed queries to the queue of jobs to be done.
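A rough sketch of that re-queue pattern with plain asyncio might look like this. `run_with_requeue` and its parameters are hypothetical names, and `call_api` stands for whatever coroutine wraps your actual request (for example one built around acreate); nothing here is specific to the OpenAI client.

```python
import asyncio


async def run_with_requeue(prompts, call_api, max_parallel=60,
                           max_attempts=3, timeout=30.0, base_delay=1.0):
    """Run call_api(prompt) for each prompt; failed jobs go back on the queue."""
    queue = asyncio.Queue()
    for index, prompt in enumerate(prompts):
        queue.put_nowait((index, prompt, 1))   # (job id, prompt, attempt number)
    results, failed = {}, []

    async def worker():
        while True:
            index, prompt, attempt = await queue.get()
            try:
                # A call that hangs is treated like any other failure.
                results[index] = await asyncio.wait_for(call_api(prompt), timeout)
            except Exception:
                if attempt < max_attempts:
                    # Back off a little, then put the job back on the queue.
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))
                    queue.put_nowait((index, prompt, attempt + 1))
                else:
                    failed.append(index)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_parallel)]
    await queue.join()                         # wait until every job is settled
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results, failed
```

`results` maps each prompt's position to its answer, and `failed` lists the positions that still errored after `max_attempts` tries, so you can log them or run them again later.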
Good point. I think this is where my issue lies: I checked again, and I’m definitely nowhere near the API limits mentioned above.