Can we please get some communication from OpenAI regarding this issue? I’ve reported the same issue to the support bot probably 10 times at this point and never get a response. The status page says all is well, meanwhile our production app that uses text-davinci-003 has been hitting this error at an alarming rate for the past 2 weeks. In the past 24 hours 10-15% of our requests have failed.
We are nowhere near any actual rate limit on our account – we spend $20-$30 per month spread over the month pretty evenly.
A rate limit is not the same as a usage limit; the rate limit refers to the number of requests per (I think) 60 seconds.
Is there any reason why your code should be trying to send many requests in a short period? If this is normal for your workload, you can request to have the rate limit increased.
I appreciate you taking the time to respond but please, let’s not get off target. I know what a rate limit is. As noted, our app usage is nowhere near any rate limit. This post has nothing to do with the monthly usage limit.
We are sending a few hundred requests per day, not in any spiky manner. There is simply no way we’d be hitting the RPM or TPM limits.
Several times in the past when I reported these issues it turned out there was degraded performance on text-davinci-003 that had just gone unreported. Now in recent weeks the degraded performance is just never reflected on the API status page, and the support chat bot is basically a black hole.
I can tell you my current workflow with clients is to do R&D against the OpenAI API endpoints for simplicity and ease of use, then swap to Azure at production; this has latency and performance benefits for the same price.
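Roughly, the swap looks like this with the Python library (the resource name, deployment name, key, and API version below are placeholders, so adjust them for your own Azure setup):

```python
import openai

# R&D: plain OpenAI endpoint
openai.api_key = "sk-..."  # your OpenAI key
# resp = openai.Completion.create(model="text-davinci-003", prompt="...")

# Production: point the same code at an Azure OpenAI resource
openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # placeholder resource
openai.api_version = "2023-05-15"  # check the current Azure API version
openai.api_key = "YOUR-AZURE-KEY"  # placeholder key

# Azure routes by deployment name (engine) rather than model name
# resp = openai.Completion.create(engine="my-davinci-deployment", prompt="...")
```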
Thanks – I will look into that. I also just signed up for the Anthropic waitlist and will test a few others that may be sufficient for our use case. I would still like to see this issue addressed by OpenAI, as my preference is to continue using their API… it seems I’m not being given a choice though.
At the moment the entire industry is in the same boat: a lack of GPU/specialist AI hardware at scale. You might get a less logically performant model to run faster, but as far as uptime and developer support go, OpenAI has been the best for me.
Depending on your use case, you may find other models are fine, but I tend to be in a solution space that requires higher order logical and deductive reasoning to reduce overall latency, which GPT-4 provides for me, albeit at a cost premium right this second.
To get the kind of end-to-end solutions with minimal hallucination and maximum accuracy, you either need to use a higher-order model or several passes with a lesser model in a result-building process. I’ve found similar performance, latency-wise, with a single GPT-4 pass as opposed to faster models running multiple passes.
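As a rough sketch of what I mean by a result-building process (the model names and prompts here are purely illustrative, not a recommendation):

```python
import openai

openai.api_key = "sk-..."  # your key

def ask(model, prompt):
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

task = "Summarise the key obligations in this contract: ..."

# Single pass with the higher-order model
answer_single = ask("gpt-4", task)

# Two passes with a lesser model: draft, then self-review and rewrite
draft = ask("gpt-3.5-turbo", task)
answer_multi = ask(
    "gpt-3.5-turbo",
    "Review the draft below for errors and omissions, then rewrite it:\n\n" + draft,
)
```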
What bothers me most about this is that A) the OpenAI status page is lying and B) their support chat bot is just a waste of time. Yes, OpenAI is under extreme demand, and yes the fact that their API is not completely reliable is somewhat of a known thing at this time, but that doesn’t excuse A or B for me at this point. I had already coded in fallback logic but am now looking at replacing OpenAI as the primary. I don’t need GPT-4 level quality for my use case.
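For what it’s worth, the fallback logic is nothing fancy; it’s roughly this shape (model names are just examples, and the except list is whichever errors you actually see in production):

```python
import openai

openai.api_key = "sk-..."  # your key

def complete_with_fallback(prompt):
    try:
        # Primary: the model we actually want to use
        resp = openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=256
        )
        return resp["choices"][0]["text"]
    except (
        openai.error.RateLimitError,
        openai.error.APIError,
        openai.error.ServiceUnavailableError,
        openai.error.Timeout,
    ):
        # Fallback: a different model (or a different provider entirely)
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]
```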
Sure, I understand your concerns. At this stage in AI development I am setting my clients’ expectations a little above beta testing; my usual line is “AI will reduce your front-line handling requirements by 50-80%, but you should scale that down progressively as the technology is still firmly in the early development phase and you need to have contingency around that for now.” Then I can work with them to transition to AI over the coming months and years.
Your environment could be significantly different, but if you expect any of the current AI providers to give you a performant solution with the levels of support and uptime of a well-established company, you are going to be disappointed. I work with them all and they all have significant hurdles to overcome in terms of performance at scale.
I wish you well and I hope you get a solution that works for you and your client/company.
Hey guys, I have the same issue – my usage should be nowhere near the rate limit, and the same code (to generate content) sometimes works fine, but lately I am running into a “Rate Limit Error” immediately almost every time.
You can log the response headers, which show your rate limit consumption counting down minute by minute.
max_tokens also counts against that limit when you send the request. You can omit this parameter, so a “max_tokens = 6000” setting doesn’t immediately deduct from the budget or block the call.
Consider that with gpt-4 you may only have 10,000 tokens per minute. One 4,000-in / 2,000-out API call, and there is no specifying “max_tokens = 6000” again until the minute expires.
Sounds interesting. Do you have a code snippet in Python showing how to implement it? I am currently trying to generate content and I constantly run into this issue the moment I start my code.
It’s setting openai.log='debug' on the Python library, or other trickery like going into the library files with a text editor and making them extract or monitor the rate limits per model.
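Something like this, assuming the current Python library; the header names are the ones I’ve seen come back on responses, so verify them against your own:

```python
import openai
import requests

openai.api_key = "sk-..."  # your key

# Option 1: let the library log request/response details for every call
openai.log = "debug"
openai.Completion.create(model="text-davinci-003", prompt="ping", max_tokens=5)

# Option 2: call the endpoint directly and read the rate-limit headers yourself
resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {openai.api_key}"},
    json={"model": "text-davinci-003", "prompt": "ping", "max_tokens": 5},
)
for name in (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-requests",
    "x-ratelimit-reset-tokens",
):
    print(name, resp.headers.get(name))
```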