Overloaded, but still paying

Hi, has gpt-3.5-turbo been overloaded all day? My API calls are failing, but is OpenAI still charging full price for each call even though all it returns is an error?

I’ve found that there’s some kind of caching involved.
Sometimes, when I get an error and retry the same request within a few seconds, I get a very fast answer, so it seems as if the exact request is cached.
It might be that the “overload error” is really a “gateway timeout error,” and the model actually keeps running inference on the back end even after the gateway has timed out.
If that’s the case, then it looks to their system as if your request did indeed generate a bunch of work for them.
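If that is what's happening, the usual client-side mitigation is to retry with exponential backoff and jitter rather than hammering the endpoint immediately. Below is a minimal, library-agnostic sketch. `TransientAPIError` and `call_with_retry` are hypothetical names, not part of any OpenAI client; `call` stands in for whatever function actually sends your request. Keep in mind that, if the timeout theory above is right, each retry may also be billed.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for an 'overloaded' / gateway-timeout error from your API client."""


def call_with_retry(call, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `call()` and retry on TransientAPIError with exponential backoff plus jitter.

    Returns (result, attempts_used). Re-raises the last error if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call(), attempt
        except TransientAPIError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Backoff doubles each time (base, 2*base, 4*base, ...) up to max_delay;
            # "full jitter" picks a random wait in [0, delay] so clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable so the function can be tested without actually waiting.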

A single inference costs so little that I don’t worry about it here. If it were to happen to, say, 50% of requests, that might be a different question…
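To put rough numbers on the 50% case, here is a quick back-of-the-envelope sketch. It assumes every attempt (failed or successful) is billed at full price, that attempts fail independently at the same rate, and that you retry until one succeeds; none of that is confirmed OpenAI billing behavior. Under those assumptions the number of attempts per answer is geometric, with mean 1 / (1 − failure rate).

```python
def expected_billed_calls(failure_rate: float) -> float:
    """Expected billed attempts per successful answer, under the assumptions above."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    # Attempts until first success ~ Geometric(p = 1 - failure_rate), mean 1 / p.
    return 1.0 / (1.0 - failure_rate)


for rate in (0.0, 0.1, 0.5):
    print(f"{rate:.0%} failures -> {expected_billed_calls(rate):.2f}x cost")
```

So a 50% failure rate would roughly double what you pay per useful answer, while occasional failures barely register.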

Also, yes, it feels as if the API has been a lot slower in the last week.