[REQUEST] Better outage handling in the API

Problem:
The API is experiencing an outage right now. Outages are common enough that I believe it’s important to make it easier for API users to implement a fallback mechanism. During an outage, requests simply hang indefinitely. Wrapping every API call in a timer is a poor pattern, and even if you do, you can’t cancel the in-flight request. You also run the risk that the API is merely slow rather than down, in which case the operation is performed twice: once with OpenAI and again with your fallback. The status page lets you set up a webhook, but it doesn’t support polling the service to see whether it’s operational.
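To make the timer workaround and its drawback concrete, here is a minimal sketch using only the Python standard library. `call_with_fallback`, `primary`, and `fallback` are hypothetical names for illustration, not part of any SDK; `primary` stands in for the API call.

```python
import concurrent.futures
import time


def call_with_fallback(primary, fallback, timeout_s):
    """Run primary(); if it has not finished within timeout_s, return fallback()."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The primary call keeps running in its worker thread. A running
        # thread cannot be cancelled, so if the service was only slow,
        # the same operation ends up being performed twice.
        return fallback()
    finally:
        # Do not block on the still-running primary call.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    def slow_primary():
        time.sleep(0.3)  # simulates a slow (not dead) service
        return "primary"

    print(call_with_fallback(slow_primary, lambda: "fallback", 0.05))
```

The caller gets the fallback result after the timeout, but the slow primary call still completes in the background, which is exactly the duplicated work described above.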

Idea:
Add a new error type that represents a service outage or incident and is returned from any API endpoint while the service is down. This would make it much easier for developers to handle errors gracefully and improve their OpenAI-driven services. Remember that our customers are your customers. If apps built on OpenAI services perform poorly, consumer sentiment shifts toward an unfavorable view of AI services in general.
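As a rough illustration of what this would allow on the client side, here is a sketch. `ServiceOutageError` is the hypothetical error type being requested and does not exist in the API today; `complete`, `primary`, and `fallback` are placeholder names as well.

```python
class ServiceOutageError(Exception):
    """Hypothetical error an endpoint would return during a declared outage."""


def complete(prompt, primary, fallback):
    """Try the primary provider; switch to the fallback only on a declared outage."""
    try:
        return primary(prompt)
    except ServiceOutageError:
        # The service said it is down, so the request was not processed
        # and running the fallback does not duplicate any work.
        return fallback(prompt)


if __name__ == "__main__":
    def primary_down(prompt):
        raise ServiceOutageError("service is experiencing an incident")

    print(complete("hello", primary_down, lambda p: "fallback: " + p))
```

Because only the outage error triggers the fallback, other failures (bad request, rate limit, and so on) still surface to the caller unchanged.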

Pro tip: give the moderations call that you run on untrusted inputs a short timeout with no retries (the SDK defaults to two retries). That endpoint normally responds quickly. Have the rest of the API call chain refuse to proceed unless it gets a flag-free result.
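A minimal sketch of that gate, assuming a hypothetical `moderate` callable that returns `True` when the input is flagged and raises `TimeoutError` when the moderation call does not answer within its short timeout. `guarded_call` and `generate` are illustrative names too.

```python
def guarded_call(text, moderate, generate):
    """Run generate(text) only after moderate(text) reports the input as not flagged.

    Returns a (status, output) pair; status is "ok", "blocked", or "unavailable".
    """
    try:
        flagged = moderate(text)
    except TimeoutError:
        # Fail closed: no moderation verdict means the chain does not proceed.
        # A timeout here is also a cheap early hint that the service is down.
        return ("unavailable", None)
    if flagged:
        return ("blocked", None)
    return ("ok", generate(text))


if __name__ == "__main__":
    def moderate_timeout(text):
        raise TimeoutError("moderation did not respond in time")

    print(guarded_call("hi", moderate_timeout, str.upper))
```

With the official Python SDK, `moderate` could probably be built on something like `client.with_options(timeout=2.0, max_retries=0).moderations.create(input=text)`, with the SDK's timeout exception translated to `TimeoutError`; treat the exact values and wiring as assumptions.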

I support this. Otherwise it has to be handled manually, and you need to switch LLM providers when there is an outage. One solution I can think of is scraping the status page when you notice requests taking too long, but I agree that outage error handling would be nice to have in the future.
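Instead of scraping HTML, a status page hosted on Atlassian Statuspage usually exposes a JSON summary at `/api/v2/status.json`. The sketch below assumes the status page serves that format at the URL shown, which I have not verified; `fetch_status` and `is_operational` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint (Statuspage-style JSON); verify before relying on it.
STATUS_URL = "https://status.openai.com/api/v2/status.json"


def is_operational(payload):
    """Return True only when the payload reports no active incident.

    Statuspage-style payloads carry status.indicator, which is one of
    "none", "minor", "major", or "critical".
    """
    return payload.get("status", {}).get("indicator") == "none"


def fetch_status(url=STATUS_URL, timeout_s=3.0):
    """Download and decode the status JSON, with a short timeout of its own."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)


if __name__ == "__main__":
    sample = {"status": {"indicator": "major", "description": "Partial outage"}}
    print(is_operational(sample))
```

You would call `is_operational(fetch_status())` only after noticing slow requests, and treat a missing or unexpected payload as "not operational" so the fallback kicks in.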