I, like many of us, am running into the random 600s read_timeout error with the ChatCompletions API using gpt-3.5-turbo. I have looked through the Python code and see that there is a timeout parameter that doesn’t seem to be documented in the API docs.
I have tried setting this timeout, and while it is accepted by OpenAI without error, it does not seem to have any effect. The timeout still occurs at 600s, even though most of my requests take between 2 and 5 seconds.
Has anyone come up with a way to set the timeout, or configured a server-side timeout, without going to Requests?
Actually, the timeout for a client-server transaction, such as an OpenAI API call, is normally managed in the client-side code.
This is true of just about every API call (not just OpenAI).
For example, when the network is congested, the server might never see the API call, so the timeout has to be managed by the client.
Extending this to OpenAI models: when a model is congested, it might not even return an error message (as we have been seeing today), so it remains the responsibility of the client side to manage the timeout, which is a subtask of overall error and exception handling.
Appendix: Example Tutorials for “Timeouts in Python Requests”
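To make the client-side point concrete, here is a minimal sketch that uses only the standard library's urllib rather than Requests (Requests accepts a similar timeout argument). The helper name post_json and the 10-second default are assumptions for illustration, not anything from OpenAI's library.

```python
import json
import socket
import urllib.error
import urllib.request


def post_json(url, payload, timeout_s=10.0):
    """POST JSON and give up if the server stays silent for timeout_s seconds.

    Hypothetical helper for illustration; not part of any OpenAI library.
    """
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        # The timeout applies to each blocking socket operation
        # (connect, read), not to the request as a whole.
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.loads(response.read().decode("utf-8"))
    except socket.timeout as exc:
        # Raised directly when the server accepts the connection but stalls.
        raise TimeoutError(f"no response within {timeout_s}s") from exc
    except urllib.error.URLError as exc:
        # Timeouts during the connection phase arrive wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            raise TimeoutError(f"no connection within {timeout_s}s") from exc
        raise
```

The caller then decides what a TimeoutError means: retry, fall back, or report the failure.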
I call the API from AWS Lambda in production, which has a configurable timeout value for each function you write. It has built-in retries too.
For high-priority stuff, I would invoke the function and also write the invocation event to a database. Then I would check the database every minute to see if it fired successfully, and retry up to N times before giving up.
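The "retry up to N times" part can be sketched in-process as below. This is a simplified stand-in for the database-driven check described above, not the Lambda setup itself; the function name retry, the exponential backoff, and the default values are assumptions.

```python
import time


def retry(call, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Run call() until it succeeds or max_attempts is reached.

    Waits base_delay_s, then 2x, 4x, ... between attempts. The last
    exception is re-raised once the attempts are used up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after N tries
            # Back off before the next attempt: 1x, 2x, 4x, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.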
In addition to this, I’d also recommend having some sort of fallback strategy. Sometimes one model is completely down while the others are working fine. Retrying the same model over and over will not help, but falling back to a different (usually worse) model will.
In my case, a generic service checks each model’s health by pinging it every minute, and updates that model’s health in a database. When any of the other services want to call OpenAI’s API using a particular model, they first retrieve the model’s health from the database. If it’s OK, they call it with a retry mechanism. If not, they call the best one available at that precise moment (also with a retry mechanism).
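Roughly, the selection step looks like the sketch below. It assumes the health records have already been read from the database into a plain dict; pick_model and the model names in the usage example are placeholders, not real identifiers.

```python
def pick_model(preferred, ranking, health):
    """Return preferred if it is healthy, else the best healthy model.

    ranking: model names ordered from best to worst.
    health:  dict of model name -> bool, as the health checker stored it.
    A model missing from health is treated as unhealthy.
    """
    if health.get(preferred, False):
        return preferred
    for name in ranking:
        if health.get(name, False):
            return name
    raise RuntimeError("no healthy model available")


# Usage with placeholder names: "model-a" is down, so the next best is used.
ranking = ["model-a", "model-b", "model-c"]
health = {"model-a": False, "model-b": True, "model-c": True}
chosen = pick_model("model-a", ranking, health)
```

The chosen model is then called with whatever retry mechanism is already in place.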
Thanks - no go with retry either. I’m a noob so I don’t really know what’s happening, but it appears that functions can block these decorators from working. For example, if you wrap a time.sleep call, it won’t raise an exception until the time.sleep actually ends.
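Whatever the decorator in question does internally, one way to get control back while the wrapped call is still blocked is to run it in a separate thread and wait on the result with a deadline. A minimal sketch, assuming it is acceptable for the abandoned call to keep running in the background; call_with_deadline is a hypothetical helper, not part of any retry library.

```python
import queue
import threading
import time


def call_with_deadline(func, timeout_s, *args, **kwargs):
    """Run func in a daemon thread; raise TimeoutError after timeout_s.

    The worker thread is not killed on timeout. It keeps running until
    func returns, but the caller no longer waits for it.
    """
    outcome = queue.Queue(maxsize=1)

    def worker():
        try:
            outcome.put((True, func(*args, **kwargs)))
        except Exception as exc:
            outcome.put((False, exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = outcome.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError(f"call exceeded {timeout_s}s") from None
    if ok:
        return value
    raise value  # re-raise the exception from func in the caller
```

For example, call_with_deadline(time.sleep, 0.2, 2) raises TimeoutError after about 0.2 seconds instead of waiting for the full sleep.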