Anyone facing gpt-3.5-turbo API delays?

Instead of increasing the timeout, you may want to consider resetting it to the default (30 seconds is more than enough) and enabling streaming.

  1. With a long timeout, if there is some sort of connection issue you are now left hanging for potentially 2 minutes. That’s 2 minutes of waiting before you can make a decision.

  2. You’ve now noticed that setting an arbitrarily large timeout still doesn’t solve the underlying issue. So do you keep increasing it, or handle it differently?

  3. With a 30 second timeout you know quickly that there’s a connection issue and can retry with backoff. At the very least you know whether it’s an issue with connecting or an issue with the token generation.

  4. In most cases the token output is very slow but still flowing. With streaming you can monitor tokens/second and surface that to end users as a notice. And if the server sends back 5 tokens and then crashes, you can just re-send the payload with those partial tokens appended.
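Points 3 and 4 could be sketched roughly like this. This is a minimal sketch, not the OpenAI SDK: `with_backoff`, `consume_stream`, and the `flaky_call` stand-in are all hypothetical names; in real code `call_fn` would wrap your actual streaming API request.

```python
import time
import random

# Point 3: a short timeout plus retry with exponential backoff.
# `call_fn` is a stand-in for your actual API call (hypothetical).
def with_backoff(call_fn, max_retries=3, base_delay=1.0, timeout=30):
    last_err = None
    for attempt in range(max_retries):
        try:
            return call_fn(timeout=timeout)
        except TimeoutError as err:
            last_err = err
            # Exponential backoff with a little jitter between attempts.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
    raise last_err

# Point 4: consume a streamed response while tracking tokens/second,
# so slow generation can be surfaced to end users.
def consume_stream(token_iter):
    tokens = []
    start = time.time()
    for tok in token_iter:
        tokens.append(tok)
    elapsed = max(time.time() - start, 1e-9)
    return "".join(tokens), len(tokens) / elapsed

# Demo with a flaky stand-in that times out twice, then streams tokens.
attempts = {"n": 0}
def flaky_call(timeout):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("connection stalled")
    return iter(["Hello", ",", " world"])

stream = with_backoff(flaky_call, max_retries=3, base_delay=0.01)
text, rate = consume_stream(stream)
print(text)  # -> Hello, world
```

The key design point is that the short timeout turns a silent stall into a fast, retryable error, while the tokens/second figure tells you whether the model is generating slowly or the connection has died.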
