Instead of increasing the timeout, you may want to consider resetting it back to the default (30 seconds is more than enough) and enabling streaming.
-
If there is some sort of connection issue, you are now left hanging for potentially 2 minutes. That’s 2 minutes of waiting before you can make a decision.
-
You’ve now noticed that setting an arbitrary timeout still doesn’t solve any issues. So do you keep increasing it, or handle it differently?
-
With a 30-second timeout you find out quickly that there’s a connection issue and can use a retry/backoff library to handle it. At the very least you know whether it’s an issue with connecting or an issue with the token generation.
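
To make the retry/backoff idea concrete, here is a minimal sketch using only the standard library. The name `retry_with_backoff` and its parameters are made up for illustration, and it assumes your client raises `ConnectionError` or `TimeoutError` when the connection fails; adjust `retry_on` to whatever your HTTP library actually raises.

```python
import random
import time


def retry_with_backoff(call, max_attempts=4, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Run call() until it succeeds or max_attempts is used up.

    Hypothetical helper: only exceptions listed in retry_on are retried,
    anything else propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts, surface the connection problem
            # Exponential backoff with jitter: the cap doubles each attempt
            # (base_delay, 2x, 4x, ...) but never exceeds max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

You would wrap the request itself, e.g. `retry_with_backoff(lambda: send_request(payload))`, where `send_request` stands in for your own function. In practice an existing retry library does the same job; this just shows the shape of it.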
-
In most cases the tokens arrive very slowly, but they are still arriving. So you can monitor tokens/second and use this information to show a notice to end users. If the server sends back 5 tokens and crashes, you can just re-send the payload with those tokens appended, so generation picks up where it left off.
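
Roughly what that could look like, as a sketch rather than a drop-in solution. `stream_fn` is a hypothetical function you supply that takes a prompt and yields tokens as strings; `RateMeter` and `generate_with_resume` are made-up names. It also assumes the API will continue from text appended to the prompt, which depends on the API you are using.

```python
import time


class RateMeter:
    """Tracks average tokens per second since the meter was created."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.count = 0

    def tick(self):
        self.count += 1

    def tokens_per_second(self):
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


def generate_with_resume(prompt, stream_fn, max_restarts=3, on_rate=None):
    """Collect streamed tokens, re-sending on a dropped connection.

    stream_fn(prompt) is assumed to yield token strings and to raise
    ConnectionError if the stream dies partway through.
    """
    received = []
    restarts = 0
    meter = RateMeter()
    while True:
        try:
            # Re-send the original prompt plus everything received so far.
            for token in stream_fn(prompt + "".join(received)):
                received.append(token)
                meter.tick()
                if on_rate is not None:
                    on_rate(meter.tokens_per_second())  # e.g. update a "slow" notice
            return "".join(received)
        except ConnectionError:
            restarts += 1
            if restarts > max_restarts:
                raise
```

The `on_rate` callback is where you would decide whether to tell the user that the response is slow but still coming, instead of letting them stare at a spinner.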