@crowdreactor thanks a lot for sharing the details about the implementation. This is the only way we developers can help others debug their problems. So, for your case, I’d say:
- 5 retries is probably too much, especially if you’re using the same model all the time.
- 20s timeout is probably too short, especially if you’re asking for huge completions and you’re not streaming.
If your base model is gpt-3.5-turbo, I’d say to experiment with something like:
- 1 call to turbo with timeout = 30s. Wait for 4s.
- 1 call to turbo with timeout = 30s. Wait for 8s.
- 1 call to davinci-003 with timeout = 30s.
And yeah, the output would obviously depend on the model. You can try to optimize your prompt for your model. Even if you don’t, it’s usually better to return something rather than nothing. Anyways, the actual implementation totally depends on your use case. You might want to set up even longer initial timeouts (1 min or more), especially if your customers don’t need real-time interaction with your app.
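Just to make that schedule concrete, here’s a minimal sketch in Python. It assumes the 0.x openai client; the request_timeout argument and the catch-all exception handling are assumptions on my side, so adjust them to whatever client version you’re actually running:

```python
import time
from typing import Optional

import openai  # assumes the 0.x client and OPENAI_API_KEY set in the environment

# (model, timeout in seconds, wait after a failed attempt in seconds)
SCHEDULE = [
    ("gpt-3.5-turbo", 30, 4),
    ("gpt-3.5-turbo", 30, 8),
    ("text-davinci-003", 30, 0),
]


def complete_with_fallback(prompt: str) -> Optional[str]:
    """Try turbo twice with growing waits, then fall back to davinci-003."""
    for model, timeout_s, wait_s in SCHEDULE:
        try:
            if model == "gpt-3.5-turbo":
                resp = openai.ChatCompletion.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                    request_timeout=timeout_s,  # assumption: supported by your client version
                )
                return resp.choices[0].message.content
            resp = openai.Completion.create(
                model=model,
                prompt=prompt,
                request_timeout=timeout_s,
            )
            return resp.choices[0].text
        except Exception:  # in practice, catch openai.error.Timeout / APIError specifically
            time.sleep(wait_s)
    return None  # every attempt failed; surface this instead of retrying forever
```

Returning None at the end is just a placeholder for whatever “return something rather than nothing” means in your app.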