Performance issue after migrating to the new Python API

I recently upgraded my environment to v1.2.4 and had to rewrite my code for the new Python API. However, I've noticed that performance has declined with this change.
1. I was using the async chat completion function, and after switching to the new async client I frequently get timeouts for the same requests. Previously I set a timeout of 20s with no problems, but now I set it to 50s and still get timeouts waiting for the GPT response (my call looks roughly like the sketch below).
2. I also noticed that the results differ even with temperature set to 0 and the same GPT model.
Not sure what is happening under the hood. Does anyone have the same experience?
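
For reference, here is a simplified sketch of my call (the model name and prompt are placeholders):

```python
import asyncio

from openai import AsyncOpenAI

# Client-level timeout in seconds; was 20s before the upgrade, 50s now
client = AsyncOpenAI(timeout=50.0)

async def main() -> None:
    response = await client.chat.completions.create(
        model="gpt-4-1106-preview",  # placeholder for the model I was already using
        messages=[{"role": "user", "content": "Hello"}],
        temperature=0,
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```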

Hi and welcome to the Developer Forum!

I think your issue may be load-related. After DevDay and a large influx of new users, the servers are under heavy load; this has regrettably slowed some services and I think may be the cause of your timeouts.

In my own work I use the streaming option where possible and take advantage of the fast time to first token and time per token to keep my code responsive. That may be an option for you to look at to build a more robust solution.
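
Something like this minimal sketch is what I mean (v1 async client; the model name is just an example):

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(timeout=50.0)

async def main() -> None:
    # stream=True yields chunks as they are generated, so time to first
    # token stays short even when the full completion takes a while
    stream = await client.chat.completions.create(
        model="gpt-4-1106-preview",  # example model
        messages=[{"role": "user", "content": "Write a haiku about servers."}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # delta.content can be None, e.g. on the role-only first chunk
            print(delta, end="", flush=True)
    print()

asyncio.run(main())
```

A stalled stream also fails fast: if no chunk arrives for a few seconds, you can cancel and retry instead of waiting out a full request timeout.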

1.2.4? 1.3.0 is out!

  • The models are simply timing out with no response, especially anything -1106. You could make the same request with curl and get the same result.
  • A temperature of zero isn’t actually “deterministic” or “a real value”; it is just a placeholder for a low temperature, but not as low as the 0.0000000000001 you could specify.

A more straightforward way to get all the determinism the model can offer is top_p = 1e-9.
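
As a sketch (v1 client; the model name and prompt are illustrative):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # illustrative
    messages=[{"role": "user", "content": "Pick a number from 1 to 10."}],
    top_p=1e-9,  # nucleus sampling keeps only the single most likely token
)
print(response.choices[0].message.content)
```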