Hello,
We are using an assistant built on the GPT-3.5 Turbo 16k model, and we trigger thread creation, message creation and runs from the API. Performance is poor, so the application cannot be presented to anyone and is effectively useless.
We understand that GPT-4 Turbo and the other latest models are much slower, but shouldn't an older model that still reasons reasonably well also respond reasonably fast?
Or is this because the Assistants API is still in beta?
The average response time is 10 to 15 seconds, sometimes even longer.
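To see where those 10 to 15 seconds actually go, it can help to time each step of the flow separately. Below is a minimal sketch in Python that times named steps and polls a run until it reaches a terminal status. The helper names (`time_steps`, `poll_until_done`) are hypothetical, the terminal status names are an assumption modelled on the Assistants beta run lifecycle, and the lambdas in the demo are stand-ins: you would swap in the real SDK calls yourself (for example `client.beta.threads.runs.retrieve(...)` in the openai Python package v1.x, which is an assumption to check against your installed version).

```python
import time
from typing import Callable, Dict, Iterable


def time_steps(steps: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each step in insertion order and record its wall-clock time in seconds."""
    timings: Dict[str, float] = {}
    for name, step in steps.items():
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


# Assumed terminal run statuses, modelled on the Assistants (beta) run lifecycle.
TERMINAL_STATUSES = ("completed", "failed", "cancelled", "expired")


def poll_until_done(
    get_status: Callable[[], str],
    interval: float = 0.5,
    timeout: float = 60.0,
    terminal: Iterable[str] = TERMINAL_STATUSES,
) -> str:
    """Call get_status() every `interval` seconds until it returns a terminal status.

    The polling interval itself can add up to `interval` seconds of latency.
    """
    terminal_set = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal_set:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} s")
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical stand-ins: replace each lambda with the real API call
    # (thread creation, message creation, run creation, run polling).
    fake_statuses = iter(["queued", "in_progress", "in_progress", "completed"])
    timings = time_steps({
        "create_thread": lambda: time.sleep(0.01),
        "create_message": lambda: time.sleep(0.01),
        "create_run": lambda: time.sleep(0.01),
        "wait_for_run": lambda: poll_until_done(lambda: next(fake_statuses), interval=0.05),
    })
    for name, seconds in timings.items():
        print(f"{name:>15}: {seconds:.2f} s")
```

A breakdown like this shows whether the time is spent in the run itself or in the surrounding API round trips and the client's own polling interval.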
We would appreciate an official answer, as this is a business requirement, not just an experiment.
Thank you.