I am sending API requests to gpt-3.5-turbo-1106 and gpt-4-0613 that are pretty small:
"prompt_tokens": 308,
"completion_tokens": 670,
"total_tokens": 978
It takes 60 to 120 seconds to get a response to a request of this size.
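For context, here is roughly what one of these calls looks like, with timing around it (a minimal sketch using the openai Python SDK v1; the real prompt is longer and omitted, and the exact parameters may differ):

```python
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",  # same behavior with gpt-4-0613
    messages=[
        {"role": "system", "content": "..."},  # placeholder; ~300 prompt tokens in practice
        {"role": "user", "content": "..."},
    ],
)
elapsed = time.perf_counter() - start

print(response.usage)              # prompt_tokens=308, completion_tokens=670, total_tokens=978
print(f"elapsed: {elapsed:.1f}s")  # consistently in the 60-120 second range
```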
There is just no way my users are going to wait even a minute for a response. No amount of loading animations or funny quotes packed into my loading screen will make this even remotely feasible…
I am seeing a lot of similar complaints on the forum about response times, but no solutions. What can be done?