Response times are too long with the gpt-3.5-turbo API model

When I tried multiple Arabic queries on the gpt-3.5-turbo model, it took approximately 1 to 3 minutes to respond to each query, whereas Davinci took approximately 5 to 10 seconds. Are there any suggestions for optimizing response time?

Without access to what’s happening behind OpenAI’s API, a thought:

Did you try the exact same input on both? Input and output token counts affect the response time, so that may account for the difference.

You can count tokens without calling the API: see this help page.
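For example, here is a minimal sketch using the tiktoken library (assuming it is installed via `pip install tiktoken`) to count tokens locally, without an API call:

```python
# Count tokens locally with tiktoken -- no API call needed.
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Return the number of tokens `text` occupies for the given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Non-English text (including Arabic) often tokenizes into more tokens
# per character than English, which can inflate input and output counts.
print(count_tokens("مرحبا بالعالم"))  # Arabic: "Hello, world"
print(count_tokens("Hello, world"))
```

Comparing the counts for your Arabic prompts against an English equivalent would show whether token count alone explains the gap.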

Yes, I have tried the exact same input on both. If I use a max_tokens value of 2000, then the Davinci model also takes approximately 1 to 1.5 minutes, which is still a very high response time.

  • Is there any way to optimize the response time?
  • Will response time improve on a paid plan?

More information: I have tried both https://api.openai.com/v1/chat/completions and https://api.openai.com/v1/completions.
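For reference, a minimal sketch of the two kinds of requests described above, using the `requests` library. The API key environment variable, the exact Davinci model name (text-davinci-003), and the max_tokens value are assumptions for illustration; a lower max_tokens caps how many tokens can be generated, which bounds worst-case generation time:

```python
# Sketch of the two calls described above (assumes `requests` is installed
# and an API key is set in the OPENAI_API_KEY environment variable).
import os
import requests

headers = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "Content-Type": "application/json",
}

# Chat endpoint: https://api.openai.com/v1/chat/completions
chat_resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "مرحبا"}],
        "max_tokens": 256,  # a lower cap bounds worst-case generation time
    },
)

# Completions endpoint: https://api.openai.com/v1/completions
# (text-davinci-003 is an assumed model name for "Davinci" here)
completion_resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers=headers,
    json={
        "model": "text-davinci-003",
        "prompt": "مرحبا",
        "max_tokens": 256,
    },
)

# Rough round-trip latency for each call.
print(chat_resp.elapsed, completion_resp.elapsed)
```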