When I tried multiple Arabic queries on the gpt-3.5-turbo model, it took approx 1 minute to 3 minutes time to respond query. But Davinci took approx 5 seconds to 10 seconds. Is there any suggestion to optimize response time?
Without access to what’s happening behind OpenAI’s API, a thought:
Did you try the exact same input on both? Input and output token count affects the response time, so that may be a cause of differences.
You can count tokens without calling the API: see this help page.
Yes I have try exact same input on both. If I have use max_token value as 2000 then Davinci model also take approx 1 Minutes to 1.5 Minutes. Still it’s very high response time.
- Is there any way to optimize the response time.
- Can response time will be improve on paid module.
More information: I have tried https://api.openai.com/v1/chat/completions and https://api.openai.com/v1/completions