@simonchatgpt3 Have you tried gpt-3.5-turbo-instruct? It’s a completion model, so you may need to modify your prompt a bit. Depending on the scenario, its tokens-per-second throughput is at least 2x-3x higher than the 3.5 chat completion models, which can cut your runtime by half or more.
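One way to adapt a chat-style prompt for a completion model is to flatten the message list into a single string. This is just a sketch — the role labels and `Assistant:` cue below are an assumed convention, not an official template, so tune the format to your use case:

```python
# Hypothetical helper: flatten chat-style messages into a single prompt
# string for a completion model like gpt-3.5-turbo-instruct.
def chat_to_completion_prompt(messages):
    """Join chat messages into one prompt, ending with a cue for the model."""
    parts = [f"{msg['role'].capitalize()}: {msg['content']}" for msg in messages]
    parts.append("Assistant:")  # cue the model to continue as the assistant
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Summarize this ticket."},
]
print(chat_to_completion_prompt(messages))
```

The resulting string goes into the `prompt` field of a completions request instead of the `messages` array a chat endpoint expects.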
Slow responses also don’t preclude batching dozens of requests at a time. You don’t need to wait for the previous API call to finish before issuing the next one in a loop; you can write code that runs right at your rate limit and retries whenever the API rejects a request for exceeding it.