We have an application that queries OpenAI with multiple prompts. However, many of the requests we send to OpenAI take several seconds to return a result, which makes for a poor end-user experience. I believe this is a problem many people face. Is there a solution, or at least some good practices for improving response time? Thank you very much.
The max_tokens parameter hurts latency significantly. Try to reduce it as much as you can.
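For illustration, here's a minimal sketch of capping max_tokens with the OpenAI Python SDK (v1.x). The model name and prompt are placeholders, and the right cap depends on how long your outputs actually need to be:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you already call
    messages=[{"role": "user", "content": "Summarize this ticket in one sentence: ..."}],
    max_tokens=100,  # hard cap on completion length; lower caps mean fewer tokens to generate, hence lower latency
)
print(response.choices[0].message.content)
```

The idea is that generation time scales roughly with the number of output tokens, so if you only need a one-sentence answer, don't leave room for a ten-paragraph one.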
Also: you can stream results instead of waiting until the response is fully computed, so the user starts seeing output as soon as the first tokens are generated. Hope that helps!
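And a minimal streaming sketch with the same SDK, again with a placeholder model and prompt. Streaming doesn't make the full response finish sooner, but it cuts perceived latency to time-to-first-token:

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Explain streaming in two sentences."}],
    stream=True,  # return chunks as tokens are generated instead of one final payload
)

# Print each token fragment as it arrives
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

In a web app you'd forward these chunks to the client (e.g. over server-sent events) rather than printing them, but the loop structure is the same.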
Very nice! The content is so good. We'll check every recommendation in this document. This will definitely help a lot of people. Good job, and thank you so much @logankilpatrick and the whole team.