Hi!
New here. I’m developing an application based on the OpenAI API for my company. In my application I handle the easy tasks with GPT-3.5 and call GPT-4 for the harder generations. GPT-4 is a marvel and handles very complicated tasks flawlessly, but unfortunately it costs too much, and I can’t market the application if it only calls GPT-4.
Now I’m having problems with the API calls to GPT-3.5. I’m calling chat.completions.create with the standard openai Python client, but I’m seeing wildly variable response times. For the exact same prompt, I sometimes get the answer almost immediately and sometimes only after something like 3 minutes. If I kill the process and restart it, responses usually go back to being almost instantaneous.
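For reference, my call looks roughly like the sketch below (the model name, timeout and retry values are just placeholders, not my real settings). I’ve been wondering whether setting an explicit timeout and retry count like this would at least cap the worst case while I figure out the root cause:

```python
from openai import OpenAI

# Client-level defaults: cap how long a single request may take and
# let the SDK retry transient failures (values are examples only).
client = OpenAI(timeout=30.0, max_retries=2)

def ask_gpt35(prompt: str) -> str:
    # A per-request override is also possible via client.with_options(timeout=...).
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask_gpt35("Say hello in one short sentence."))
```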
This variability is a real problem for me, and I would like to know if you experience it too. Did you find a solution?
GPT-4, by contrast, is way faster and never hangs for me.
Thanks!