I posted this in the API Community and am trying here as well in case it is more suitable:
I am working with a developer on a web app. Completions are taking anywhere from 50 seconds to a minute and a half to generate.
This is not acceptable: no one is going to wait around that long for content to load on the web.
I have been told we can use a lesser model or try to reduce the tokens, but I am not sure how to proceed, and I feel like we are at a bit of an impasse if we can’t get better speeds.
I can totally relate, as I have seen completions take 5+ minutes when plugins are involved. Still, the results are amazing. So I think we need to accept for now that we have some magic in our hands that will only get faster and better. Speed will improve with time, and it is still amazing what can be done today, even with slow completions.
Well, they’re kind of killing their own business here. We’re going to have to try the other guys unless response times come back down. When we started developing, we were hoping responses would speed up over time, not slow down.
I’d be happy to pay more for a premium (faster) result, btw, if that were available.