Is there any way to make my ChatGPT application generate quicker responses?
It takes about 30 seconds to generate a response, and I want to bring that down to 10-15 seconds, since I am building a conversation application. I find that Snapchat’s “My AI” responds about twice as fast as my application.
Currently, I’m using the GPT-4 API and limiting responses to 300 tokens. I heard that fine-tuning the model makes it faster, but I wonder if there are any alternatives to this.
Not sure if you are already using it, but if not, try enabling streaming. It makes the response appear faster because you start seeing output instead of waiting for the complete response.
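A minimal sketch of streaming in Python, assuming the `openai` package’s `ChatCompletion` interface (v0.x style; adapt the call if you use the newer client). The delta-collection helper is split out so the incremental-display logic works with any iterable of strings:

```python
def collect_stream(deltas):
    """Print each text delta as it arrives and return the full reply."""
    parts = []
    for delta in deltas:
        print(delta, end="", flush=True)  # user sees text immediately
        parts.append(delta)
    print()
    return "".join(parts)

def stream_reply(prompt):
    # Assumes `pip install openai` and an API key in OPENAI_API_KEY.
    import openai

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
        stream=True,  # server sends chunks as tokens are generated
    )
    # Each chunk carries a small "delta" of the assistant's message.
    deltas = (chunk.choices[0].delta.get("content", "") for chunk in response)
    return collect_stream(deltas)
```

Total generation time is unchanged; the win is perceived latency, because the first tokens show up after roughly one second instead of after the full 20-30 seconds.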
It seems streaming cannot solve this problem. I tested GPT-4 and the turbo model both ways; the results are below:
| Model | stream=True | stream=False |
|-------|-------------|--------------|
| turbo | 3.59 s | 3.92 s |
| GPT-4 | 20.55 s | 20.91 s |
I’ve never heard of streaming. Does this work in Python? I should also mention that I expect detailed and accurate responses from GPT-4; will enabling this affect the quality of the responses?
Streaming will make it seem faster, since you get tokens as they are generated, but it does not change the quality of the responses. There are not a whole lot of options to make generation actually faster: simplify your prompt, use a less advanced model, or maybe try the Azure OpenAI Service.
If your content (or context) allows splitting (by token length or by task), you can multithread your user task and make the calls in parallel. I do that with up to 20 threads.
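A sketch of that parallel pattern, assuming your prompt can be split into independent sub-prompts. `ask_model` here is a hypothetical stand-in for your actual API call:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_model(sub_prompt):
    # Placeholder: replace with your real API call, e.g.
    # openai.ChatCompletion.create(...) on this sub-prompt.
    return f"answer for: {sub_prompt}"

def answer_in_parallel(sub_prompts, max_workers=20):
    """Fire one request per sub-prompt concurrently.

    pool.map preserves input order, so results line up with the
    sub-prompts even though the requests complete out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ask_model, sub_prompts))
```

Since the calls are network-bound, threads are enough here; the wall-clock time approaches that of the single slowest request rather than the sum of all of them.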