Very bad performance on GPT-3-turbo

rbritom · April 24, 2023, 4:08pm

EDIT:
So I actually found out that is not a performance issue actually, is just the nature of transformers.

If you do a post request instead of streaming, the application will need to generate the response word by word recursively using the previous string of text to predict the probability of the next word, each iteration takes about 0.1 to 0.5 seconds, so a large paragraph will take 15 to 45 seconds to fully generate.
To mitigate this you need to do HTTP streaming to stream the response to the user so that they can see each word being generated so that they know something is happening and won’t get desperate.

ORIGINAL POST:

I am getting a bit worried, I am currently trying to build an application using GPT-3, and I need to make from 2 to 5 requests to the completion endpoint to get a final result. Each request is taking increasingly more time, and I am making all of them in parallel but any of them can take from 12 to 40+ seconds, the last request is sequential so I can’t avoid adding those extra 12 to 40+ seconds. Is there any chance this situation is temporary and is OpenAI working on improving performance, I need to know because I am making some serious investment in this project. I really appreciate any help you can provide.

Topic		Replies	Views
Unstable speed of gpt-3.5-turbo-16k API api , gpt-35-turbo-16k , performance	6	653	January 9, 2024
Gpt-4-0125-preview INCREDIBLY slower than 3.5 turbo API	11	6334	February 20, 2024
Assistant API Performance is Very Slow API plugin-development , api	11	2892	March 7, 2024
Performance issue with gpt-4-turbo-preview API API gpt-4 , api , performance	1	700	February 17, 2024
Chatgpt-3.5 turbo model takes long time to respond. Is there any way to speed this up? API gpt-35-turbo , api-speed	7	5322	December 19, 2023

Very bad performance on GPT-3-turbo

Related Topics