Working with a developer on a web app. It is taking something like 50 seconds to a minute and a half to generate the completions.
This is not acceptable; no one is going to wait around that long on the web for content to load.
I have been told we can use a smaller model or try to reduce the token count… not sure how to proceed, but it feels like we are at a bit of an impasse if we can't get speeds up.
But it should be doable relatively easily in Python too if you like, with a little bit of effort.
Will your wrapper work if the completion is in a language other than English?
I don't see any reason why it shouldn't. In my service, users write in any language they want, and in the background I use this wrapper.
What model are you using? How large are your prompts and completions (input and output tokens)? Are you making multiple parallel requests?
I have a developer working on my project. Are you available to review and help? We could look at payment if you can identify the problem and provide a solution.
It would be odd if your developer did not know how to implement this, but if they run into issues, I would be glad to help.
I see your pain; it's mainly GPT-4 that has slow response times. Something else I try is getting shorter answers by giving strong instructions in the system message. The response is only sent once the whole message has finished, and oftentimes it is filled with all sorts of irrelevant courtesies.
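Since the response only arrives once the whole message has finished, one way to improve *perceived* latency is to render tokens as they stream in rather than waiting for the end. Here is a minimal sketch of that consumption pattern; the chunks and the `stream_completion` helper are stand-ins for illustration (the real OpenAI client exposes a similar behavior via a streaming option, but this example deliberately avoids any network call):

```python
# Sketch: show partial text as pieces "arrive" instead of waiting for the
# full completion. `stream_completion` and the fake chunks are illustrative
# stand-ins, not the actual client API.

def stream_completion(chunks):
    """Yield the growing partial text as each piece arrives,
    so a UI could render output incrementally."""
    shown = []
    for piece in chunks:
        shown.append(piece)       # append the newly arrived piece
        yield "".join(shown)      # current partial text so far

# Simulated chunks standing in for server-sent deltas
chunks = ["Hel", "lo, ", "world", "!"]
partials = list(stream_completion(chunks))
print(partials[-1])  # full text once the stream ends: Hello, world!
```

The user sees `Hel`, then `Hello, `, and so on; total generation time is unchanged, but the wait before *anything* appears drops to the time of the first chunk.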
Just one more tip, just in case:
If the completion is shorter, the response speed increases.
The underlying reason is that models like GPT-4 generate output sequentially, one step at a time, which naturally takes longer for longer outputs.
That explains why there is a difference in completion speed between queries like "Tell me about New Zealand?" and "Hello."
For instance, “Hello.” is completed approximately five times faster.