I’m doing some performance tests using the completions endpoint and I’m getting highly variable response times. I’m running a bunch of requests with the exact same prompt (using the text-davinci-002 instruct model) and the response times vary a lot. For instance, they range from 2 seconds to 10 seconds.
Is there any reason why I’m getting such different response times? If so, how could I improve the performance of the API?
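For reference, here’s roughly how I’m timing the requests (a minimal sketch using the legacy openai Python library; the key, prompt, and parameters are placeholders):

```python
import time
import openai  # legacy (pre-1.0) openai library

openai.api_key = "sk-..."  # placeholder

PROMPT = "..."  # the exact same prompt every run (placeholder)

timings = []
for _ in range(20):
    start = time.perf_counter()
    openai.Completion.create(
        model="text-davinci-002",
        prompt=PROMPT,
        max_tokens=256,
        temperature=0,
    )
    timings.append(time.perf_counter() - start)

print(f"min {min(timings):.1f}s  max {max(timings):.1f}s  "
      f"mean {sum(timings) / len(timings):.1f}s")
```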
Yeah, bigger prompts on the largest models will take longer. Comparatively, 10 seconds isn’t shabby when you think of all the lifting being done in the background for what are likely millions of daily queries…
How big is your prompt + output? Have you tried a fine-tuned Curie model? It’s a lot cheaper and faster. Might work depending on your task. Hope this helps.
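If you’re not sure of the prompt size in tokens, a quick check is something like this (a sketch using the tiktoken library, which maps text-davinci-002 to its tokenizer; the prompt text is a placeholder):

```python
import tiktoken

# tiktoken resolves text-davinci-002 to its encoding (p50k_base)
enc = tiktoken.encoding_for_model("text-davinci-002")

prompt = "Your full prompt text here..."  # placeholder
print(f"{len(enc.encode(prompt))} prompt tokens")
```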
Thanks! I don’t think there’s a way around large prompts for my use case. My corpus of text is really complex, and I need answers based only on my corpus. Imagine a reading comprehension test where the students have never seen the information before, e.g. technical, scientific information about a new Planet X, and the answers must be factual, not creative.

I have found that zero-shot results with text-davinci-002 using a good prompt work much better than fine-tuning. I can’t create enough question-answer pairs to teach the model (i) all the facts about Planet X and (ii) to answer factually based on the given information about Planet X without inventing anything.

I agree 10 seconds is pretty good all things considered. It’s just that people expect instantaneous results for everything nowadays! Thanks again Paul.
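For what it’s worth, the prompt pattern that works for me looks roughly like this (just a sketch; the wording and the Planet X context are illustrative):

```python
# Sketch of a grounded Q&A prompt: the corpus is pasted in full and the
# instructions tell the model to answer only from that context.
CONTEXT = "...full technical text about Planet X..."  # placeholder corpus

def build_prompt(question: str) -> str:
    return (
        "Answer the question using ONLY the information in the context below. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{CONTEXT}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```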
Totally agree with you, Paul. I’m having the same issue with a similar approach. My prompt is around 2-3k tokens. I’ve tried using a fine-tuned model, but it didn’t work well. I guess I’ll need to live with the long response time.
I am going to add a nice loading signal to my website so that while users are waiting for an answer they’ll see an appealing, colorful progress bar, perhaps with the words “searching…analysing…drafting…” I think that will help a lot with UX.
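Something along these lines, as a rough console sketch of the idea in Python (the stage labels cycle while the request is in flight; call_completion_api is a made-up stand-in for the real request):

```python
import itertools
import threading
import time

def call_completion_api() -> str:
    # Stand-in for the actual completions request (hypothetical)
    time.sleep(5)
    return "...the model's answer..."

def show_progress(done: threading.Event) -> None:
    # Cycle through stage labels until the API call finishes
    for stage in itertools.cycle(["searching...", "analysing...", "drafting..."]):
        if done.is_set():
            break
        print(f"\r{stage:<15}", end="", flush=True)
        time.sleep(1.5)
    print("\rdone!          ")

done = threading.Event()
threading.Thread(target=show_progress, args=(done,), daemon=True).start()

answer = call_completion_api()
done.set()
print(answer)
```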