API completions endpoint performance

I’m doing some performance tests on the completions endpoint and the response times vary a lot. I’m sending a batch of requests with exactly the same prompt (using the text-davinci-002 instruct model), yet the response times range from 2 seconds to 10 seconds.

Is there any reason why I’m getting such different response times? If so, how could I improve the API’s performance?

Thanks in advance.
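
One way to quantify a spread like 2 to 10 seconds is to time many identical requests and compare percentiles rather than individual runs. Below is a minimal sketch in Python, assuming you wrap your own API call in a zero-argument function; the `send_request` name used in the usage line is hypothetical and not part of any library.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call: Callable[[], object], runs: int = 20) -> List[float]:
    """Invoke `call` `runs` times, one after another, and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # e.g. a hypothetical send_request() that hits the completions endpoint
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Summarize the latency spread; needs at least two measurements."""
    ordered = sorted(durations)
    # "inclusive" keeps the cut points within the observed min..max range;
    # with n=10 there are 9 cut points and the last one is the 90th percentile.
    p90 = statistics.quantiles(ordered, n=10, method="inclusive")[-1]
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p90": p90,
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered),
    }
```

Usage, with your own request function standing in for the hypothetical `send_request`: `print(summarize(measure_latency(send_request, runs=20)))`. Running the requests sequentially keeps them from competing with each other, so the spread you see reflects the service rather than your own concurrency.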

I am having problems with speed too, but I assumed it was because my prompts are really big.

Yeah, bigger prompts on the best models will take longer. Comparatively, 10 seconds isn’t shabby when you think of all the heavy lifting being done in the background for what are likely millions of daily queries…

How big is your prompt + output? Have you tried a fine-tuned Curie model? It’s a lot cheaper and faster. Might work depending on your task. Hope this helps.

Thanks! I don’t think there’s a way around large prompts for my use case. My corpus of text is really complex, and I need answers based only on that corpus. Imagine a reading comprehension test where the students have never seen the information before, e.g. technical, scientific information about a new Planet X, and the answers must be factual, not creative. I have found that zero-shot prompting of text-davinci-002 with a good prompt works much better than fine-tuning: I can’t create enough question-answer pairs to teach the model (i) all the facts about Planet X and (ii) to answer factually based on the given information about Planet X without inventing anything. I agree 10 seconds is pretty good, all things considered. It’s just that people expect instantaneous results for everything nowadays! Thanks again, Paul.

Totally agree with you, Paul. I’m having the same issue with a similar approach. My prompt is around 2-3k tokens. I’ve tried a fine-tuned model, but it didn’t work well. I guess I’ll have to live with the long response times.

I am going to add a nice loading signal to my website so that while users are waiting for an answer they’ll see an appealing, colorful progress bar, perhaps with the words “searching…analysing…drafting…”. I think that will help a lot with UX. 😀
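
Since a single completions request gives no intermediate progress in this setup, the staged words could simply be driven by elapsed time. A minimal sketch in Python: the stage names come from the post, while the function name `loading_label` and the 3-second interval are hypothetical choices for illustration.

```python
from typing import Sequence


def loading_label(
    elapsed_s: float,
    stages: Sequence[str] = ("searching…", "analysing…", "drafting…"),
    seconds_per_stage: float = 3.0,
) -> str:
    """Return the status text to show after `elapsed_s` seconds of waiting.

    Advances one stage every `seconds_per_stage` seconds, then stays on the
    last stage, because the remaining wait time is unknown.
    """
    index = max(0, int(elapsed_s // seconds_per_stage))
    return stages[min(index, len(stages) - 1)]
```

The page would call this on a timer (say, every half second) until the response arrives. Holding on the last stage avoids a progress display that claims to finish before the answer actually does.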

It would be pretty funny if you created a library of great loading images and said “My goal is for these to become obsolete as soon as possible.”