I’m doing some performance tests using the completions endpoint and I’m getting highly variable response times. I’m running a bunch of requests with the exact same prompt (using the text-davinci-002 instruct model) and the response times vary a lot. For instance, they range from 2 seconds to 10 seconds.
Is there any reason why I’m getting such different response times? If so, how could I improve the performance of the API?
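For reference, here’s roughly how I’m timing the requests (a minimal sketch using the legacy openai Python library; the key, prompt, and parameters are placeholders):

```python
import time
import openai  # legacy (pre-1.0) openai library

openai.api_key = "sk-..."  # placeholder

PROMPT = "..."  # the exact same prompt every run (placeholder)

timings = []
for _ in range(20):
    start = time.perf_counter()
    openai.Completion.create(
        model="text-davinci-002",
        prompt=PROMPT,
        max_tokens=256,
        temperature=0,
    )
    timings.append(time.perf_counter() - start)

print(f"min {min(timings):.1f}s  max {max(timings):.1f}s  "
      f"mean {sum(timings) / len(timings):.1f}s")
```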
Yeah, bigger prompts on the largest models will take longer. Comparatively, 10 seconds isn’t shabby when you think of all the lifting being done in the background for what are likely millions of daily queries…
How big is your prompt + output? Have you tried a fine-tuned Curie model? It’s a lot cheaper and faster. Might work depending on your task. Hope this helps.
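If you’re not sure of the prompt size in tokens, a quick check is something like this (a sketch using the tiktoken library, which maps text-davinci-002 to its tokenizer; the prompt text is a placeholder):

```python
import tiktoken

# tiktoken resolves text-davinci-002 to its encoding (p50k_base)
enc = tiktoken.encoding_for_model("text-davinci-002")

prompt = "Your full prompt text here..."  # placeholder
print(f"{len(enc.encode(prompt))} prompt tokens")
```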
Thanks! I don’t think there’s a way around large prompts for my use case. My corpus of text is really complex, and I need answers based only on my corpus. Imagine a reading comprehension test where the students have never seen the information before, e.g. technical, scientific information about a new Planet X, and the answers must be factual, not creative.

I have found that zero-shot results with text-davinci-002 using a good prompt work much better than fine-tuning. I can’t create enough question-answer pairs to teach the model (i) all the facts about Planet X and (ii) to answer factually based on the given information about Planet X without inventing anything.

I agree 10 seconds is pretty good all things considered. It’s just that people expect instantaneous results for everything nowadays! Thanks again Paul.
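For what it’s worth, the prompt pattern that works for me looks roughly like this (just a sketch; the wording and the Planet X context are illustrative):

```python
# Sketch of a grounded Q&A prompt: the corpus is pasted in full and the
# instructions tell the model to answer only from that context.
CONTEXT = "...full technical text about Planet X..."  # placeholder corpus

def build_prompt(question: str) -> str:
    return (
        "Answer the question using ONLY the information in the context below. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{CONTEXT}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```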
Totally agree with you, Paul. I’m having the same issue with a similar approach. My prompt is around 2-3k tokens. I’ve tried using a fine-tuned model, but it didn’t work well. I guess I’ll need to live with the long response time.
I am going to add a nice loading signal to my website so that while users are waiting for an answer they’ll see an appealing, colorful progress bar, perhaps with the words “searching…analysing…drafting…” I think that will help a lot with UX.
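Something along these lines, as a rough console sketch of the idea in Python (the stage labels cycle while the request is in flight; call_completion_api is a made-up stand-in for the real request):

```python
import itertools
import threading
import time

def call_completion_api() -> str:
    # Stand-in for the actual completions request (hypothetical)
    time.sleep(5)
    return "...the model's answer..."

def show_progress(done: threading.Event) -> None:
    # Cycle through stage labels until the API call finishes
    for stage in itertools.cycle(["searching...", "analysing...", "drafting..."]):
        if done.is_set():
            break
        print(f"\r{stage:<15}", end="", flush=True)
        time.sleep(1.5)
    print("\rdone!          ")

done = threading.Event()
threading.Thread(target=show_progress, args=(done,), daemon=True).start()

answer = call_completion_api()
done.set()
print(answer)
```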