Hi there,
Like many of you have reported about GPT-4 speed, we are experiencing the same slowness. We use the API to answer questions against our own context, so the prompt can be very heavy. Could that be the reason for the slowness? Are there any tricks to increase response speed?
Yeah, GPT-4 is very slow. There’s not really a way around it besides having shorter prompts. It helps to cut down complexity too. If it takes a while for a human to comprehend it, GPT does poorly with it too.
davinci performs better than gpt-3.5-turbo but is slower, though still faster than gpt-4.
You could try sharing your prompt and we could look into optimizing it.
If the prompt serves multiple purposes, you could split it up into single-purpose prompts.
That is (a lot) more expensive, but the prompts can be run in parallel (see the sketch below).
Although I am using Symfony Process to start ~70 processes simultaneously (it could be 90 before you run into timeouts; 70 seems to be pretty safe, at least on Azure), and it takes some time (like 1-2 seconds) to invoke them.
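For illustration, here is a minimal Python sketch of the same fan-out idea using asyncio instead of Symfony Process. It assumes the openai Python package (v1+); the model name and prompts are placeholders, and rate limiting and error handling are omitted:

```python
# Minimal sketch: run several single-purpose prompts in parallel.
# Assumes the openai Python package v1+; prompts are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = ["Summarize section A.", "Extract dates from section B."]
    # Each request still takes as long as a single call, but total
    # wall-clock time is roughly that of the slowest one.
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for a in answers:
        print(a)

asyncio.run(main())
```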
Unfortunately the prompt serves only one purpose and can't be split. What I can try to do is limit the context from which the answer is produced.
Sure, splitting the user input into smaller chunks and running the prompt on each chunk can help.
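A minimal sketch of naive fixed-size chunking; a real splitter would cut on sentence or token boundaries, and the chunk size here is an arbitrary placeholder:

```python
# Naive fixed-size chunking of the input context.
# A real splitter would respect sentence or token boundaries
# instead of raw character counts.
def chunk_text(text: str, size: int = 2000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

document = "..." * 5000  # placeholder for the real context
for chunk in chunk_text(document):
    print(len(chunk))  # run the same single-purpose prompt per chunk here
```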
Also, there is a little trick you might be able to use.
Let me give a little example.
When you listen to your wife but are not really paying attention, your brain buffers the last few words or sentences for an event like “DO YOU EVEN LISTEN TO WHAT I SAY?”. Then you just take the last few words from the buffer and repeat them, and she can happily continue telling you whatever she is currently trying to tell you.
Something similar happens when you are trying to understand any context.
We are trained to read the first part to get what it is all about, and then read the last part more slowly to get the meaning of the resolution. That’s also why sandwiched insults work so smoothly.
Well, the models in general seem to work the same way.
So what you want to do is:

- keep the context as small as possible,
- use context breakers like lines or enumerations like this one here,
- make sure the model in front of you is really listening,

and in the last sentence write something like:

Ha gotcha! You did not read everything! Read it again and follow the instructions!
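A minimal sketch of how such a prompt could be assembled; the delimiter string and the final attention-check line are just illustrations of the trick above, not anything the API requires:

```python
# Sketch of the prompt layout described above: small context,
# visible context breakers, and a final attention-check line.
INSTRUCTIONS = "Answer the question using only the context below."
CONTEXT = "..."   # keep this as small as possible
QUESTION = "..."

prompt = "\n".join([
    INSTRUCTIONS,
    "----------------",  # context breaker
    CONTEXT,
    "----------------",  # context breaker
    QUESTION,
    "Ha gotcha! You did not read everything! "
    "Read it again and follow the instructions!",
])
print(prompt)
```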
The size of the prompt isn’t that meaningful, but latency is roughly proportional to the amount of text generated.
Generate shorter results and it runs faster.
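For example, capping the output with the max_tokens parameter (and asking for brevity in the prompt) is a simple way to exploit this. A minimal sketch, assuming the openai Python package (v1+):

```python
# Cap the generated length, since latency scales with output
# tokens far more than with prompt tokens.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Answer in one sentence: why is the sky blue?"}],
    max_tokens=60,  # hard cap on output length
)
print(resp.choices[0].message.content)
```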
Thank you all, I’ll try to experiment with some of your advice.