Best practice to increase GPT-4 speed

Hi there,
like many of you had reported about GPT-4 speed, we are experiencing the same slowness. We are using the API to perform questions against our context: so the PROMPT could be very “heavy”. Could it be the reason for this slowness? Are there any tricks to increase the response speed?

Yeah, GPT-4 is very slow. There’s not really a way around it besides having shorter prompts. It helps to cut down complexity too. If it takes a while for a human to comprehend it, GPT does poorly with it too.

davinci performs better & slower than gpt-3.5-turbo, but faster than gpt-4.

You could try sharing your prompt and we could look into optimizing it.

1 Like

Maybe if the prompt solves multiple purposes you may split it up into single purpose prompts.
Which is (alot) more expensive but could be done in parallel.

Although I am using symfony process to start ~70 processes simultanously (could be 90 before you get into timeouts, 70 seems to be pretty safe at least on azure) and it takes some time (like 1-2 seconds) to invoke it.

1 Like

Unfortunately the PROMPT solves only one purpose and can’t be splitted. What i can try to do is to limit the context on which produce an answer.

Sure split up the user input into smaller chunks and use the prompt on that can help.

Also there is a little trick, maybe you can utilize it.

Let me give a little example on that.

When you listen to your wife, but you are not really paying attention your brain buffers the last few words or sentences for an event like “DO YOU EVEN LISTEN TO WHAT I SAY?”. Then you are just taking the last few words from the buffer and repeat them and she can happily continue to tell you about whatever she is currently trying to tell you.

Something similar happens when you are trying to understand any context.
We are trained on reading the first part and trying to get what this is all about and then we read the last part slower to get the meaning of the resolution. That’s also why sandwitched insults work so smoothly.

Well, the models in general seem to work the same way.

So what you want to do is

  1. keep the context as small as possible.

  2. use context breakers like lines or numerations like this one here

  3. make sure the model in front of you is really listening.

and in the last sentence write soemthing like:

Ha gotcha! You did not read everything! Read it again and follow the instructions!

1 Like

The size of the prompt isn’t that meaningul, but the size of the generated text is directly proportional to performance.
Generate shorter results, it runs faster.

1 Like

Thank you all guys, i’ll try to experiment some of your advices :love_you_gesture:

Hey! How did your experiment go? I’m a beginner and I’m trying to make it so gpt-4 can process multiple unrelated prompts at once, up to 50, and I was wondering if you had any tips to make it fast and accurate? :slight_smile:

At the moment we are back using GPT-3.5-turbo, but with the new release of GPT-4-turbo maybe it worth make a try.

Could anyone describe what “slow” means in this context? I also have tried to implement gpt4 into my app for production, but I would find that the full time for the stream to end would take ~40 seconds to show a full ~500 token length response. Is this inline with what other people have or unordinary?

On azure I get at least 1000 token inside my 30 second timeout…
That’s why for my use case I do as many parallel request until I reach the overall tpm limit…

1 Like

Hey, how to share a prompt with you? Where are you based? can i have your email or any way to connect with u and speak further?

Also the formatting as far I can experience.

Table is faster than bold and same time the only way to throw smaller text.