Hi there,
Like many of you have reported about GPT-4 speed, we are experiencing the same slowness. We use the API to answer questions against our own context, so the prompt can be very heavy. Could that be the reason for the slowness? Are there any tricks to increase response speed?
Yeah, GPT-4 is very slow. There’s not really a way around it besides having shorter prompts. It helps to cut down complexity too. If it takes a while for a human to comprehend it, GPT does poorly with it too.
davinci performs better than gpt-3.5-turbo but is slower, though still faster than gpt-4.
You could try sharing your prompt and we could look into optimizing it.
If the prompt serves multiple purposes, you could split it up into single-purpose prompts.
That is (a lot) more expensive, but the prompts can be run in parallel (see the sketch below).
Although I am using Symfony Process to start ~70 processes simultaneously (it could be 90 before you run into timeouts; 70 seems to be pretty safe, at least on Azure), and it takes some time (like 1-2 seconds) to invoke them.
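For illustration, here is a minimal Python sketch of the same fan-out idea using asyncio instead of Symfony Process. It assumes the openai Python package (v1+); the model name and prompts are placeholders, and rate limiting and error handling are omitted:

```python
# Minimal sketch: run several single-purpose prompts in parallel.
# Assumes the openai Python package v1+; prompts are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = ["Summarize section A.", "Extract dates from section B."]
    # Each request still takes as long as a single call, but total
    # wall-clock time is roughly that of the slowest one.
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for a in answers:
        print(a)

asyncio.run(main())
```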
Unfortunately the prompt serves only one purpose and can't be split. What I can try to do is limit the context from which the answer is produced.
Sure, splitting the user input into smaller chunks and running the prompt on each chunk can help.
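A minimal sketch of naive fixed-size chunking; a real splitter would cut on sentence or token boundaries, and the chunk size here is an arbitrary placeholder:

```python
# Naive fixed-size chunking of the input context.
# A real splitter would respect sentence or token boundaries
# instead of raw character counts.
def chunk_text(text: str, size: int = 2000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

document = "..." * 5000  # placeholder for the real context
for chunk in chunk_text(document):
    print(len(chunk))  # run the same single-purpose prompt per chunk here
```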
Also, there is a little trick you might be able to use.
Let me give a little example.
When you listen to your wife but are not really paying attention, your brain buffers the last few words or sentences for an event like “DO YOU EVEN LISTEN TO WHAT I SAY?”. Then you just take the last few words from the buffer and repeat them, and she can happily continue telling you whatever she is currently trying to tell you.
Something similar happens when you are trying to understand any context.
We are trained to read the first part to get what it is all about, and then read the last part more slowly to get the meaning of the resolution. That’s also why sandwiched insults work so smoothly.
Well, the models in general seem to work the same way.
So what you want to do is:

- keep the context as small as possible,
- use context breakers like lines or enumerations like this one here,
- make sure the model in front of you is really listening,

and in the last sentence write something like:

Ha gotcha! You did not read everything! Read it again and follow the instructions!
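A minimal sketch of how such a prompt could be assembled; the delimiter string and the final attention-check line are just illustrations of the trick above, not anything the API requires:

```python
# Sketch of the prompt layout described above: small context,
# visible context breakers, and a final attention-check line.
INSTRUCTIONS = "Answer the question using only the context below."
CONTEXT = "..."   # keep this as small as possible
QUESTION = "..."

prompt = "\n".join([
    INSTRUCTIONS,
    "----------------",  # context breaker
    CONTEXT,
    "----------------",  # context breaker
    QUESTION,
    "Ha gotcha! You did not read everything! "
    "Read it again and follow the instructions!",
])
print(prompt)
```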
The size of the prompt isn’t that meaningful, but latency is roughly proportional to the amount of text generated.
Generate shorter results and it runs faster.
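For example, capping the output with the max_tokens parameter (and asking for brevity in the prompt) is a simple way to exploit this. A minimal sketch, assuming the openai Python package (v1+):

```python
# Cap the generated length, since latency scales with output
# tokens far more than with prompt tokens.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Answer in one sentence: why is the sky blue?"}],
    max_tokens=60,  # hard cap on output length
)
print(resp.choices[0].message.content)
```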
Thank you all, I’ll try to experiment with some of your advice.