Is there any way to get the response time down to 2 seconds?

Is there any way to get the response time down to 2 seconds? I have stripped down all files and instructions and still can't get it below 7 to 15 seconds.

Hi there - quite hard to answer in the abstract, as it depends on so many factors.

But here’s a good overview of the factors impacting latency that you may want to go over for your specific case:


You can get time to first token below 2000 ms with streaming, but whether that's good enough depends on your use case. If it's for text-to-speech responses, you should be able to get on the order of 250-1000 ms to the first utterance if you have a fast text-to-speech model and send the words to it in small batches of 2 or 3.
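To illustrate the batching idea, here's a minimal sketch. The `chunks` list and `batch_words` helper are hypothetical stand-ins: a real pipeline would consume chunks from the model's streaming API and forward each batch to the speech model instead of printing it.

```python
from typing import Iterable, Iterator, List

def batch_words(token_stream: Iterable[str], batch_size: int = 3) -> Iterator[str]:
    """Group a stream of text chunks into small word batches for TTS.

    Sending 2-3 words at a time lets the speech model start speaking
    long before the full reply has finished generating.
    """
    buffer: List[str] = []
    for chunk in token_stream:
        buffer.extend(chunk.split())
        while len(buffer) >= batch_size:
            yield " ".join(buffer[:batch_size])
            buffer = buffer[batch_size:]
    if buffer:  # flush whatever words remain at the end of the stream
        yield " ".join(buffer)

# Simulated stream; a real one would come from the streaming API:
chunks = ["The answer", " to your", " question is", " forty-two."]
print(list(batch_words(chunks, batch_size=3)))
# → ['The answer to', 'your question is', 'forty-two.']
```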


If I had to get creative about it, I would suggest splitting the response into a fast part and a slow part.
For example:

  • Task a fast model like GPT-3.5 Turbo with returning a filler reply before the real response comes in.

That’s a very observant remark…

  • Shorten the model's output to fewer words, maybe just a single category number, then replace it with a standard answer matching that category. This would be the most basic solution.


The answer to your question is …

  • Use streaming for the first response and generate the real, full reply in the background.

Maybe this approach can be applied to your use case and you can get some ideas out of it.
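The fast/slow split above can be sketched with a background thread. Everything here is illustrative: `filler_reply` and `full_reply` are hypothetical placeholders for a fast-model call and a slow-model call, and the `time.sleep` stands in for several seconds of generation.

```python
import queue
import threading
import time
from typing import Iterator

def filler_reply(question: str) -> str:
    # Hypothetical fast-model call (e.g. a small model returning a stock filler).
    return "That's a very observant remark..."

def full_reply(question: str) -> str:
    # Hypothetical slow-model call producing the real answer.
    time.sleep(0.2)  # stand-in for a multi-second generation
    return "The answer to your question is ..."

def answer(question: str) -> Iterator[str]:
    out: "queue.Queue[str]" = queue.Queue()
    worker = threading.Thread(target=lambda: out.put(full_reply(question)))
    worker.start()              # kick off the slow reply in the background
    yield filler_reply(question)  # the user hears/sees something immediately
    worker.join()
    yield out.get()             # the real answer arrives when ready

for part in answer("Why is the sky blue?"):
    print(part)
```

The filler buys you perceived responsiveness; the thread just keeps the slow call from blocking it.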


Thank you all for your help answering my question!