Chatgpt-3.5 turbo model takes long time to respond. Is there any way to speed this up?

Im using this chatgpt-3.5 turbo model to generate some content based on customers need. It is taking more than minute to response.Is there a way i can make it faster ? or it is expected?.


How bid is the response from chat gpt ? Usually, 3.5 turbo does not have a huge response time. Also, do you need all the information at once or streaming would work in your case ? While it might give a complete answer back immediately, it gives a feeling that the model is working rather than being in limbo about if it is struck or not…

Take a look on this topic. We proved the API is intentionally slow

You can use “completion as stream”, it isn’t faster, but you will get a non complete response faster, it will respond like word by word.

Im trying to get some json as response. So i need complete answer since streaming is not an option here. Is there any other options?

Since i need response itself a JSON object im expecting complete response to be in a place.

I am having the same issue. I tried switching form davinci to gpt-3-turbo because I was thinking it is both faster and cheaper - but its unfortunately way too slow for my use case

i am using ChatGPT 3.5,
3 calls to chatGPT in single Query + 1 call to Pinecone

flow =

  1. embedding
  2. pincone search
  3. ChatGPT 3.5 call
  4. Show user Reply
  5. ChatGPT 3.5 call again ( this is a different prompt to generate another command )

the prompt itself ~ 4k token.
about 60~75 seconds.

1 Like