I'm using the gpt-3.5-turbo model to generate content based on customer needs. It's taking more than a minute to respond. Is there a way I can make it faster, or is this expected?
How big is the response from ChatGPT? Usually, 3.5 turbo does not have a huge response time. Also, do you need all the information at once, or would streaming work in your case? While it won't return the complete answer any sooner, streaming gives the feeling that the model is working rather than leaving you in limbo about whether it is stuck…
Take a look at this topic. We showed there that the API is intentionally slowed.
You can use "completion as stream". It isn't faster overall, but you'll start getting a partial response sooner — it responds word by word.
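A minimal sketch of what that looks like with the 2023-era (pre-1.0) openai Python SDK — treat the exact chunk field names as assumptions based on that SDK version:

```python
def collect_stream(chunks):
    # Concatenate the incremental "delta" content fields from a chat
    # stream, printing each piece as it arrives (the word-by-word effect).
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
            parts.append(delta["content"])
    return "".join(parts)

def stream_chat(prompt):
    # Requires the pre-1.0 openai SDK; stream=True makes the call return
    # an iterator of small chunks instead of one complete response.
    import openai
    chunks = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(chunks)
```

The total generation time on the server is the same; only the time to first visible token improves.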
I'm trying to get JSON as the response, so I need the complete answer — streaming isn't an option here. Are there any other options?
Since the response itself needs to be a JSON object, I'm expecting the complete response to arrive in one piece.
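One hedged note on this: streaming and JSON output are not mutually exclusive. You can stream the tokens, buffer them, and parse once the stream ends — total latency is unchanged, but your app can show progress (or detect a stalled connection) instead of blocking for a minute. A minimal sketch:

```python
import json

def parse_streamed_json(text_chunks):
    # Buffer the streamed text fragments, then parse once complete.
    # This gains progress feedback, not a faster total response time.
    buffer = "".join(text_chunks)
    return json.loads(buffer)
```

`text_chunks` here would be the content deltas pulled out of the stream, not the raw chunk objects.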
I am having the same issue. I tried switching from davinci to gpt-3.5-turbo because I thought it would be both faster and cheaper — but unfortunately it's way too slow for my use case.
I am using ChatGPT 3.5:
3 calls to ChatGPT in a single query + 1 call to Pinecone.
Flow:
- embedding
- Pinecone search
- ChatGPT 3.5 call
- show the user the reply
- ChatGPT 3.5 call again (this is a different prompt, to generate another command)
The prompt itself is ~4k tokens.
The whole flow takes about 60–75 seconds.
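With a multi-stage pipeline like that, it helps to time each stage separately before blaming the model. A sketch with a simple timing wrapper — `embed`, `pinecone_search`, and `chat` are hypothetical callables standing in for the real SDK calls, not actual library functions:

```python
import time

def timed(label, fn, *args, **kwargs):
    # Wrap one pipeline stage and report how long it took,
    # so the slow step in the 60-75s total becomes obvious.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

def answer(query, embed, pinecone_search, chat):
    # Mirrors the flow above: embed -> Pinecone -> chat -> chat again.
    vector = timed("embedding", embed, query)
    context = timed("pinecone search", pinecone_search, vector)
    reply = timed("chat call #1", chat, query, context)
    command = timed("chat call #2", chat, reply, context)
    return reply, command
```

In practice the two ~4k-token chat calls usually dominate; the second call can sometimes be shortened or run in parallel with showing the first reply.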