We were extremely excited to get access to the GPT-4 API and have been waiting with great anticipation to use it to build our platform. However, within minutes of using the API it became apparent that it is prohibitively slow. Responses are taking anywhere from 5 to 10 minutes for roughly 10,000-20,000 tokens. In its present state the GPT-4 API is essentially unusable for us.
We tried GPT-3.5-turbo, but the quality of its answers was drastically worse than GPT-4's, which rules it out for us.
We are paid users of the API and have no issue spending money on it, but at current speeds it would take us over a year to do what we intend with GPT-4. The same work would take only days on GPT-3.5-turbo.
Interestingly, GPT-4 API queries are taking far longer than the same prompts on the ChatGPT site. We would greatly appreciate a faster GPT-4 API.
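For reference, here is roughly how we are measuring the gap (a minimal sketch using the pre-1.0 openai Python SDK's ChatCompletion endpoint; the prompt and API key are placeholders):

```python
import time
import openai  # pip install openai

openai.api_key = "sk-..."  # placeholder key

def timed_completion(model: str, prompt: str) -> float:
    """Send one chat completion and report wall-clock latency."""
    start = time.time()
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.time() - start
    tokens = response["usage"]["completion_tokens"]
    print(f"{model}: {elapsed:.1f}s for {tokens} completion tokens "
          f"({tokens / elapsed:.1f} tok/s)")
    return elapsed

prompt = "Summarize the plot of Hamlet in about 500 words."  # placeholder
timed_completion("gpt-3.5-turbo", prompt)
timed_completion("gpt-4", prompt)
```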
Yes, I'm in the same bucket. Via the openai.com web client it's not super-fast, but my API calls are not passing a lot of context and are still extremely slow. I'm working on a realtime experience for users, so it's not feasible at the current response times.
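One thing that's kept my realtime idea partly alive is streaming: it doesn't make generation any faster overall, but it drops time-to-first-token so the user sees output immediately (a sketch, assuming the standard stream=True option in the pre-1.0 openai Python SDK; the prompt is a placeholder):

```python
import openai  # pip install openai

openai.api_key = "sk-..."  # placeholder key

# Stream tokens as they are generated instead of waiting for the
# full response to finish.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain vector search briefly."}],
    stream=True,
)

for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```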
I wonder if it's a hardware limitation that keeps generation so slow in the API. I can understand why it's faster in ChatGPT, because there they are still rate-limiting you. Or perhaps it's to prevent an onslaught of users hoping to train local models with it? Either way, I'm sure they will eventually open the floodgates…
I am in the same situation. Every call I make to the GPT-4 API takes more than a minute, even with short responses.
Is there any restriction based on account type? In my case, GPT-4 API access is on my personal account; I am not associated with a business account.
Same situation here! Extremely slow. It was better when I got access a few months ago. I would even be happy with the same speed as the Playground, but the API response is really slow.
There are many parts to this question, but performance is one of them. I think it's widely agreed that PaLM is not as intellectually capable as other models, but it is much faster (1.5 to 3.5 times quicker), and we don't see timeouts at all.
In certain use cases, especially those built largely on embeddings, there is no perceptible difference in inference quality. Vectors, however, come back in 400 milliseconds instead of 1.5 seconds. These are results from my anecdotal tests.
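For transparency, my timings came from a crude loop along these lines (a sketch assuming the pre-1.0 openai Python SDK and text-embedding-ada-002; the PaLM side would be timed the same way against its own endpoint):

```python
import time
import openai  # pip install openai

openai.api_key = "sk-..."  # placeholder key

texts = ["some document chunk"] * 10  # placeholder inputs

start = time.time()
for text in texts:
    # One embedding request per input, timed as a batch.
    openai.Embedding.create(model="text-embedding-ada-002", input=text)
elapsed = time.time() - start
print(f"avg embedding latency: {elapsed / len(texts) * 1000:.0f} ms")
```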
I think Google will ultimately pull a few rabbits out of the hat, so this AGI battle seems beneficial for us all.
I can totally see this. Really, all we need is for the LLM to "speak English" and be able to repackage the data that the embeddings put into the prompt.
So, essentially, "lower tier" LLMs + embeddings are all that is required to make the magic happen (if you've got embeddings lying around).
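To make that concrete, the whole pattern is basically: embed the question, find the closest stored chunk, and let a cheaper model repackage it (a toy sketch using the pre-1.0 openai Python SDK; the documents, models, and question are placeholders):

```python
import numpy as np
import openai  # pip install openai numpy

openai.api_key = "sk-..."  # placeholder key

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Pretend these are your pre-computed document embeddings "laying around".
docs = ["Refunds are processed within 5 business days.",
        "Support is available 9am-5pm EST on weekdays."]
doc_vecs = [embed(d) for d in docs]

def answer(question: str) -> str:
    q = embed(question)
    # ada-002 vectors are unit length, so a dot product is cosine similarity.
    best = max(range(len(docs)), key=lambda i: float(q @ doc_vecs[i]))
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # the "lower tier" model
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context: {docs[best]}\n\nQuestion: {question}"},
        ],
    )
    return resp["choices"][0]["message"]["content"]

print(answer("How long do refunds take?"))
```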
These points are significant too. My only concern is that, at least in my case, my biggest requirement is quality, not latency. If quality starts to suffer in pursuit of lower latency, I don't want that. This is why I rarely use GPT-3.5-Turbo: I need quality over latency.
My use case here doesn't involve embeddings; it's about steering the model to respond to one thing with different personalities and perspectives. That is one thing I don't think PaLM is good at, but I have an open mind, and I probably just need to experiment with PaLM directly to find out for myself.
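For clarity, by "different personalities" I mean something like the following: the same question run through different system prompts (a toy sketch against the OpenAI chat API, since that's what I have wired up; the persona names and prompts are placeholders):

```python
import openai  # pip install openai

openai.api_key = "sk-..."  # placeholder key

personas = {
    "skeptic": "You are a blunt skeptic who questions every claim.",
    "optimist": "You are an upbeat optimist who highlights upsides.",
}

question = "Will LLM latency improve this year?"  # placeholder

for name, system_prompt in personas.items():
    # Same user question, different persona via the system message.
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    print(f"--- {name} ---")
    print(resp["choices"][0]["message"]["content"])
```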
Yep - I have a baby-agent project that also doesn't use embeddings, and PaLM struggles compared to GPT-4. However, I have noticed PaLM seems to be a little better at certain math computations, and this article seems to indicate that may be the case.
But who knows, really? We're all just one crappy prompt away from disaster.