The initial response time for the GPT-4 model is very high: a request with a 100-word context and a 50-word prompt usually takes more than a minute to come back. I call the API from a no-code application built in Bubble. What can be done to optimize this call so the user isn't left waiting so long?
I also noticed this and I wonder why. I understand the demand, but since this is a pay-as-you-go API, there should be some limit on response time so that it doesn't take this long.
I have no idea why, but I suspect it has to do with increased demand.
The API response should be streamed. The API will then return the output token by token, like you see in ChatGPT.
If you don't get the first token back from the model within a few seconds, close the connection and retry.
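A minimal sketch of that retry-on-slow-start pattern in plain Python. The names `stream_with_retry` and `make_stream` are my own, not part of any library; with the `openai` package, `make_stream` would open a fresh request with `stream=True`. Note that `next()` blocks, so for a real HTTP stream you should also set a connection/read timeout on the client itself:

```python
import time

def stream_with_retry(make_stream, first_token_timeout=5.0, max_retries=2):
    """Consume a token stream; if the first token is too slow to arrive
    (or the stream is empty), abandon the attempt and open a new one."""
    for _ in range(max_retries + 1):
        start = time.monotonic()
        stream = make_stream()          # opens a fresh streaming request
        try:
            first = next(stream)        # blocks until the first token arrives
        except StopIteration:
            continue                    # empty stream: retry
        if time.monotonic() - start > first_token_timeout:
            continue                    # too slow to start: drop it and retry
        yield first
        yield from stream               # stream the rest as it arrives
        return
    raise TimeoutError("no attempt produced a first token in time")
```

The user sees output begin within a few seconds on any healthy attempt, instead of staring at a spinner for the full generation time.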
Are you literally waiting a minute before a streamed response even starts to generate? Can you replicate the behavior in the API Playground? Maybe this is the "latency" they promise to punish people with in the new tier system.