I’m running a chat app called Better Path that uses gpt-3.5-turbo.
During chat sessions, the API calls take around 2-5 seconds, but there's another API call that kicks in while the page is initially loading, and it's really slow, taking between 10 and 25 seconds. This really hurts the user experience because it drags out my page load times.
What’s puzzling me is that the chat prompts (the quick ones) are much longer than the page-load prompt (the slow one). Seems odd, right?
I’m scratching my head here. Any ideas why this might be happening? I know prompt size usually plays a role, but that doesn’t seem to be the case here. Are there other things I should look out for?
Yes, it’s a web app. The first API call takes the user’s past conversation and summarizes it; it does this while the page is loading. Both the chat and the initial summarization use gpt-3.5-turbo. The API calls look identical, but here are the times printed out in my console:
first api call execution time: 21.968876838684082 seconds
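For context, the calls are made and timed roughly like this (a simplified sketch, not my exact code; the real summarization prompt and chat prompts are much longer, and the key is loaded from the environment):

```python
import time

import requests

URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": "Bearer sk-..."}  # real key comes from the environment


def timed_call(label, messages):
    """Send a chat completion request and print how long it took."""
    start = time.time()
    resp = requests.post(
        URL,
        headers=HEADERS,
        json={"model": "gpt-3.5-turbo", "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    print(f"{label} execution time: {time.time() - start} seconds")
    return resp.json()["choices"][0]["message"]["content"]


# Page-load call: summarize the stored conversation (placeholder text here).
timed_call("first api call", [
    {"role": "user", "content": "Summarize this conversation: <past conversation>"},
])

# Regular chat call: the per-message prompt (longer than the summary prompt in practice).
timed_call("chat api call", [
    {"role": "user", "content": "<long chat prompt>"},
])
```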
If you want to post some examples of the prompts, I’m happy to run them here on a different account so you know you’re not going mad, and I can let you know the speeds I get back for them.
Have you tried executing each of them manually in code, or even directly with curl / Postman?
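For example, something along these lines takes your web app out of the picture entirely (a minimal sketch using Python and requests; it assumes your key is in the OPENAI_API_KEY environment variable and that you paste in your real summarization prompt):

```python
import os
import statistics
import time

import requests

# Standalone latency test: run the same prompt a few times outside the web app,
# so browser and page-load overhead are excluded from the measurement.
URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
PROMPT = "Summarize this conversation: <paste your real summarization prompt here>"

timings = []
for i in range(5):
    start = time.time()
    resp = requests.post(
        URL,
        headers=HEADERS,
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": PROMPT}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    elapsed = time.time() - start
    timings.append(elapsed)
    # The usage block in the response is worth comparing between your two calls as well.
    print(f"run {i + 1}: {elapsed:.2f}s, usage: {resp.json()['usage']}")

print(f"min {min(timings):.2f}s  median {statistics.median(timings):.2f}s  max {max(timings):.2f}s")
```

If the standalone timings match what you see in the app, the delay is on the API side rather than in your page-load code.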