I’m running a chat app called Better Path that uses gpt-3.5-turbo.
During chat sessions, the API calls take around 2-5 seconds, but there's another API call that kicks in while the page is initially loading, and it's really slow, taking between 10-25 seconds. This ruins the user experience because it drags out my page load times.
What's puzzling me is that the chat prompts (the quick ones) are much longer than the page-load prompt (the slow one). Seems odd, right?
I'm scratching my head here. Any ideas why this might be happening? I know prompt size usually plays a role, but that doesn't seem to be the case here. Are there other things I should look out for?
Any tips or insights would be appreciated.
Not really sure if I understand - is this your own app?
What's the other API call that you're talking about? If you let me know more about the app, I can have a look and see if I can spot anything.
Yes, it's a web app. The first API call takes the user's past conversation and summarizes it; it runs while the page is loading. Both the chat and the initial summarization use gpt-3.5-turbo. The API calls look identical, but here are the times printed to my console:
```
first api call execution time: 21.968876838684082 seconds
chat api call execution time = 3.175050973892212
chat api call execution time = 6.420484781265259
chat api call execution time = 6.449390172958374
```
It does sound odd indeed!
If you want to post some examples of the prompts, I'm happy to run them here on a different account so you know you're not going mad - I can let you know the speeds I get back for them.
Have you tried executing them each manually in code, or even directly with curl / Postman?
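If it's easier than curl, here's a minimal Python harness you could use to time each call in isolation, outside the page load - a sketch, assuming your app is Python (those float timestamps look like Python output). The `fake_api_call` below is just a placeholder; swap in your real OpenAI summarization and chat requests when profiling:

```python
import time

def timed(label, fn, *args, **kwargs):
    # Time any callable with a monotonic clock and print the result
    # in the same format for every call, so the numbers are comparable.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label} execution time = {elapsed:.3f} seconds")
    return result, elapsed

def fake_api_call(delay):
    # Stand-in for your real OpenAI request (e.g. the summarization
    # call) -- replace with the actual client call when profiling.
    time.sleep(delay)
    return "ok"

result, elapsed = timed("summary api call", fake_api_call, 0.1)
```

If the summarization call is still slow when run by itself like this, the delay is on the API side (prompt, model load, or response length); if it's fast in isolation, the bottleneck is probably something in your page-load path instead.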