I’m making HTTP calls to the gpt-4 model API from a NextJS app. They work quite well, with one exception: the wait for the server response. When I look at the network requests and responses in the browser dev tools, all metrics are in the milliseconds range, but “waiting for server response” is in the 20 to 30 second range. The queries involve simple math and algebra questions, and total tokens are around 1,000. Anyone have any suggestions? [UPDATE: In reviewing the OpenAI online help, I found this regarding rate tiers: “Organizations in higher tiers also get access to lower latency models.” Since I only started testing the API recently and have only spent about US$12 so far, that may explain the slow response: I’m in a low usage tier, which presumably puts me on the higher-latency side.]
You could listen for a streaming response instead of waiting for the full completion to be generated.
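Roughly, that means setting `stream: true` in the request and reading the body as server-sent events instead of one JSON payload. Here’s a rough sketch over plain fetch from a Node/NextJS route handler (untested; the model and prompt are placeholders, and the line-by-line parsing is simplified — a real implementation should buffer chunks that split a JSON line):

```ts
// Sketch: streaming a Chat Completions response with fetch (Node 18+ / NextJS route handler).
// Assumes OPENAI_API_KEY is set in the environment; error handling omitted.
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4",
    stream: true, // server-sent events instead of one final JSON body
    messages: [{ role: "user", content: "Solve 3x + 5 = 20 for x." }],
  }),
});

// The body arrives as lines like `data: {...}`, ending with `data: [DONE]`.
const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value).split("\n")) {
    const data = line.replace(/^data: /, "").trim();
    if (!data || data === "[DONE]") continue;
    const token = JSON.parse(data).choices[0]?.delta?.content;
    if (token) process.stdout.write(token); // tokens show up as they are generated
  }
}
```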
May I know what method you are using?
Chat Completion or Assistant?
I’ll look into that, thanks. That said, the response takes less than a second to download, so it’s not a lot of content. Attached is a screenshot of typical response timing.
I’m using Chat Completion. I get similar results with the gpt-4 and gpt-4-1106-preview models.
I suppose streaming might give you a better experience, since at least you can see that it’s responding. That said, a response time below 1 minute for 1k tokens seems quite reasonable.
I just gave both Chat Completion and Assistant a shot. Both came in below 1k for a simple interaction like “give me a code snippet to parse a CSV file in Python”.
Thanks very much for your help. I’ve been developing the app for a few weeks and have run hundreds of similar math/algebra queries as part of testing. In the Playground or ChatGPT-4 the LLM responds in seconds; it’s only the API calls that are slow, and the problem has gotten worse in the last few weeks. That suggests to me OpenAI isn’t handling the API request queue very well, but it doesn’t seem many other people are having a similar problem.
I think streaming isn’t so much about download speed as about how soon the server begins sending the response. Do you wait for the full 1,000 tokens to be generated, or start receiving after the first ten? If that makes sense.
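If you go through the official openai Node package, it handles the event-stream plumbing for you. Something like this, as a rough sketch (v4-style client, untested; the model and prompt are placeholders):

```ts
// Sketch: streaming with the official openai npm package (v4+).
// The client reads OPENAI_API_KEY from the environment by default.
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const stream = await openai.chat.completions.create({
    model: "gpt-4",
    stream: true,
    messages: [{ role: "user", content: "Solve 3x + 5 = 20 for x." }],
  });

  // Deltas arrive as tokens are generated, so time-to-first-token stays short
  // even if the full answer takes 20-30 seconds to finish.
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main();
```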
Thanks, I’m just using the axios library for async calls. I’ll try the OpenAI library (which supports streaming) and see if that helps. I appreciate your response.