GPT-4 API too slow when you have to work with a 45-second timeout

Hi
I have been a member of and contributor to this community for a while now, long enough to earn some of the electronic tokens that reward good behavior and contributions.

I mention this because I have also seen many posts about the performance of the API, or rather the lack of it, and I am not sure they make any difference, as I don’t see any replies.

But because I am a good community citizen, I will do as instructed by the email I received today telling me I had access to GPT-4 via the API: “As an early customer we’d love to hear about your experience. Feel free to share feedback on our community forum.”

**My feedback is that, via the API, GPT-4 is considerably slower than gpt-3.5-turbo, and slower than GPT-4 via the ChatGPT website.**

Speed by itself is actually not that important to me. But we use the API through Google’s Workspace Add-on architecture, which gives the user a seamless bridge between their Google Document and the API, and any request that takes more than 45 seconds is terminated. Most of our requests that used to work are now failing against this time limit.

*I know it can respond quickly, as chat.openai.com does, and I don’t in principle have an issue with a rate limit. But as it stands, GPT-4 is close to unusable for us.*

I say ‘close’ because there are asynchronous solutions to overcome the 30-second timeout (which in practice is enforced at 45 seconds), but they alter the functionality and are not trivial.

There! Another good-community badge earned for giving feedback, as requested.

Seriously, though: otherwise I love it. It is already making a difference to our business. I’d just like it to do more.

:blush:


If it helps, I would trade requests per hour for speed. We do not use the API a lot relative to other use cases, I am sure, but when we do, it needs to be fast, as I have explained.

Maybe using websockets would help? I mean with a proxy server where you can control the timeouts?

You really have to stream to get the “right” experience.

When I stream, results start coming back almost right away.

I wonder if proxying a stream is a way for you to work around Google’s limitations.
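Something like the sketch below, maybe: a small Node.js relay that requests a streamed completion and pipes the server-sent-event chunks straight through to the caller. The port, model, prompt, and the Node 18+ global fetch are all my own assumptions for illustration, not anything from your setup.

```javascript
// Sketch of a streaming relay (assumes Node 18+ and an OPENAI_API_KEY env var).
const http = require('http');

http.createServer(async (req, res) => {
  // Ask for a streamed completion; with stream: true the API returns
  // server-sent events, one small chunk at a time as tokens are generated.
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4',
      stream: true,
      messages: [{ role: 'user', content: 'Say hello' }], // placeholder prompt
    }),
  });

  // Forward the SSE chunks to the client as they arrive, so the first
  // tokens show up quickly instead of after the full completion time.
  res.writeHead(200, { 'Content-Type': 'text/event-stream' });
  for await (const chunk of upstream.body) {
    res.write(chunk);
  }
  res.end();
}).listen(8080);
```

The client still has to consume the stream incrementally, of course, which may be the sticking point inside Google’s sandbox.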

Thank you for all your suggestions.

The problem with Google Apps Script is that it’s a bit of a walled garden. It restricts a number of things, including threading and any asynchronous processing. Plus, Add-ons have a 30-second processing limit.

Technically there are many solutions, but I’m attempting to keep the existing investment in Apps Script and not rewrite the code.

That said, I do think I have found a solution that will require minimal changes and keep our Apps Script investment intact.

If anyone is interested, I’ll explain how once I’m done proving it.

Thank you :pray:


Hi Paul, did you work it out yet, or are you still testing your solution?

The completed project is an application that functions as a Google Workspace Add-on, built using Google Apps Script. The application interacts with a Google Document, sends text requests from the document to the OpenAI chat/completions API, and then updates the document with the received results.

The primary challenges of this project come from the inherent restrictions of Google Apps Script Workspace Add-ons, which are:

  1. An Add-On is only allowed to run for a maximum duration of 30 seconds (though in practice, there is some leniency, extending up to 45 seconds) before it gets terminated.
  2. Asynchronous processing or threading cannot be implemented.
  3. Our extensive existing investment in App Script prompts us to build the solution within its ecosystem, rather than branching out.

To resolve these issues, the following sequence of operations is implemented:

  1. The Google Add-On publishes a message to a Google Pub/Sub queue.
  2. A Google Cloud Function (coded in Node.js) dequeues this message.
  3. The Cloud Function then calls the same Google Add-on using its WebApp doPost() interface, in a RESTful manner (a sketch of this relay follows the list).
  4. Consequently, the original App Script code that couldn’t be executed earlier (due to the 30-second constraint) can now be processed.
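To make step 3 concrete, here is a minimal sketch of that relay as a Pub/Sub-triggered Cloud Function. The function name, the WEBAPP_URL environment variable, and the message shape are my own placeholders, and the Node 18 runtime is assumed for the global fetch; Paul’s actual implementation may differ.

```javascript
// Pub/Sub-triggered Cloud Function (Node.js 18, 1st-gen background function).
// WEBAPP_URL is a placeholder for the Add-on's deployed web app /exec URL.
const WEBAPP_URL = process.env.WEBAPP_URL;

exports.relayToAddOn = async (message) => {
  // Pub/Sub delivers the published payload base64-encoded.
  const payload = JSON.parse(Buffer.from(message.data, 'base64').toString());

  // Call the Add-on back through its doPost() interface. This second leg
  // runs under web-app execution limits rather than the Add-on's 30-second
  // card limit, so the long-running OpenAI request has time to finish.
  const res = await fetch(WEBAPP_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`doPost relay failed: ${res.status}`);
};
```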

To facilitate this operation, a unique, timestamped piece of text is inserted into the document before the message is queued. Once the response is generated, this unique text is located in the Google Document and replaced with the API response.
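In Apps Script terms, the marker mechanism could look something like the sketch below. The marker format and function names are hypothetical; DocumentApp’s appendParagraph() and replaceText() are the real calls involved.

```javascript
// Hypothetical sketch of the timestamped-marker technique described above.
function insertMarker(doc) {
  const marker = '[[pending:' + Date.now() + ']]'; // unique, timestamped placeholder
  doc.getBody().appendParagraph(marker);
  return marker; // carried along in the Pub/Sub message
}

function fillMarker(doc, marker, apiResponse) {
  // Body.replaceText() treats its first argument as a regular expression,
  // so escape the marker's special characters before searching for it.
  const pattern = marker.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  doc.getBody().replaceText(pattern, apiResponse);
}
```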

There were some other challenges to overcome along the way (e.g., OAuth requirements), but I now have a useful mechanism for other problems we had. For example, I can now cache information with async processing, which speeds up the user’s experience.

Curious if you’ve considered building this Add-on with PaLM 2. I have created some sidebar apps using Google Apps Script and PaLM that are extremely quick (sub-2500 ms) in many cases.

I have been following Google’s AI roll-outs from a distance and hope to integrate them eventually. Indeed, I predict that some of the features I have created with our Add-on will be superseded by new Workspace features.

I believe this issue is not actually the fault of GPT-4, OpenAI, or Google.

Some APIs, cloud back ends, or back-end environments limit execution to fixed amounts of time that we cannot control. So it was painful when I was facing a similar “timeout” issue to the one you are experiencing now, even though it did not happen when running locally.

Because some services strictly limit the time (for me, a maximum of 10 seconds, which caused even my paid gpt-3.5-turbo calls to start timing out, as they sometimes took more than 10 seconds), unfortunately there is nothing we can do on our side at that point.

In my case, our team decided to use our own back-end server (on-premises) to handle all processing, so we moved, and we are using gpt-4 now.

Occasionally, the response time exceeds 60 seconds, but the process does not terminate automatically. Furthermore, our team has taken full responsibility for managing the back-end server (this is very good). Consequently, the overall performance has improved significantly.

I would like to know about any solution you have found for speed. The responses I get are always excruciatingly slow.

I have nearly finished an app that would be very marketable if GPT-4 answered as quickly as ChatGPT, but as it stands the wait is so long that most users will not be prepared to put up with it.

The only OpenAI-side solution to token-generation speed is to move the customer-facing AI to gpt-3.5-turbo.
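For anyone weighing that trade-off, the switch is a one-field change in the request. A minimal Apps Script sketch, assuming the API key is stored as a script property named OPENAI_API_KEY (that name is my own placeholder):

```javascript
// Minimal sketch: the only change needed to trade quality for speed is the
// "model" field; everything else about the request stays the same.
function askOpenAI(prompt, fast) {
  const key = PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY');
  const resp = UrlFetchApp.fetch('https://api.openai.com/v1/chat/completions', {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + key },
    payload: JSON.stringify({
      // gpt-3.5-turbo generates tokens much faster than gpt-4,
      // at some cost in answer quality.
      model: fast ? 'gpt-3.5-turbo' : 'gpt-4',
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  return JSON.parse(resp.getContentText()).choices[0].message.content;
}
```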

You can see the recent improvement in the completion time for 250 tokens with GPT-4 (top blue line), which corresponds almost exactly with the load reduction from the “GPT-4 no longer making long outputs” complaints.

[chart: completion time for 250 tokens; GPT-4 is the top blue line]
