We are using OpenAI’s chat API (via the langchain Python module) to implement a document Q&A service. While testing the service locally on my development machine (MacBook Pro), API response times are low (roughly 2–5 seconds). However, once the service is deployed to GCP Cloud Run, we see a significant degradation in response time (roughly 30–120 seconds).
Here are some details regarding the implementation:
- Using the class ChatOpenAI from the langchain module (a rough sketch of the setup is included after this list)
- Tried both the ‘gpt-3.5-turbo-0613’ and ‘gpt-3.5-turbo-16k-0613’ models; haven’t observed much difference in performance between them
- We need to ask multiple questions to produce the expected result, and each question is handled as a separate Chat API request. When we parallelize these requests, the individual response times get worse (see the threading sketch after this list).
- Using Pinecone for storing the vectorized document data.
- The Cloud Run service runs on a single-vCPU instance, since the code only uses multi-threading (no multiprocessing support has been added to the service)
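For context, here is a rough sketch of what the setup looks like. This is not our exact code; the Pinecone index name, keys, and sample question are placeholders:

```python
# Simplified sketch of the document Q&A setup (placeholder keys/index name).
import pinecone
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.chains import RetrievalQA

pinecone.init(api_key="...", environment="...")  # placeholders

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo-0613",
    temperature=0,
    request_timeout=60,   # bounded timeout so slow calls fail fast
    max_retries=2,
)

vectorstore = Pinecone.from_existing_index(
    index_name="docs-index",          # placeholder index name
    embedding=OpenAIEmbeddings(),
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
)

# Each question is sent as its own Chat API request through the chain.
answer = qa_chain.run("What is the document about?")
```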
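And roughly how the questions are fanned out today, using a plain ThreadPoolExecutor on the single-vCPU instance (the question list and worker count below are only illustrative):

```python
# Illustrative fan-out of the per-question requests via a thread pool.
from concurrent.futures import ThreadPoolExecutor

questions = [
    "What is the document about?",
    "Who are the parties involved?",
    "What are the key dates?",
]

def ask(question: str) -> str:
    # One independent Chat API request per question.
    return qa_chain.run(question)

with ThreadPoolExecutor(max_workers=3) as pool:
    answers = list(pool.map(ask, questions))
```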
Any suggestions for improving the API performance within the Cloud Run service would be much appreciated!
Note: we are not using an API proxy.