Davinci Codex on Vercel Timing out

sagarpatel1025 · December 17, 2021, 9:49pm

I’m working on an app that uses Davinci Codex. I’ve noticed that each API call to https://api.openai.com/v1/engines/davinci-codex/completions takes approx. 30 seconds to 50 seconds.

My app is deployed on Vercel and uses Vercel Serverless Functions as a middle layer (this is to hide my OpenAI API key from the user).

The issue is that Vercel requests automatically time out after 10 seconds. This timeout setting cannot be changed and is strictly enforced by Vercel. My app works locally (with the same latency), but when it’s deployed, it times out.

Is there a way to reduce the latency to less than 10 seconds?

DutytoDevelop · December 17, 2021, 10:08pm

Hello @sagarpatel1025,

Unfortunately, the reason Davinci-Codex takes a long time to respond is also the reason it is more accurate than the Cushman-Codex model.

I looked at the documentation on Vercel’s website and there is an Enterprise Plan that does increase serverless execution timeout to 30 seconds, but I’m not sure if you are able to do that.

I wonder if setting up your function call to stream the data in chunks, instead of receiving it all at once, would spread out the time it takes to receive the data and so work around the max timeout limit.

Does anyone know if that would work? I haven’t implemented streaming in my own workflow yet but plan to do so.

sagarpatel1025 · December 17, 2021, 10:19pm

Unfortunately I’m not able to get the Enterprise plan. Although some requests take 30 seconds, most take slightly longer, so even with the Enterprise plan the requests would still time out.

Your “streaming data in chunks” idea sounds nice, but I have no idea how to implement it with OpenAI’s APIs.

I just thought of another way, but it would require me to redeploy part of my app on another provider. AWS provides Lambda functions, which can run for up to 15 minutes before timing out. The downside is that I’d have to redeploy my endpoint on Lambda, while the site would still be on Vercel.
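For reference, here is a minimal sketch of what consuming a streamed completion could look like. It assumes the endpoint is called with `stream: true` and answers with server-sent events, that is, `data: {...}` lines terminated by `data: [DONE]`. The helper name `parseSSE` is hypothetical, and nothing in this thread confirms that streaming actually avoids Vercel’s timeout.

```javascript
// Hypothetical helper: extracts completion text from a buffer of
// server-sent-event data. Assumes each event is a line of the form
// "data: <json>" and that the stream ends with "data: [DONE]".
function parseSSE(buffer) {
  const lines = buffer.split("\n");
  // The last element may be an incomplete line; keep it for the next chunk.
  const rest = lines.pop();
  const texts = [];
  let done = false;
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank separator lines
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const event = JSON.parse(payload);
    texts.push(event.choices[0].text);
  }
  return { texts, rest, done };
}

// Usage with two network chunks that split one event in the middle.
let state = parseSSE('data: {"choices":[{"text":"def "}]}\n\ndata: {"choi');
console.log(state.texts); // one piece of text: "def "
state = parseSSE(state.rest + 'ces":[{"text":"add"}]}\n\ndata: [DONE]\n\n');
console.log(state.texts, state.done); // one piece of text: "add", done is true
```

Carrying `rest` over to the next call matters because a network chunk can end in the middle of an event, so the unfinished line has to be prepended to the following chunk before parsing.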

fvrlak · December 17, 2021, 10:22pm

Maybe something is wrong with the prompt, because you don’t wanna wait 30-50s for a reply.

DutytoDevelop · December 17, 2021, 10:23pm

@fvrlak that can definitely be a factor for sure!

This post may help @sagarpatel1025:

sagarpatel1025 · December 17, 2021, 10:27pm

Well in the Playground the responses are very quick. My responses are accurate to an acceptable degree but I can still try to rewrite my prompt and see if it makes a difference.

sagarpatel1025 · December 17, 2021, 10:28pm

Thanks for the quick response!

DutytoDevelop · December 17, 2021, 11:05pm

Anytime! If you have any difficulties getting it set up, just let us know and we can certainly help! I personally did a lot of head-scratching when first learning AJAX and then trying to implement it correctly in my Django project. The feeling once I understood how it all worked was quite rewarding, however!

pw · December 18, 2021, 12:46am

Hi! We’re not seeing any service issues. A few questions to help debug:

How many tokens are in your prompt?
How many tokens are you requesting in your sample?
Are you using the best_of parameter?

sagarpatel1025 · December 18, 2021, 2:17am

Hey,

111 tokens in the request prompt.
max_tokens is 1500
I’m not using best_of parameter

I got access to codex a few days ago but I’ve had access to gpt3 for a few months now. I recently added a new payment method since my free tokens expired.

DutytoDevelop · December 18, 2021, 2:20am

Vercel has been having issues in their network it looks like:

Source: https://www.vercel-status.com/

sagarpatel1025 · December 18, 2021, 2:23am

Thanks for checking! But that issue was yesterday. The status shows no issues today and all systems are operating normally

DutytoDevelop · December 18, 2021, 2:31am

I wonder what it could be then. I’m not too familiar with the service, but it sounds like there is definitely something slowing the incoming packets. It could be service degradation, or perhaps a firewall that is scanning the packets to ensure the data is not malicious in any way. That is quite odd, though.

Edit: I was wrong. I didn’t know max_tokens would affect performance to that degree even if the limit isn’t reached, but I constantly practice gauging what I think max_tokens should be set to. That’s interesting!

sagarpatel1025 · December 18, 2021, 2:44am

@DutytoDevelop It’s not a network issue. @m-a.schenk is right.
It’s the tokens. I experimented with different max_tokens values and it definitely performs much faster with smaller values. Now I’m wondering how I can resubmit after a partial completion to generate the full code.

sagarpatel1025 · December 18, 2021, 3:43am

Not sure if this helps but if you pass "echo": true in the request, the endpoint will return the concatenation of prompt and the completion text as part of your response. This way I won’t have to manually edit the prompt in the request!

Source: OpenAI API
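One way the resubmit idea could be sketched: request a small number of tokens, append the returned text to the prompt, and repeat while the response reports `finish_reason: "length"`. The names `completeInSteps` and `complete` are hypothetical; `complete` stands in for a single call to the completions endpoint and is replaced here by a fake so the snippet runs on its own. With `"echo": true` the returned text would already include the prompt, so the manual concatenation could presumably be dropped.

```javascript
// Hypothetical sketch: generate a long completion as several short requests.
// `complete` is an assumed async function (prompt, maxTokens) =>
// { text, finish_reason } wrapping one call to the completions endpoint.
async function completeInSteps(complete, prompt, stepTokens, maxSteps) {
  let generated = "";
  for (let step = 0; step < maxSteps; step++) {
    // Resubmit the original prompt plus everything generated so far.
    const { text, finish_reason } = await complete(prompt + generated, stepTokens);
    generated += text;
    // "length" means the output was cut off by max_tokens; anything else
    // (for example "stop") means the model finished on its own.
    if (finish_reason !== "length") break;
  }
  return generated;
}

// Stand-in for the API: emits a fixed text in pieces of `maxTokens` characters.
const basePrompt = "# say hello\n";
const target = "print('hello world')";
const fakeComplete = async (fullPrompt, maxTokens) => {
  const already = fullPrompt.length - basePrompt.length;
  const text = target.slice(already, already + maxTokens);
  const finished = already + maxTokens >= target.length;
  return { text, finish_reason: finished ? "stop" : "length" };
};

completeInSteps(fakeComplete, basePrompt, 4, 10).then(console.log);
```

Each step stays short enough to fit inside a tight timeout, at the cost of more requests and of re-sending a growing prompt every time.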

pw · December 19, 2021, 4:57pm

Yep, it’s the max_tokens parameter that’s causing such long request times. The total request time will be proportional to max_tokens, with each additional token adding around 50 ms for the current version of davinci-codex.

I recommend lowering max_tokens and potentially using cushman-codex.
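As a rough worked example of that rule of thumb, the sketch below turns the figure of about 50 ms per token into a worst-case latency estimate and into a max_tokens budget for a given timeout. `OVERHEAD_MS` is an assumed allowance for network and prompt processing, not a measured value, and the function names are made up for illustration.

```javascript
// Rough latency model based on the ~50 ms per generated token figure quoted
// above. OVERHEAD_MS is an assumed allowance, not a measured value.
const MS_PER_TOKEN = 50;
const OVERHEAD_MS = 1000;

// Worst-case request time if the model generates all max_tokens tokens.
function estimateLatencyMs(maxTokens) {
  return OVERHEAD_MS + maxTokens * MS_PER_TOKEN;
}

// Largest max_tokens whose worst case still fits inside a timeout.
function maxTokensForTimeout(timeoutMs) {
  return Math.max(0, Math.floor((timeoutMs - OVERHEAD_MS) / MS_PER_TOKEN));
}

console.log(estimateLatencyMs(1500)); // 76000 (ms)
console.log(maxTokensForTimeout(10000)); // 180 (tokens)
```

Under these assumptions, max_tokens of 1500 can take well over a minute in the worst case, while staying under a 10 second limit means asking for roughly 180 tokens or fewer per request.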

pappachuck · February 12, 2022, 10:35am

Just make your own custom Cluster and no more problem.

cm · December 9, 2022, 8:38pm

Hey! Thanks for sharing. I’ve been trying this without luck. I am getting “Unauthorized” or sometimes I get “Too many requests” but I’ve hardly done 10 requests. Here’s my code:

var eventSource = new EventSource(
  "https://api.openai.com/v1/engines/curie/completions/browser_stream",
  {
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer" + OPENAI_API_KEY,
    },
    method: "POST",
    payload: JSON.stringify({
      prompt: fileTextToCursor,
      temperature: 0.75,
      top_p: 0.95,
      max_tokens: 3,
      stream: true,
    }),
  }
);

// listen to the event source for the message event
eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(`Completion: ${data.choices[0].text}`);
};
eventSource.onerror = (err) => {
  console.error("EventSource failed:", err);
};
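Two things in the snippet above may explain the “Unauthorized” error. First, `"Bearer" + OPENAI_API_KEY` has no space after `Bearer`, so the header value is malformed. Second, the browser’s native `EventSource` constructor accepts only a URL and a `withCredentials` option; `headers`, `method` and `payload` look like options from a third-party SSE library and would be ignored by the native class. Below is a hedged sketch of the same request using `fetch` instead. The function names are hypothetical, the URL follows the engines/.../completions pattern quoted at the top of this thread rather than the `browser_stream` path, and a runtime with `fetch` and readable response streams is assumed.

```javascript
// Builds the Authorization header value; note the space after "Bearer".
function bearer(apiKey) {
  return "Bearer " + apiKey;
}

// Hypothetical sketch: POST a streaming completion request with fetch and
// log the raw chunks as they arrive.
async function streamCompletion(apiKey, prompt) {
  const response = await fetch(
    "https://api.openai.com/v1/engines/curie/completions",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: bearer(apiKey),
      },
      body: JSON.stringify({ prompt, max_tokens: 3, stream: true }),
    }
  );
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // Each chunk holds one or more "data: ..." lines of server-sent events.
    console.log(decoder.decode(value, { stream: true }));
  }
}
```

Calling this directly from a browser would expose the API key to the user, so it probably belongs in the server-side layer described in the original post.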

adharbertwork · March 23, 2023, 6:11pm

Starting yesterday, I am getting at least 6 or 7 timeouts for every 20 calls. Before that I never ran into any timeouts. Is there something going on with the servers since yesterday? Or is this just a large spike in traffic over the past couple of days?

I’m calling the v1/completions endpoint using text-davinci-003. Normally it would take up to 20 seconds, but now I get timeouts after as long as 300 seconds.
