'Model still being loaded' error for finetuned models?

I’m getting the following error when attempting to stream from a fine-tuned model:

{"error": {"message": "That model is still being loaded. Please try again shortly.", "type": "server_error", "param": null, "code": null}}

How do I ‘retry’ or ‘wait’ in this context (JavaScript)?

Make the same request ~1-2min after the initial request
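
For anyone wanting to automate this from JavaScript/TypeScript, here’s a minimal retry-with-delay sketch (the model name, delay, and retry count are placeholders, not official guidance; it assumes Node 18+ for global fetch):

```ts
// Minimal retry helper: re-issue the request while the API reports that the
// fine-tuned model is still loading. The model name and retry settings
// below are placeholders.
async function completeWithRetry(
  prompt: string,
  maxRetries = 5,
  delayMs = 30_000, // wait between attempts; the advice above suggests ~1-2 min
): Promise<unknown> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch("https://api.openai.com/v1/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({ model: "YOUR_FINE_TUNED_MODEL", prompt }),
    });
    if (res.ok) return res.json();

    const body = await res.json().catch(() => ({}));
    const message: string = body?.error?.message ?? "";
    // Only retry the "still being loaded" case; surface anything else.
    if (!message.includes("still being loaded") || attempt === maxRetries) {
      throw new Error(`Request failed (${res.status}): ${message}`);
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```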

Thanks for the quick response :pray:. 1-2 minutes is a looong time; is this due to throttling? What kind of model response time can I guarantee a user in this case? I get this error consistently.

This only happens if you haven’t used the model in a while. On subsequent requests, the response time will be as fast as the base models.

Ahhh k, so it’s cold-starting and I have to keep it warm. How long between requests before the instance goes cold again?

@boris I’m still having an issue with my fine-tuned instance. Model response times are very high, and the instance seems to go cold almost immediately (model loading error). It seems like the model is being throttled to 1-2 requests/minute for some reason (it’s not live atm).

I’ve had all the same issues noted here. After talking to support, it seems for now we have to keep the models warm by sending a request every 23-24 hours; they probably go cold after 24 hours of inactivity. Regarding response times and rate limiting, we’re sometimes experiencing this as well, which is detrimental to the user experience. However, the docs do state that fine-tuned models are meant to provide “lower latency requests”, so the longer response times don’t seem to be intended.
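
If you do end up pinging the model to keep it warm, a minimal sketch might look like this (purely illustrative; the model name is a placeholder, it assumes Node 18+ for global fetch, and each ping costs a few tokens):

```ts
// Illustrative keep-warm ping. The 23-hour interval follows the support
// guidance quoted above; the model name is a placeholder.
const KEEP_WARM_INTERVAL_MS = 23 * 60 * 60 * 1000;

setInterval(async () => {
  try {
    await fetch("https://api.openai.com/v1/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      // A 1-token completion keeps each ping as cheap as possible.
      body: JSON.stringify({
        model: "YOUR_FINE_TUNED_MODEL",
        prompt: "ping",
        max_tokens: 1,
      }),
    });
  } catch (err) {
    console.error("keep-warm ping failed:", err);
  }
}, KEEP_WARM_INTERVAL_MS);
```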

Could you write to support@openai.com with some details on the specific timing of your requests? They should be able to understand what’s happening.

A solution for now, as used in most tech stacks, is to employ some form of backoff. If you’re using Python for your API requests, try out the backoff package on PyPI. Then you can apply exponential backoff to the RateLimitError and APIError exceptions provided in the openai library.
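
Since the original question here was about JavaScript, a hand-rolled TypeScript equivalent of that pattern might look like this (a sketch only; the Python backoff library is more battle-tested):

```ts
// Hand-rolled exponential backoff with jitter, mirroring what backoff.expo
// does in the Python library. All parameters here are illustrative.
async function withExponentialBackoff<T>(
  fn: () => Promise<T>,
  maxTries = 6,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxTries) throw err;
      // Delays double each attempt (1s, 2s, 4s, ...) plus random jitter,
      // so concurrent clients don't all retry at the same instant.
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 1_000;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: wrap any call that may hit rate-limit or model-loading errors, e.g.
// the completeWithRetry sketch from earlier in this thread:
// await withExponentialBackoff(() => completeWithRetry("Hello"));
```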

The bigger issue here is that the reason for these cold starts is to save money on running unused containers in Azure, so as usage scales, if everyone starts sending null requests to keep their models warm, it’s going to put a huge toll on their Azure bill. I assume, though, that this has been factored into the higher pricing for fine-tuned models. I also don’t think they can raise token prices much further, as they are already quite expensive.

I’m just waiting until they release GPT-3 (if ever) as a self-managed option, i.e. a Docker image we can host ourselves, and I wonder how the hosting costs would trade off against using their API. I doubt they will, though, until it’s no longer their only competitive advantage. That said, AI21 proves that other competitors are close behind, if not already here!

Do fine-tuned instances auto-scale with demand, e.g. 2 requests/min vs. 1000 requests/min?

@deanmkn I think more competition is definitely emerging, though most of these competitors seem to be trailing OpenAI in terms of features and capacity atm. AI21 looks promising, but I have conflict-of-interest concerns if they’re handling user data and fine-tuning manually (atm) while putting out competing products (e.g. Wordtune). Their ToS for ‘submission of content’ (section 6.c) is a bit too expansive imo.

More generally, I think OpenAI will be dominant in the short term, as training and serving large LMs is capital intensive. But as hardware/software optimizations improve (e.g. Cerebras’ new chips targeting 120-trillion-parameter models), I imagine API costs will come down and there’ll be a more level playing field. My guess is that other factors will eventually become more important: continually pre-training base models on new data (i.e. post-2019), sourcing high-quality pre-training data, fine-tuned model hubs (e.g. Codex and beyond), etc.

Agree with you on the AI21 point. After joining their Discord and doing more research, it seems they care very little about how their software is applied, and they have nowhere near as defined a vision for bettering humanity as OpenAI does (About OpenAI). There’s no “going live” process or anything; it’s a free-for-all there, which does make me worried.

Yes, fine-tuned model requests should scale well to any level of traffic.

[organization=sphinx-medical-technologies-inc] Error: That model is still being loaded. Please try again shortly. (HTTP status code: 429)

Just occurred 6/27/22 at 12:20 PM PST.

It seems to have stopped for now.

I have a similar problem now. Did you guys find a way to solve it?