Thanks for the quick response. 1-2 min is a looong time; is this due to throttling? What kind of model response time can I guarantee a user in this case? I get this error consistently.
@boris I’m still having an issue with my finetuned instance. Model response times are very high and the model seems to go cold almost immediately (model loading error). It seems like the model is being throttled to 1-2 requests/minute for some reason (not live atm).
I’ve had all the same issues noted here. After talking to support, it seems that for now we have to keep the models warm by sending a request around the 23-24 hour mark, since they probably go cold after 24 hours of inactivity. We’re also sometimes seeing the slow response times and rate limiting, which is detrimental to the user experience. However, the docs do state that fine-tuned models are meant to provide “lower latency requests”, so the longer response times don’t seem to be intended.
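To make the keep-warm idea concrete, here is a minimal sketch of the scheduling logic only. The 24-hour cold threshold is the guess from this thread, not a documented value, and `send_ping` is a hypothetical callable standing in for whatever cheap request you send to your fine-tuned model.

```python
import time

# Assumption from this thread: the model goes cold after roughly 24 hours idle.
COLD_AFTER_SECONDS = 24 * 60 * 60
# Ping about an hour early, i.e. around the 23-hour mark.
SAFETY_MARGIN_SECONDS = 60 * 60


def ping_due(last_request_ts, now=None):
    """Return True if a keep-warm request should be sent now."""
    if now is None:
        now = time.time()
    return now - last_request_ts >= COLD_AFTER_SECONDS - SAFETY_MARGIN_SECONDS


def keep_warm(last_request_ts, send_ping, now=None):
    """Call the hypothetical send_ping() if a ping is due.

    Returns the timestamp of the most recent request, so the caller
    can store it and pass it back in on the next check.
    """
    if now is None:
        now = time.time()
    if ping_due(last_request_ts, now):
        send_ping()
        return now
    return last_request_ts
```

You would call `keep_warm` from a cron job or scheduler, and also update the stored timestamp whenever real user traffic hits the model, so pings are only sent during genuinely idle stretches.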
A solution for now, as used in most tech stacks, is to employ some form of backoff. If you’re using Python for your API requests, try out the backoff package on PyPI. Then you can apply exponential backoff to the RateLimitError and APIError errors provided in the openai library.
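For anyone who wants to see what exponential backoff does under the hood, here is a rough stdlib-only sketch. `TransientAPIError` is a hypothetical stand-in for the client library’s rate-limit and API errors, and the delay values are arbitrary; in practice you would pass the real exception classes via `retry_on`, or use the backoff package’s `backoff.on_exception(backoff.expo, ...)` decorator instead of rolling your own.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for the client's rate-limit / API errors."""


def retry_with_backoff(fn, retry_on=(TransientAPIError,), max_tries=5,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_tries):
        try:
            return fn()
        except retry_on:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            # The delay cap doubles each attempt: base, 2*base, 4*base, ...
            # and never exceeds max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the cap so that many
            # clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

Usage would look like `retry_with_backoff(lambda: call_my_model(prompt))`, where `call_my_model` is whatever function wraps your API request. The `sleep` parameter is only there so the waiting can be swapped out in tests.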
The bigger issue here is that the reason they have these cold starts is to save money on running unused containers in Azure, so as it scales, if everyone starts sending null requests to keep their models warm, it’s gonna put a huge toll on their Azure bill. I assume though that this has been factored into the increase in pricing for the fine-tuned models. I also don’t think they can raise token prices too much, as they are already quite expensive.
I’m just waiting until they release (if ever) GPT-3 as a self-managed option, e.g. a Docker image, so we can host it ourselves, and I wonder what the tradeoff would be in hosting costs vs using their API. I doubt they will, though, as long as it’s their only competitive advantage. In saying that, AI21 proves that other competitors are close behind, if not already here!
Do finetuned instances auto-scale with demand, e.g. 2 requests/min vs 1000 requests/min?
@deanmkn I think more competition is definitely emerging, though many of them seem to be mostly trailing OpenAI in terms of features and capacity atm. AI21 looks promising, but I have conflict-of-interest concerns if they’re handling user data and finetuning manually (atm) while also putting out competing products (e.g. Wordtune). Their ToS for ‘submission of content’ (section 6.c) is a bit too expansive imo.
More generally, I think OpenAI will be dominant in the short term, as training/serving large LMs is capital intensive. But as the hardware/software optimizations improve (e.g. Cerebras’ new system, announced as supporting models of up to 120 trillion params), I imagine API costs will also come down and there’ll be a more level playing field. My guess is other factors would eventually become more important, like continually pre-training base models on new data (i.e. post-2019), sourcing high-quality pre-training data, finetuned model hubs (e.g. Codex and beyond), etc.
Agree with you on the AI21 point. After joining their Discord and doing more research, it seems they care very little about how their software is applied and are nowhere near having a defined vision about bettering humanity like OpenAI does (see About OpenAI). There’s no “going live” process or anything; it’s a free-for-all there, which does make me worried.