My apologies if the answer is somewhere in the documentation or other topics, as I was unable to find it.
I have fine-tuned a curie model for conditional generation. In use, the input is text of about 20-40 lines. The results for my use case are significantly better if I do 20 requests of one line each and concatenate the answers, rather than one request of 20 lines. Since pricing is per token, the price is similar.
However, I am wondering about limitations on simultaneous requests via the API. Are there any? Can they be lifted? How does it work?
No, I do not think you can fine-tune like this.
I do not think there is any facility in the OpenAI API to process fine-tuning data in parallel “simultaneously”, and I’m not even sure it would be possible given the high-level GPT architecture.
Yes, fine-tuning is slow, and running fine-tunings in parallel in a threaded way would be a great idea, but my guess is that this is currently not possible.
Recall that in the current offering, when you fine-tune a model, the output is another fine-tuned model (with a new id and name). So, if you fine-tune 100s of files in parallel, the result would now be 100s of new fine-tuned models, which of course you don’t want.
- Completions
If you are talking about completions, I see no reason not to do this in parallel. Sounds very reasonable.
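Something like this minimal sketch, for example, assuming the openai Python library and a placeholder fine-tuned model name (substitute your own):

```python
# Minimal sketch: fan out single-line prompts across threads and
# concatenate the completions in their original order.
# FINE_TUNED_MODEL is a placeholder -- substitute your own model id.
from concurrent.futures import ThreadPoolExecutor

import openai

openai.api_key = "sk-..."  # your API key
FINE_TUNED_MODEL = "curie:ft-your-org-2023-01-01"  # placeholder

def complete_line(line: str) -> str:
    resp = openai.Completion.create(
        model=FINE_TUNED_MODEL,
        prompt=line,
        max_tokens=64,
        temperature=0,
    )
    return resp["choices"][0]["text"]

def complete_lines(lines):
    # executor.map preserves input order, so the joined output
    # lines up with the input lines
    with ThreadPoolExecutor(max_workers=20) as pool:
        return "\n".join(pool.map(complete_line, lines))
```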
Thanks a lot @ruby_coder for your answer!
I meant completions, not fine-tuning.
If I understand correctly, you are saying that apart from the 3k requests/minute and 250k davinci tokens/minute (x25 for curie) limits, there is no limitation on parallel requests. Is that correct?
If so, that’s a game changer for my use cases.
I was afraid that a fine-tuned model could only take care of a new prompt once it was done with the current one!
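For anyone finding this later: since those limits are per-minute buckets, a burst of parallel requests can still trip them for a moment. A simple exponential backoff on the rate-limit error handles that; here is a rough sketch (the model name is again a placeholder):

```python
# Rough sketch: retry with exponential backoff when a burst of
# parallel requests momentarily exceeds the per-minute rate limits.
import time

import openai

def complete_with_backoff(prompt: str, retries: int = 5) -> str:
    delay = 1.0
    for _ in range(retries):
        try:
            resp = openai.Completion.create(
                model="curie:ft-your-org-2023-01-01",  # placeholder
                prompt=prompt,
                max_tokens=64,
            )
            return resp["choices"][0]["text"]
        except openai.error.RateLimitError:
            time.sleep(delay)
            delay *= 2  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("still rate limited after retries")
```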