Simultaneous Requests - API


My apologies if the answer is somewhere in documentation / other topics as I was unable to find it.

I have fine-tuned a curie model for conditional generation. For my use, the input would be text of about 20-40 lines. The results for my use case are significantly better if I do 20 requests of one line each and concatenate the answers, rather than one request of 20 lines. Since pricing is per token, the cost is similar.

However, I am wondering about limits on simultaneous requests via the API. Are there any? Can they be raised? How does it work?

Thanks a lot for the time,

- Fine Tuning

No, I do not think you can fine-tune like this.

I do not think there is any facility in the OpenAI API to process fine-tuning data in parallel “simultaneously”; and I’m not even sure it would be possible based on the GPT high-level architecture.

Yes, fine-tuning is slow, and it would be a great idea to run fine-tunings in parallel in a threaded way, but my guess is that this is currently not possible.

Recall that in the current offering, when you fine-tune a model, the output is another fine-tuned model (with a new id and name). So, if you fine-tune 100s of files in parallel, the result would be 100s of new fine-tuned models, which of course you don’t want.

- Completions

If you are talking about completions, I see no reason not to do this in parallel. Sounds very reasonable.
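To make the idea concrete, here is a minimal sketch of firing one completion request per line in parallel with a thread pool. The `complete_line` function is a hypothetical stand-in for your real API call (e.g. a call to the completions endpoint with your fine-tuned model); here it just echoes, so the sketch stays runnable.

```python
from concurrent.futures import ThreadPoolExecutor

def complete_line(line):
    # Hypothetical placeholder for a real completion request against your
    # fine-tuned model. Replace the body with your actual API call.
    return line.upper()

def complete_all(lines, max_workers=20):
    # One request per input line, sent concurrently. executor.map preserves
    # input order, so the joined answers line up with the original lines.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(complete_line, lines))
    return "\n".join(results)
```

Since the requests are I/O-bound, threads are enough; no multiprocessing needed.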

Please clarify @youri.rd.

Thanks for a great technical question, BTW.


Thanks a lot @ruby_coder for your answer !
I meant for completions not fine-tuning :slightly_smiling_face:

If I understand correctly, you are saying that apart from the 3k requests/minute and 250k davinci tokens/minute (x25 for curie), there is no limit on parallel requests. Is that correct?
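If those per-minute limits are the only constraint, a common way to stay robust when bursting requests in parallel is to retry on rate-limit errors with exponential backoff. This is a generic sketch, not an official pattern: `RuntimeError` stands in for whatever rate-limit exception your client library raises, and the retry counts and delays are illustrative.

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0):
    # Retry a zero-argument callable on rate-limit-style failures, waiting
    # base * 2**attempt seconds plus jitter between attempts. RuntimeError
    # is a stand-in for the client's actual rate-limit exception.
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))
    return call()  # final attempt; let any error propagate to the caller
```

Wrapping each parallel request in `with_backoff` means a short burst over the limit degrades gracefully instead of failing outright.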

If so, that’s a game changer for my use cases :smile:
I was afraid that a fine-tuned model could only take on a new prompt after it’s done with the current one!

Did your original question ask about rate-limiting? :slight_smile:

My reply did not consider any rate-limiting policy or methods, TBH @youri.rd

My reply was only that there is no reason you cannot parallel process completions on your fine-tuned model.

As for how you are rate-limited: I have not tested it, so I have no basis to reply with authority about rate-limiting when parallel processing.

Sounds fun to test! Why not test it @youri.rd and let us know what you learn?
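If you do test it, a simple harness like this can measure what throughput you actually get: time a batch of concurrent calls and report requests per second. `request_fn` is a hypothetical hook; plug in your real completion call (ideally wrapped with backoff) to probe your own limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_throughput(request_fn, n_requests=20, max_workers=20):
    # Fire n_requests concurrent calls to request_fn and return the observed
    # requests-per-second rate. Swap request_fn for a real API call to see
    # where rate limiting actually kicks in for your account.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda _: request_fn(), range(n_requests)))
    elapsed = time.perf_counter() - start
    return n_requests / elapsed
```

Running it at increasing `max_workers` values should show whether throughput keeps scaling or flattens out at the rate limit.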