Simultaneous Requests - API


My apologies if the answer is somewhere in the documentation or other topics; I was unable to find it.

I have fine-tuned a curie model for conditional generation. In my use case, the input is text of about 20-40 lines. The results are significantly better if I make 20 requests of one line each and concatenate the answers, rather than one request of 20 lines. Since pricing is per token, the cost is similar either way.

However, I am wondering about limitations on simultaneous requests via the API. Are there any? Can they be lifted? How does it work?

Thanks a lot for your time,

- Fine Tuning

No, I do not think you can fine-tune like this.

I do not think there is any facility in the OpenAI API to process fine-tuning data in parallel ("simultaneously"), and I'm not even sure it would be possible given the high-level GPT architecture.

Yes, fine-tuning is slow, and parallelizing fine-tuning jobs in a threaded way would be a great idea, but my guess is that this is currently not possible.

Recall that in the current offering, when you fine-tune a model, the output is another fine-tuned model (with a new id and name). So if you fine-tuned hundreds of files in parallel, you would end up with hundreds of new fine-tuned models, which of course you don't want.

- Completions

If you are talking about completions, I see no reason not to do this in parallel. Sounds very reasonable.
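As a rough illustration of the fan-out described above (one request per line, answers concatenated in order), here is a minimal sketch using a thread pool. `complete_line` is a hypothetical stand-in for the real API call to your fine-tuned model, not actual OpenAI client code:

```python
from concurrent.futures import ThreadPoolExecutor

def complete_line(line: str) -> str:
    # Stand-in for the real completions call against your fine-tuned
    # model; here we just return a fake completion so the sketch runs.
    return f"completion-for:{line}"

def complete_all(lines: list[str], max_workers: int = 20) -> str:
    # executor.map preserves input order, so the concatenated answer
    # lines up with the original 20-40 input lines.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(complete_line, lines)
    return "\n".join(results)

print(complete_all(["line 1", "line 2", "line 3"]))
```

Since the calls are I/O-bound, threads are enough; the GIL is not a bottleneck here.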

Please clarify @youri.rd.

Thanks for a great technical question, BTW.


Thanks a lot @ruby_coder for your answer!
I meant completions, not fine-tuning :slightly_smiling_face:

If I understand correctly, you are saying that apart from the 3k requests/minute and 250k davinci tokens/minute (x25 for curie) limits, there is no restriction on parallel requests. Is that correct?

If so, that's a game changer for my use cases :smile:
I was afraid that a fine-tuned model could only take on a new prompt after it's done with the current one!
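If those per-minute limits are the only constraint, one way to stay under them client-side is to space out request starts across all worker threads. A minimal sketch of such a throttle, assuming the 3k requests/minute figure quoted above (check your own account's limits):

```python
import threading
import time

class RateLimiter:
    """Spaces out request starts so the workers collectively stay
    under a requests-per-minute cap."""

    def __init__(self, requests_per_minute: int = 3000):
        self.interval = 60.0 / requests_per_minute
        self.lock = threading.Lock()
        self.next_start = time.monotonic()

    def acquire(self) -> None:
        # Reserve the next start slot under the lock, then sleep
        # outside the lock so other threads can queue up.
        with self.lock:
            now = time.monotonic()
            wait = self.next_start - now
            self.next_start = max(now, self.next_start) + self.interval
        if wait > 0:
            time.sleep(wait)

limiter = RateLimiter(requests_per_minute=3000)
# Each worker thread calls limiter.acquire() before its API request.
```

This only caps request count, not tokens per minute; if your prompts are long you may hit the token limit first.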

Did your original question ask about rate-limiting? :slight_smile:

My reply did not consider any rate-limiting policy or methods, TBH @youri.rd

My reply was only that there is no reason you cannot parallel process completions on your fine-tuned model.

As for how you are rate-limited, I have not tested it, so I have no basis to reply with authority about rate-limiting when parallel processing.

Sounds fun to test! Why not test it @youri.rd and let us know what you learn?
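One thing such a test would likely surface is rate-limit errors (HTTP 429) once many parallel workers pile up; the usual pattern is to retry with exponential backoff and jitter. A self-contained sketch with a simulated endpoint (the error class and `flaky_request` are stand-ins, not real API objects):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error raised when you exceed your quota."""

def flaky_request(prompt: str, _state={"calls": 0}) -> str:
    # Simulated endpoint: fails the first two calls to mimic
    # briefly hitting the rate limit, then recovers.
    _state["calls"] += 1
    if _state["calls"] <= 2:
        raise RateLimitError("429: rate limit exceeded")
    return f"ok:{prompt}"

def with_backoff(fn, prompt, retries=5, base_delay=0.01):
    # Exponential backoff with jitter: wait roughly base * 2**attempt
    # between retries so parallel workers don't all retry at once.
    for attempt in range(retries):
        try:
            return fn(prompt)
        except RateLimitError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

print(with_backoff(flaky_request, "hello"))  # succeeds on the third try
```

With backoff in place, transient 429s just slow the batch down instead of failing it.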


@youri.rd were you able to parallelise the requests to OpenAI in the end? If so, how, or what process did you use to do that?

I'm also trying this, but have not figured it out completely.
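Besides threads, another common way to parallelise I/O-bound requests is asyncio with a semaphore bounding how many are in flight at once. A minimal sketch; the coroutine here simulates the network call with a sleep rather than using any real client:

```python
import asyncio

async def complete_line(line: str, sem: asyncio.Semaphore) -> str:
    # Stand-in for an async API call; the semaphore bounds how many
    # requests are in flight at the same time.
    async with sem:
        await asyncio.sleep(0.01)  # simulate network latency
        return f"completion-for:{line}"

async def complete_all(lines, max_in_flight: int = 20) -> list[str]:
    sem = asyncio.Semaphore(max_in_flight)
    # gather returns results in the order the coroutines were passed,
    # regardless of completion order.
    return await asyncio.gather(*(complete_line(l, sem) for l in lines))

results = asyncio.run(complete_all([f"line {i}" for i in range(5)]))
print(results)
```

Threads and asyncio perform similarly here; asyncio just scales more cheaply to very large fan-outs.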