My apologies if the answer is somewhere in the documentation or other topics, as I was unable to find it.
I have fine-tuned a curie model for conditional generation. In use, the input is text of about 20-40 lines. The results for my use case are significantly better if I do 20 requests of one line each and concatenate the answers, rather than one request of 20 lines. Since pricing is per token, the price is similar.
However, I am wondering about limitations on simultaneous requests via the API. Are there any? Can they be lifted? How does it work?
No, I do not think you can fine-tune like this.
I do not think there is any facility in the OpenAI API to process fine-tuning data in parallel “simultaneously”, and I’m not even sure it would be possible given the high-level GPT architecture.
Yes, fine-tuning is slow, and running fine-tunings in parallel in a threaded way would be a great idea, but my guess is that this is currently not possible.
Recall that in the current offering, when you fine-tune a model, the output is another fine-tuned model (with a new id and name). So, if you fine-tune 100s of files in parallel, the result would now be 100s of new fine-tuned models, which of course you don’t want.
- Completions
If you are talking about completions, I see no reason not to do this in parallel. Sounds very reasonable.
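Something like this minimal sketch, for example, assuming the openai Python library and a placeholder fine-tuned model name (substitute your own):

```python
# Minimal sketch: fan out single-line prompts across threads and
# concatenate the completions in their original order.
# FINE_TUNED_MODEL is a placeholder -- substitute your own model id.
from concurrent.futures import ThreadPoolExecutor

import openai

openai.api_key = "sk-..."  # your API key
FINE_TUNED_MODEL = "curie:ft-your-org-2023-01-01"  # placeholder

def complete_line(line: str) -> str:
    resp = openai.Completion.create(
        model=FINE_TUNED_MODEL,
        prompt=line,
        max_tokens=64,
        temperature=0,
    )
    return resp["choices"][0]["text"]

def complete_lines(lines):
    # executor.map preserves input order, so the joined output
    # lines up with the input lines
    with ThreadPoolExecutor(max_workers=20) as pool:
        return "\n".join(pool.map(complete_line, lines))
```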
Thanks a lot @ruby_coder for your answer!
I meant completions, not fine-tuning.
If I understand correctly, you are saying that apart from the 3k requests/minute and 250k davinci tokens/minute (x25 for curie) limits, there is no limitation on parallel requests. Is that correct?
If so, that’s a game changer for my use cases.
I was afraid that a fine-tuned model could only take care of a new prompt once it was done with the current one!
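For anyone finding this later: since those limits are per-minute buckets, a burst of parallel requests can still trip them for a moment. A simple exponential backoff on the rate-limit error handles that; here is a rough sketch (the model name is again a placeholder):

```python
# Rough sketch: retry with exponential backoff when a burst of
# parallel requests momentarily exceeds the per-minute rate limits.
import time

import openai

def complete_with_backoff(prompt: str, retries: int = 5) -> str:
    delay = 1.0
    for _ in range(retries):
        try:
            resp = openai.Completion.create(
                model="curie:ft-your-org-2023-01-01",  # placeholder
                prompt=prompt,
                max_tokens=64,
            )
            return resp["choices"][0]["text"]
        except openai.error.RateLimitError:
            time.sleep(delay)
            delay *= 2  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("still rate limited after retries")
```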