Is there any limit on creating threads and executing them in parallel?

Is there any limit on how many threads can be created and executed in parallel? Are there any issues or performance considerations when creating multiple threads and running them in parallel?

Threads can become an issue on your local machine, though roughly up to a thousand should work, depending on your hardware.
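As a rough sketch (the `call_api` function is a hypothetical stand-in for your actual request code), a bounded thread pool is usually safer than spawning a thousand raw threads yourself:

```python
from concurrent.futures import ThreadPoolExecutor

def call_api(i):
    # Placeholder for an actual API request.
    return i * 2

# Cap concurrency instead of creating one raw thread per request.
with ThreadPoolExecutor(max_workers=32) as pool:
    # pool.map preserves input order in its results.
    results = list(pool.map(call_api, range(1000)))

print(results[:5])  # [0, 2, 4, 6, 8]
```

The pool recycles a fixed number of worker threads, so 1,000 tasks never means 1,000 simultaneous threads.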

But I guess that’s not what you wanted to know. What really matters are the API rate limits, which differ depending on your tier level - you can look yours up here:

https://platform.openai.com/settings/organization/limits

Let’s say you start 1,000 parallel threads and after 500 you hit a rate limit (whether it is combined tokens per minute or requests per minute): the remaining requests will return an error.

The following may have changed since, so I’m not certain it still applies, but you used to be charged even for requests that returned such an error (429) - and to be honest, that’s on you to prevent.

To avoid that, you can gate each request like this:

1. Calculate the token count of your request/prompt/messages, e.g. with tiktoken.
2. Set a max_tokens value for the expected maximum response length - you can’t (easily) force the model to use the full amount.
3. Sum both and add the total to a data store / database.
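Steps 1 and 2 can be sketched like this (a minimal example; the fallback estimate is an assumption for when tiktoken isn’t installed, and the encoding name is the common `cl100k_base`):

```python
def estimate_budget(prompt: str, max_tokens: int) -> int:
    """Prompt tokens + reserved response tokens = worst-case token budget."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        prompt_tokens = len(enc.encode(prompt))
    except ImportError:
        # Rough fallback: ~4 characters per token for English text.
        prompt_tokens = max(1, len(prompt) // 4)
    # Reserve the full max_tokens, since the response length is unknown upfront.
    return prompt_tokens + max_tokens

budget = estimate_budget("Summarize this article for me.", max_tokens=500)
```

The returned budget is pessimistic on purpose: you reserve the whole max_tokens even though the model usually responds with fewer.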

Then, for each new request, before you insert its token count into the data store, check whether you would hit the rate limit of your tier.

Only then start the request; otherwise, schedule it for execution in the next minute…

You should store your tier’s rate limits somewhere in your data store/database and update them when you move to a higher tier.
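A minimal in-memory sketch of that check (the TPM number is a placeholder - use your tier’s actual limit; a real setup would keep the counters in a shared database rather than a dict):

```python
import time
from collections import defaultdict

TPM_LIMIT = 200_000  # tokens per minute for your tier (placeholder value)

# Tokens already committed per minute window.
usage = defaultdict(int)

def try_reserve(budget_tokens: int) -> bool:
    """Reserve a token budget in the current minute window, or refuse."""
    window = int(time.time() // 60)
    if usage[window] + budget_tokens > TPM_LIMIT:
        return False  # over the limit: caller should reschedule for next minute
    usage[window] += budget_tokens
    return True

if try_reserve(1_500):
    pass  # safe to fire the request now
```

Requests that don’t fit in the current minute simply get retried in the next window instead of burning a 429.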

Rate limits per tier can be looked up here

https://platform.openai.com/docs/guides/rate-limits/usage-tiers