Parallelise calls to the API - is it possible and how?

I have a task for which I need to call the completions API several thousand times and expect this to take several hours.

I don’t need help optimising the prompt itself; I want to understand the recommended way to parallelise the API calls (if this is allowed) so that the task completes a little faster.

Also, could anyone confirm if there is a limit to the number of concurrent calls? Some information I’ve found on the internet suggests that only up to two concurrent requests are possible.

Any other solution that could help complete the task faster with respect to making the API requests would be appreciated. Thanks!


Hey, welcome to the forum.

There’s no way to send multiple calls at the same time, as far as I know.

There are also rate limits in place.

If you need more throughput, I would reach out to OpenAI support and ask…

Hope this helps!

@alexpapageo: Did you find a solution for sending concurrent/parallel requests to the OpenAI LLM in Python?

The docs always show create(…)

Just use acreate, which is asynchronous, so you can use asyncio to make parallel calls.
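For example, here is a minimal asyncio sketch. The complete() coroutine is a placeholder of my own — swap in the real async call (acreate in the older openai-python library) where the comment indicates:

```python
import asyncio

async def gather_with_concurrency(limit, coros):
    # A semaphore caps how many requests are in flight at once,
    # which helps you stay under the API's concurrency/rate limits.
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(bounded(c) for c in coros))

async def complete(prompt):
    # Placeholder for the real async call, e.g. in openai-python < 1.0:
    #   return await openai.Completion.acreate(model="...", prompt=prompt)
    await asyncio.sleep(0)  # stands in for network I/O
    return f"completion for: {prompt}"

prompts = [f"prompt {i}" for i in range(10)]
results = asyncio.run(
    gather_with_concurrency(3, (complete(p) for p in prompts))
)
# asyncio.gather preserves input order, so results[i] matches prompts[i]
```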

Be aware of quota: depending on your prompt length, you may want to use the ratelimit library.

Keep in mind that you should divide the official quota by two to avoid errors.

And implement retries in case of errors (one retry fixes about 90% of the roughly 1-in-100 failures; two retries fix nearly all of them).
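A simple retry wrapper with exponential backoff could look like the sketch below. The helper name and defaults are my own, not part of any OpenAI API:

```python
import random
import time

def with_retries(fn, max_retries=2, base_delay=1.0):
    # Retry a flaky call with exponential backoff plus a little jitter.
    # As noted above: one retry clears most transient errors,
    # and two retries clear nearly all of them.
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```

Wrap the API call in a lambda when you use it, e.g. with_retries(lambda: openai.Completion.create(model="...", prompt=prompt)).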


Hello, it turns out that OpenAI’s completion call can handle multiple prompts in a single request, and they all run in parallel.

So no need to use Python’s native async libraries!

prompt_1 = "This is the prompt - 1 to run separately"
prompt_2 = "This is the prompt - 2 to run separately"
prompt_3 = "This is the prompt - 3 to run separately"

prompts = [prompt_1, prompt_2, prompt_3]

response = openai.Completion.create(
    model="text-davinci-003",  # any completions model
    prompt=prompts,
    max_tokens=64,
)

# Print the responses in prompt order
for choice in sorted(response.choices, key=lambda c: c.index):
    print(choice.text, end="\n---\n")

Warning: the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the index field.

That’s it! It all runs in parallel!
Documentation: OpenAI Platform

That documentation:

Sending in a batch of prompts works exactly the same as a normal API call, except you pass in a list of strings to the prompt parameter instead of a single string.

Completion models take strings.

ChatCompletion model endpoints instead take a list of dictionaries.
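To make the contrast concrete, here is a sketch of the two request shapes. The model names are illustrative only:

```python
# Completion endpoint: prompt can be one string or a list of strings (a batch).
completion_kwargs = {
    "model": "text-davinci-003",  # illustrative model name
    "prompt": [
        "What is the capital of California?",
        "What is the capital of New York?",
    ],
}

# ChatCompletion endpoint: messages is ONE conversation,
# a flat list of role/content dicts -- not a list of lists.
chat_kwargs = {
    "model": "gpt-3.5-turbo",  # illustrative model name
    "messages": [
        {"role": "system", "content": "You are an assistant."},
        {"role": "user", "content": "What is the capital of California?"},
    ],
}
```

So to batch chat requests you have to issue one request per conversation (with asyncio or threads) rather than passing a list of conversations.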

A list of a list of dictionaries is not permitted. Trial:

messages=[
    [
        {"role": "system", "content": "You are an assistant."},
        {"role": "user", "content": "What is the capitol of California?"},
    ],
    [
        {"role": "system", "content": "You are an assistant."},
        {"role": "user", "content": "What is the capitol of New York?"},
    ],
]

= fail:

"API Error: [{'role': 'system', 'content': 'You are an assistant.'}, {'role': 'user', 'content': 'What is the capitol of California?'}] is not of type 'object' - 'messages.0'"


Yeah, thanks for letting us know!
That will be very helpful!

I’m following an example from the official cookbook to get some ideas to implement parallelization.
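In case it helps others, the same idea can be sketched with a thread pool. The complete() function is a placeholder of my own for the real blocking call:

```python
from concurrent.futures import ThreadPoolExecutor

def complete(prompt):
    # Placeholder for the blocking API call, e.g.:
    #   openai.Completion.create(model="...", prompt=prompt)
    return f"completion for: {prompt}"

prompts = [f"prompt {i}" for i in range(100)]

# max_workers caps the number of concurrent in-flight requests;
# tune it to stay under your rate limit.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(complete, prompts))
# pool.map preserves order: results[i] corresponds to prompts[i]
```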