Parallelise calls to the API - is it possible and how?

Hi,
I have a task for which I need to call the completions API several thousand times and expect this to take several hours.

I don’t need help with optimising the prompt itself, but with understanding the recommended way to parallelise the API calls (if this is allowed) so that the task completes a little faster.

Also, could anyone confirm if there is a limit to the number of concurrent calls? Some information I’ve found on the internet suggests that only up to two concurrent requests are possible.

Any other solution that could help complete the task faster with respect to making the API requests would be appreciated. Thanks!

4 Likes

Hey, welcome to the forum.

There’s no way to send multiple calls at the same time, as far as I know.

There are also rate limits in place.

https://help.openai.com/en/articles/5955598-is-api-usage-subject-to-any-rate-limits

If you need more frequent access, I would reach out to OpenAI chat support and ask…

https://help.openai.com/en/

Hope this helps!

@alexpapageo: Did you find a solution for sending concurrent/parallel requests to the OpenAI LLM in Python?

The doc always shows

create …

Just use acreate, which is asynchronous.

So you can use asyncio to make parallel calls.
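
A minimal sketch of that approach, assuming the pre-1.0 openai Python library (which exposes Completion.acreate):

import asyncio
import openai

async def complete(prompt: str) -> str:
    # acreate is the asynchronous counterpart of create in openai-python < 1.0
    response = await openai.Completion.acreate(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=128,
    )
    return response.choices[0].text

async def main():
    prompts = ["prompt one", "prompt two", "prompt three"]
    # asyncio.gather keeps all requests in flight concurrently
    results = await asyncio.gather(*(complete(p) for p in prompts))
    for result in results:
        print(result)

asyncio.run(main())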

Be aware of your quota.
Depending on your prompt length, you may want to use the ratelimit library.

And keep in mind that you should divide the official quota by 2 to avoid errors.

And implement retries in case of errors
(1 retry fixes about 90% of the roughly 1-in-100 failures; 2 retries fix nearly 100%).
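
One way to add those retries, sketched here with the tenacity library (my choice for illustration; any backoff helper works):

import openai
from tenacity import retry, stop_after_attempt, wait_random_exponential

# Retry up to 3 times, with randomised exponential backoff between attempts
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(3))
def complete_with_retry(prompt: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=128,
    )
    return response.choices[0].text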

1 Like

Hello, it turns out that OpenAI’s completion call can handle multiple prompts in a single request, and they all run in parallel.

So there’s no need to use Python’s native async libraries!

import openai

prompt_1 = "This is the prompt - 1 to run separately"
prompt_2 = "This is the prompt - 2 to run separately"
prompt_3 = "This is the prompt - 3 to run separately"

prompts = [prompt_1, prompt_2, prompt_3]

# Passing a list of strings batches all three prompts into one request
response = openai.Completion.create(
  model="text-davinci-003",
  prompt=prompts,
  max_tokens=128,
  temperature=0.7,
)

# Print the responses
for choice in response.choices:
    print(choice.text, end="\n---\n")

:warning: Warning: the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the index field.
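
For example, matching responses back to prompts by index might look like this (continuing from the code above):

# choice.index says which prompt a completion belongs to
ordered = [None] * len(prompts)
for choice in response.choices:
    ordered[choice.index] = choice.text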

That’s it! It all runs in parallel!
:link: Documentation :point_right:t2: https://platform.openai.com/docs/guides/rate-limits/batching-requests

1 Like

That documentation says:

Sending in a batch of prompts works exactly the same as a normal API call, except you pass in a list of strings to the prompt parameter instead of a single string.

Completion models take strings.

ChatCompletion model endpoints instead take a list of dictionaries.

A list of lists of dictionaries is not permitted. Trial:

messages=[
[
    {
    "role": "system",
    "content": """You are an assistant.""",
    },
    {
    "role": "user",
    "content": "What is the capitol of California?",
    },
],
[
    {
    "role": "system",
    "content": """You are an assistant.""",
    },
    {
    "role": "user",
    "content": "What is the capitol of New York?",
    },
]
]

= fail:

"API Error: [{'role': 'system', 'content': 'You are an assistant.'}, {'role': 'user', 'content': 'What is the capital of California?'}] is not of type 'object' - 'messages.0'"
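
Since the chat endpoint won’t accept a batch of conversations, the usual workaround is to send the requests concurrently instead. A sketch using asyncio with the pre-1.0 library’s ChatCompletion.acreate:

import asyncio
import openai

async def ask(question: str) -> str:
    # one chat request per question; acreate is the async variant of create
    response = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are an assistant."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

async def main():
    questions = ["What is the capital of California?",
                 "What is the capital of New York?"]
    # all requests are in flight at the same time
    answers = await asyncio.gather(*(ask(q) for q in questions))
    for answer in answers:
        print(answer)

asyncio.run(main())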

3 Likes

Yeah, thanks for letting us know!
It will be so helpful!

I’m following an example from the official cookbook to get some ideas to implement parallelization.

Reference: https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py

1 Like

So multiple prompts in a single ChatCompletion API call are officially not supported? It would speed up some applications quite a bit.

I usually do 3 calls at the same time from AWS Lambda. It will invoke multiple calls for you. The only blocker would be hitting your rate limits. So all you need to do is work around the synchronous-call limitation in the API by making multiple synchronous calls at once.
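
A minimal sketch of the same idea outside Lambda, using a thread pool to keep several synchronous calls in flight at once (the pool size of 3 mirrors the setup above):

import openai
from concurrent.futures import ThreadPoolExecutor

def complete(prompt: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=128,
    )
    return response.choices[0].text

prompts = ["prompt one", "prompt two", "prompt three"]

# three worker threads issue three synchronous calls concurrently
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(complete, prompts))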

3 Likes


I wrote a script that processes 1000 GPT model requests in 2 minutes, by adapting their parallel-processing cookbook example from the embeddings model to the chat completions model (https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py)

All of the code for it is here plus run instructions in the README: https://github.com/tiny-rawr/ZH_008_parallel_chatgpt_processing

I’m currently working on making it easier to run and reuse, but it works great. You’ll need to change the max_tokens_per_minute and max_requests_per_minute to match your current usage tier. I’m on tier 2.
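
For reference, the cookbook script reads one JSON request body per line from a JSONL file and is driven by command-line flags along these lines (flag names as in the cookbook script; the rate values are placeholders you should match to your tier):

# requests.jsonl contains one request body per line, e.g.:
# {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}

python api_request_parallel_processor.py \
  --requests_filepath requests.jsonl \
  --save_filepath results.jsonl \
  --request_url https://api.openai.com/v1/chat/completions \
  --max_requests_per_minute 3500 \
  --max_tokens_per_minute 90000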

5 Likes

Thanks a lot for your input @becca9941. I looked at your repo and also opened an issue.

At the moment I can’t fully grasp what you’re doing in your repo, but it doesn’t seem to have a real connection to the parallelisation code provided in the examples file. Am I right?

As I write this, my code is doing several hundred API requests a minute. Yesterday I hit some unknown rate limit that isn’t mentioned anywhere in the docs: I was definitely doing between 500 and 1000 requests per minute, but after a while the system started telling me I was hitting rate limits despite being nowhere near the tokens-per-minute or requests-per-minute boundary. Officially no requests-per-day limit is mentioned for gpt-3.5-0613, but I think internally there is one. A lot of the requests end in a bad gateway error.

Hey, thanks for opening the issue. I’m working on making this whole thing easier to use, but it will be a few weeks, in between work and other projects.

The README walks through which files are generated and why. data.py is generated from the first column of an ‘input.csv’ that you provide, or you can create data.py yourself with an array called ‘data’ containing each of the inputs you want to pass to GPT.
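
For instance, a hand-written data.py would just define that array (contents illustrative):

# data.py: each entry is one input you want to send to GPT
data = [
    "First input to process",
    "Second input to process",
]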

The hardcoded example jsonl files are there in case you want to run the program with my 10 dummy examples.

Check out the README, and if it’s still confusing, feel free to DM me on LinkedIn; I’ll see if I can help get you to the point where it runs okay (https://www.linkedin.com/in/becca9941/)

Thanks for sharing what was confusing; it helps a lot.

2 Likes