Hi,
I have a task for which I need to call the completions API several thousand times and expect this to take several hours.
I don’t need help optimising the prompt itself; rather, I’d like to understand the recommended way to parallelise the API calls (if this is allowed) so that the task completes a little faster.
Also, could anyone confirm whether there is a limit on the number of concurrent calls? Some information I’ve found on the internet suggests that only up to two concurrent requests are possible.
Any other solution that could help the task complete faster on the API-request side would also be appreciated. Thanks!
Hello, it turns out that OpenAI’s completion call can automatically handle multiple requests, running them all in parallel.
So there’s no need to use Python’s native async libraries!
import openai

prompt_1 = "This is the prompt - 1 to run separately"
prompt_2 = "This is the prompt - 2 to run separately"
prompt_3 = "This is the prompt - 3 to run separately"
prompts = [prompt_1, prompt_2, prompt_3]

# One call carries all three prompts as a list.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompts,
    max_tokens=128,
    temperature=0.7,
)

# Print the responses
for choice in response.choices:
    print(choice.text, end="\n---\n")
Warning: the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the index field.
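For example, a minimal sketch of that re-matching step, assuming the pre-1.0 openai library where each choice carries an index attribute:

ordered = [None] * len(prompts)
for choice in response.choices:
    # choice.index tells you which prompt produced this completion.
    ordered[choice.index] = choice.text

for prompt, completion in zip(prompts, ordered):
    print(prompt, "->", completion)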
Sending in a batch of prompts works exactly the same as a normal API call, except you pass in a list of strings to the prompt parameter instead of a single string.
Completion models take strings.
ChatCompletion endpoints instead take a list of message dictionaries.
A list of lists of dictionaries is not permitted. Trial:
messages=[
    [
        {
            "role": "system",
            "content": """You are an assistant.""",
        },
        {
            "role": "user",
            "content": "What is the capital of California?",
        },
    ],
    [
        {
            "role": "system",
            "content": """You are an assistant.""",
        },
        {
            "role": "user",
            "content": "What is the capital of New York?",
        },
    ],
]
This fails with:
API Error: [{'role': 'system', 'content': 'You are an assistant.'}, {'role': 'user', 'content': 'What is the capital of California?'}] is not of type 'object' - 'messages.0'
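Since a batch isn’t accepted there, one workaround is to fire the chat requests concurrently yourself. A minimal sketch, assuming the pre-1.0 openai Python library, which exposes an async acreate() method; the question strings are just placeholders:

import asyncio
import openai

questions = [
    "What is the capital of California?",
    "What is the capital of New York?",
]

async def ask(question):
    # Each call sends one ordinary messages list; the parallelism
    # comes from gathering the coroutines, not from the API itself.
    response = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are an assistant."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

async def main():
    answers = await asyncio.gather(*(ask(q) for q in questions))
    for question, answer in zip(questions, answers):
        print(question, "->", answer)

asyncio.run(main())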
I usually do 3 calls at the same time from AWS Lambda, which will invoke multiple calls for you. The only blocker would be hitting your rate limits. So all you need to do is work around the synchronous-call limitation in the API by making multiple synchronous calls at once.
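Outside Lambda, the same idea can be sketched with a thread pool; this is an illustrative example rather than code from the post, and it assumes the pre-1.0 openai library:

import concurrent.futures
import openai

def ask(prompt):
    # One ordinary synchronous call per prompt.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

prompts = ["first prompt", "second prompt", "third prompt"]

# Three workers mirror the "3 calls at the same time" above.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(ask, prompts))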
Hello, I recently built a much cleaner version of the parallel processor. It has a Streamlit front end, and there is a demo video and setup instructions in the README.
I wrote a script that processes 1,000 GPT model requests in 2 minutes, by adapting OpenAI’s parallel-processing cookbook example from the embeddings model to the chat completions model (https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py)
All of the code for it is here, plus run instructions in the README: https://github.com/tiny-rawr/ZH_008_parallel_chatgpt_processing
I’m currently working on making it easier to run and reuse, but it works great. You’ll need to change the max_tokens_per_minute and max_requests_per_minute values to match your current usage tier; I’m on tier 2.
Thanks a lot for your input @becca9941. I looked at your repo and also opened an issue.
At the moment I can’t fully grasp what you are doing in your repo, but it does not seem to have a real connection to the parallelisation code provided in the examples file. Am I right?
As I write this, my code is doing several hundred API requests a minute. I hit some unknown rate limit yesterday that is not mentioned anywhere in the docs: I was definitely doing between 500 and 1,000 requests per minute, but after a while the system started telling me I was hitting rate limits despite being nowhere close to the tokens-per-minute or requests-per-minute boundary. Officially there is no requests-per-day limit mentioned for gpt-3.5-0613, but I think internally there is one. A lot of the requests end in a bad gateway error.
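For what it’s worth, a common mitigation for both problems is retrying with exponential backoff. A hedged sketch, assuming the pre-1.0 openai library’s exception classes (RateLimitError, plus APIError for transient server errors such as bad gateway):

import random
import time
import openai

def create_with_backoff(max_retries=6, **kwargs):
    # Retry on rate limits and transient server errors, doubling the
    # wait each time; the random jitter avoids synchronized retries.
    delay = 1.0
    for _ in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except (openai.error.RateLimitError, openai.error.APIError):
            time.sleep(delay + random.random())
            delay *= 2
    raise RuntimeError("Request still failing after retries")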