Intro
Ever since OpenAI introduced the gpt-3.5-turbo model, aka ChatGPT, to the OpenAI API on the Chat Completions endpoint, users migrating from the completions endpoint (drawn by the more economical pricing) have been trying to replicate the “batching” technique they used there.
In the scope of this tutorial, we refer to combining multiple completion requests, irrespective of their contexts, into a single API call as batching.
Why use batching?
Instead of explaining this, I’ll quote from the OpenAI docs:
The OpenAI API has separate limits for requests per minute and tokens per minute.
If you’re hitting the limit on requests per minute, but have available capacity on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with our smaller models.
Sending in a batch of prompts works exactly the same as a normal API call, except you pass in a list of strings to the prompt parameter instead of a single string.
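To make that concrete, here’s a minimal sketch of completions-endpoint batching (using the legacy pre-1.0 openai Python SDK, as in the code later in this post; the model name is just an illustrative choice):
import openai

openai.api_key = "OPENAI_API_KEY"  # supply your API key however you choose

# The completions endpoint accepts a list of strings as the prompt,
# so several prompts are completed in a single API call.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=["Hello world, from", "The fifth planet from the Sun is "],
    max_tokens=50
)

# One choice comes back per prompt; the index field maps each choice to its prompt.
for choice in sorted(response.choices, key=lambda c: c.index):
    print(choice.text)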
What’s the catch?
The above technique works great with the completions endpoint. However, it doesn’t work with the chat completions endpoint, because that endpoint doesn’t take an array of prompts; it takes an array of messages.
Here’s what an array of messages looks like:
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"},
{"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
{"role": "user", "content": "Where was it played?"}
]
The Solution
We have to somehow pass multiple prompts to the chat completions endpoint inside the message object.
For this purpose I chose a string array.
But we cannot pass a string array as the messages, nor can we pass one to the content attribute of a message object, since content must be a string.
So we stringify the array of prompt strings and pass it in a message object with role set to user.
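In code, that stringification is just a JSON dump; a quick sketch with placeholder prompts:
import json

# content must be a single string, so JSON-encode the array of prompts
promptsArray = ["First prompt", "Second prompt"]
stringifiedPromptsArray = json.dumps(promptsArray)
# -> '["First prompt", "Second prompt"]'
message = {"role": "user", "content": stringifiedPromptsArray}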
Now that we have the prompts in place, we need to tell the model what to do with them.
This is done with a system role message that instructs the model to complete each individual element of the array and return the completions as an array.
Note:
The system message must be appended at the end of the messages array. I tried placing it at the beginning of the array of message objects, and the model didn’t reply consistently.
Code:
Here’s basic Python code to send a batched request to the chat completions endpoint and get the completed array back in the response.
import openai
import json

openai.api_key = "OPENAI_API_KEY"  # supply your API key however you choose

# The prompts to batch into a single chat completion request
promptsArray = ["Hello world, from", "How are you B", "I am fine. W", "The fifth planet from the Sun is "]

# content must be a single string, so JSON-encode the array of prompts
stringifiedPromptsArray = json.dumps(promptsArray)
print(promptsArray)

prompts = [
    {
        "role": "user",
        "content": stringifiedPromptsArray
    }
]

# The system message goes at the end of the messages array (see the note above)
batchInstruction = {
    "role": "system",
    "content": "Complete every element of the array. Reply with an array of all completions."
}
prompts.append(batchInstruction)

print("ChatGPT: ")
stringifiedBatchCompletion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=prompts,
    max_tokens=1000
)

# The model replies with a JSON array string; parse it back into a Python list
batchCompletion = json.loads(stringifiedBatchCompletion.choices[0].message.content)
print(batchCompletion)
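One caveat: json.loads assumes the model actually returned a valid JSON array, which isn’t guaranteed. A defensive variant of the parsing step might look like this (a sketch, not part of the original code above):
reply = stringifiedBatchCompletion.choices[0].message.content
try:
    # The model's reply is not guaranteed to be valid JSON
    batchCompletion = json.loads(reply)
    if not isinstance(batchCompletion, list):
        raise ValueError("expected a JSON array of completions")
except (json.JSONDecodeError, ValueError) as err:
    print(f"Could not parse batch reply ({err}); raw output:")
    print(reply)
else:
    print(batchCompletion)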