Batching with ChatCompletion not possible like it was in Completion

Following the March 1st release of the ChatGPT API, I’d like to perform batching as explained in the OpenAI API docs (Example with batching).

Currently there seems to be no option to do so.
I want to send multiple prompts and receive multiple answers, one per prompt, as if each prompt were in a separate conversation.
When I send:

    messages=[
        {'role': 'user', 'content': 'this is prompt 1'},
        {'role': 'user', 'content': 'this is prompt 2'},
    ]

I get back:

  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Sorry, there is no context or information provided for either prompt 1 or prompt 2. Can you please provide more information?",
        "role": "assistant"
      }
    }
  ]

However, I want ChatGPT to treat each message separately, not as part of the same conversation.



Hi @yotam.martin

What’s stopping you from making multiple async calls to the chat completion endpoint for respective prompts/conversations?
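A minimal sketch of the multiple-async-calls approach using Python’s asyncio: each prompt is sent as its own single-message conversation, and a semaphore caps how many requests are in flight at once so you stay under the rate limits. `run_all` and `fake_call` are hypothetical names. With the openai>=1.0 package, `call` would typically wrap `AsyncOpenAI().chat.completions.create(model=..., messages=[{"role": "user", "content": prompt}])` and return `.choices[0].message.content`; that wiring is an assumption and is left out so the sketch runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_all(
    prompts: List[str],
    call: Callable[[str], Awaitable[str]],
    max_concurrency: int = 10,
) -> List[str]:
    """Send each prompt as its own request, at most max_concurrency at a time.

    Results come back in the same order as the prompts.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> str:
        # The semaphore bounds the number of requests in flight at once.
        async with semaphore:
            return await call(prompt)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(one(p) for p in prompts))


# Hypothetical stand-in for a real API call, so the sketch runs offline.
async def fake_call(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return f"answer to {prompt!r}"


if __name__ == "__main__":
    answers = asyncio.run(run_all(["this is prompt 1", "this is prompt 2"], fake_call))
    for answer in answers:
        print(answer)
```

Because every prompt is its own conversation, answers cannot bleed into each other; the cost is one request per prompt against the RPM limit.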


Hi @sps
If we look at the chat rate limits (pay-as-you-go users, after 48 hours):
• 3,500 RPM
• 90,000 TPM*

For prompts shorter than 90,000 / 3,500 ≈ 26 tokens, the RPM limit is reached before the TPM limit, so it makes sense to batch prompts together to make full use of the token cap.
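To make the arithmetic concrete, here is a small sketch that computes how many requests per minute fit under both limits. The function name is hypothetical, and the default limits are simply the ones quoted above, which may have changed since.

```python
def max_requests_per_minute(avg_tokens_per_request: float,
                            rpm_limit: int = 3_500,
                            tpm_limit: int = 90_000) -> int:
    """Return how many requests per minute fit under both the RPM and TPM limits."""
    # The token budget alone allows tpm_limit / tokens-per-request requests.
    by_tokens = int(tpm_limit // avg_tokens_per_request)
    # Whichever limit is tighter wins.
    return min(rpm_limit, by_tokens)


# 10-token prompts: the RPM limit binds, leaving most of the token budget unused.
print(max_requests_per_minute(10))   # 3500
# 100-token prompts: the TPM limit binds.
print(max_requests_per_minute(100))  # 900
```

With 10-token prompts, 3,500 requests use only 35,000 of the 90,000 tokens per minute, which is the gap batching was meant to fill.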


Batching won’t be feasible given how the chat completion endpoint works.

It would still count against the TPM limit, and it would affect the quality and length of the conversations batched into a single call.


Yes @sps, but OpenAI has made the hilariously bad product decision to steer all of its GPT users towards the ChatCompletion API.

At 1/10th the price, every GPT user should and will be writing their own best attempts at utility classes to get around ChatCompletion’s ugly abstraction. But batching (and the n parameter) warrants a better approach than “dispatch many requests asynchronously”.


Hi @yotam.martin, thanks for pointing this out. I’m running into the same issue at the moment, and here’s the workaround I found, which I hope helps. My task uses text completion to auto-code hundreds of thousands of responses (sentences or paragraphs) against a specific code, e.g. does this response describe ‘xxx’ or ‘yyy’? With Completion, I sent the prompts as a list:

        prompt=[
            'this is prompt 1',
            'this is prompt 2',
        ]

To make it work with ChatGPT, I adjusted it to:

        messages=[
            {'role': 'user', 'content': 'here is the background, and what I want to achieve'},
            {'role': 'user', 'content': 'here is the list of xxx responses: …'},
            {'role': 'user', 'content': 'Please determine whether each sentence relates to xxx. Your response should take relevant details from the background, the response, and the label. The output should only contain the sentence index number and the short answer yes or no.'},
        ]

The output is:

1. Yes
2. No
3. No
4. No
…
100. No

And that’s what I want! 🙃
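The workaround above can be sketched as two helpers: one packs the responses into a single numbered user message, and the other parses the `1. Yes` / `2. No` lines back into a per-index mapping. This is a minimal sketch; `build_messages` and `parse_answers` are hypothetical names, and the parser assumes the model sticks to the `<index>. Yes|No` format, which is not guaranteed, so any index missing from the result should be re-queried.

```python
import re
from typing import Dict, List

# Matches lines such as "1. Yes" or "100. no" (case-insensitive), one per line.
ANSWER_LINE = re.compile(r"^\s*(\d+)\.\s*(yes|no)\b", re.IGNORECASE | re.MULTILINE)


def build_messages(background: str, responses: List[str], question: str) -> List[dict]:
    """Pack many responses into one chat request, numbered from 1."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(responses, start=1))
    return [
        {"role": "user", "content": background},
        {"role": "user", "content": "Here is the list of responses:\n" + numbered},
        {"role": "user", "content": question},
    ]


def parse_answers(output: str, n_expected: int) -> Dict[int, bool]:
    """Map each 1-based index to True (yes) or False (no).

    Indices the model skipped are simply absent, so callers can retry them.
    """
    answers: Dict[int, bool] = {}
    for match in ANSWER_LINE.finditer(output):
        index = int(match.group(1))
        # Ignore indices outside the range we actually sent.
        if 1 <= index <= n_expected:
            answers[index] = match.group(2).lower() == "yes"
    return answers


msgs = build_messages("background", ["first sentence", "second sentence"], "Yes or no?")
print(msgs[1]["content"])
print(parse_answers("1. Yes\n2. No\n3. No", n_expected=2))  # {1: True, 2: False}
```

Note that this still puts all responses in one conversation, so the contamination concern raised later in the thread applies.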


Brilliant! I’ve been running classification tasks like this one prompt at a time. This is very helpful.

Is there really no clear drop-in replacement for batching in ChatCompletion?


I have tried using threading in Python. Sometimes it produces the results quickly; other times it stalls indefinitely.
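One common cause of indefinite stalls is a request with no effective timeout. A minimal sketch using Python’s concurrent.futures (3.9+ for `cancel_futures`): every prompt is submitted to a thread pool, the whole batch waits at most `timeout` seconds, and anything unfinished or failed comes back as None so it can be retried. `run_in_threads` and `slow_call` are hypothetical names. A Python thread cannot be killed, so a hung call keeps running in the background; with openai>=1.0 it is also worth setting a per-request timeout on the client, e.g. `OpenAI(timeout=30.0)`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, Optional


def run_in_threads(
    prompts: List[str],
    call: Callable[[str], str],
    max_workers: int = 8,
    timeout: float = 60.0,
) -> List[Optional[str]]:
    """Run call(prompt) for every prompt in a thread pool.

    Waits at most `timeout` seconds for the whole batch; results that are
    not ready by then, or whose call raised, are returned as None.
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(call, p) for p in prompts]
    done, _not_done = wait(futures, timeout=timeout)
    # Don't block on stragglers; drop queued work that has not started yet.
    pool.shutdown(wait=False, cancel_futures=True)

    results: List[Optional[str]] = []
    for fut in futures:
        if fut in done and fut.exception() is None:
            results.append(fut.result())
        else:
            results.append(None)
    return results


# Hypothetical stand-in for an API call where one prompt "hangs".
def slow_call(prompt: str) -> str:
    time.sleep(1.0 if prompt == "stuck" else 0.01)
    return prompt.upper()


if __name__ == "__main__":
    print(run_in_threads(["a", "stuck", "b"], slow_call, timeout=0.3))
```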

Necro’ing this: it looks like batching is still not supported for chat endpoints? It doesn’t sound like anyone has found a step we’re missing.

I can’t imagine a technical barrier to it, given that ChatCompletion is meant as a substitute for the original Completion, so this seems really weird. Still hoping someone comes up with a clean way to do it.

UPDATE: @yotam.martin here’s how you can do batching with ChatCompletion


@sps This seems like more of a workaround than a solution. Giving ChatGPT a prompt like: “Complete the following list of prompts and reply with a list of outputs” is very different than sending these as independent requests. I expect, for example, that if I sent a batch of stories in this format, each hundreds of words, and asked ChatGPT to complete them all, the stories would bleed together. This doesn’t seem very different from simply including these prompts as separate messages in a chat history.

This is a big deal for researchers like myself trying to study ChatGPT and GPT-4, and I expect for people building applications as well. If different prompts/completions contaminate each other, that affects my analysis of model behavior. For small experiments, I call the Completions API with hundreds of unique but relatively short prompts, and with thousands of prompts for bigger experiments. It appears that I must call the ChatCompletions API hundreds or thousands of times for what takes a single API call and 5-10 seconds in the Completions API. Maybe there are some tricks that I haven’t figured out, but in my initial testing, ChatCompletions seems to take longer than that to respond to a single prompt.

A huge +1 from me for adding batching to the ChatCompletions API.


Yes, it absolutely is a workaround. I mentioned possible limitations at the end as well.

You could use reliableGPT for this: a Python package to handle batch calls to OpenAI.

This was also a pain point for us, so we made a package to execute the API calls in parallel: parallel-parrot. It’s not the same as batching, but it largely gets the same result without hackery.

Note that reliableGPT has since been deprecated.

I have done this using

import random
import time

import openai

# define a retry decorator
# from
def retry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 4,
    errors: tuple = (openai.APIError, openai.APIConnectionError, openai.RateLimitError, openai.APIStatusError,),  # do not add openai.Timeout!
):
    """Retry a function with exponential backoff."""

    def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay
        # Loop until a successful response or max_retries is hit or an exception is raised
        while True:
            try:
                return func(*args, **kwargs)
            # Retry on specific errors; any other exception propagates unchanged
            except errors as e:
                # Increment retries
                num_retries += 1
                # Check if max retries has been reached
                if num_retries > max_retries:
                    raise Exception(
                        f"Maximum number of retries ({max_retries}) exceeded."
                    ) from e
                # Increment the delay (jitter multiplies by a random factor in [1, 2))
                delay *= exponential_base * (1 + jitter * random.random())
                print("new delay=", delay)
                # Sleep for the delay
                time.sleep(delay)

    return wrapper

and then

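@retry_with_exponential_backoff
def chat_completion_with_backoff(azure_openai_client=None, **kwargs):
    # Forward model, messages, etc. to the client's chat completions endpoint
    return azure_openai_client.chat.completions.create(**kwargs)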

I have a loop that calls chat_completion_with_backoff 50000+ times. I am now running this overnight to see if it catches exceptions and retries properly.

Another thing to try is saving results to, e.g., SQLite, and then querying by key whether a prompt has already been run: if not, run it; if so, skip it and move on. That way one could just create a csh/tcsh/bash script like this:


… a thousand times
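The SQLite idea can be sketched with Python’s built-in sqlite3 module: each result is stored under a key derived from the request, and any key already present is skipped, so a script that crashes or is re-run picks up where it left off. `open_cache`, `cached_run` and `fake_call` are hypothetical names, and hashing the model name plus prompt into the key is an assumption; use whatever uniquely identifies a request in your setup.

```python
import hashlib
import sqlite3
from typing import Callable


def open_cache(path: str = "results.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (key TEXT PRIMARY KEY, output TEXT)")
    return conn


def cached_run(conn: sqlite3.Connection, model: str, prompt: str,
               call: Callable[[str], str]) -> str:
    """Return the stored output for this request, calling the API only if missing."""
    key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
    row = conn.execute("SELECT output FROM results WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]  # already run: bypass and move on
    output = call(prompt)
    # Commit immediately so progress survives a crash mid-loop.
    with conn:
        conn.execute("INSERT INTO results (key, output) VALUES (?, ?)", (key, output))
    return output


calls = []


def fake_call(prompt: str) -> str:
    # Hypothetical stand-in for the real API call; records each invocation.
    calls.append(prompt)
    return prompt[::-1]


conn = open_cache(":memory:")
for p in ["abc", "def", "abc"]:
    print(cached_run(conn, "some-model", p, fake_call))
print(len(calls))  # 2: the repeated "abc" came from the cache
```

Combined with the retry decorator, a loop over tens of thousands of prompts can simply be restarted after a failure without paying for work already done.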