Without using the Batch API, how do users manage making a large number of requests to OpenAI?

I was thinking about the need for the Batch API. Is the example I'm stating below a good use case:

  1. Generate a summary for document-1.
  2. Translate this document.
  3. Classify this image dataset.
  4. Generate a summary for another document, document-2.
  5. Perform sentiment analysis on a customer feedback dataset.
  ... and so on, say 50 tasks which do not need an immediate response.

Without using the Batch API, the user would have to:
1. Send task 1, then get its response,
2. Send task 2, then get its response,
and so on for each task.

This is very cumbersome for the user to do.
With the Batch API, the user can put all these requests in a batch and send them to OpenAI,
then check the status and retrieve the results once the status is completed.

Does this explain a genuine need for the Batch API?

This is not how the batch API works. You cannot depend on generated output in subsequent requests—it’s not sequential—you need to have a large list of requests ready to go at the start of a batch.
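
Roughly, the flow is: write every request up front as one JSONL line, upload that file, create the batch, then poll its status and download the output file once it is completed. A minimal sketch with the Python SDK (the file name, documents, and prompts below are placeholders for illustration):

import json
from openai import OpenAI

client = OpenAI()

# 1) every request must exist up front, one JSON object per line
documents = ["text of document 1...", "text of document 2..."]
with open("batch_input.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        f.write(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-3.5-turbo",
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
            },
        }) + "\n")

# 2) upload the file and create the batch
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3) later: poll the status, then download the results
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    for line in client.files.content(batch.output_file_id).text.splitlines():
        print(json.loads(line)["custom_id"])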

Actually, my question is: before the Batch API was introduced, if users had a large volume of requests to send to OpenAI, what were the ways to do that?

A queue with a maximum of N consumers, such that they do not exceed the RPM, TPM, and TPD rate limits. The queue consumers handle rate-limit errors with an exponential backoff retry.
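
Something like the sketch below: an asyncio.Queue feeding N worker coroutines, each retrying with exponential backoff when a rate-limit error comes back. The concurrency value and prompts are placeholders, and TPM/TPD accounting is left out for brevity:

import asyncio
import random
from openai import AsyncOpenAI, RateLimitError

client = AsyncOpenAI()
N_CONSUMERS = 5  # tune so the consumers stay under your RPM/TPM limits

async def call_with_backoff(prompt, max_retries=6):
    delay = 1.0
    for _ in range(max_retries):
        try:
            response = await client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            # exponential backoff with jitter before retrying
            await asyncio.sleep(delay + random.random())
            delay *= 2
    raise RuntimeError("request failed after retries")

async def consumer(queue, results):
    while True:
        prompt = await queue.get()
        try:
            results.append(await call_with_backoff(prompt))
        finally:
            queue.task_done()

async def run(prompts):
    queue = asyncio.Queue()
    results = []
    for p in prompts:
        queue.put_nowait(p)
    workers = [asyncio.create_task(consumer(queue, results)) for _ in range(N_CONSUMERS)]
    await queue.join()  # block until every queued prompt has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results

# asyncio.run(run(["task 1 prompt", "task 2 prompt", ...]))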


there were several attempts early on to implement batching of requests. if you search the forum, you'll also see users were implementing their own ways to work around the limits back then.


So can we say that the Batch API is an efficient way of batching non-urgent requests, and that before it there were no good ways to do this?

since it is fit for purpose, we can say it is an efficient way to batch non-urgent requests. i'm sure in the past there were those who made their own implementation and it worked for them. but it is hard to argue with the batch api's pricing, 50% off the normal cost.

Thanks. Also, I checked that OpenAI's cookbook has a method of batching multiple prompts into a single request.
As in the docs:

Batching requests
The OpenAI API has separate limits for requests per minute and tokens per minute.

If you're hitting the limit on requests per minute, but have headroom on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with the smaller models.

Sending in a batch of prompts works exactly the same as a normal API call, except that you pass in a list of strings to the prompt parameter instead of a single string.

Warning: the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the index field.

1. Example without batching

from openai import OpenAI

client = OpenAI()

num_stories = 10
content = "Once upon a time,"

# serial example, with one story completion per request
for _ in range(num_stories):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": content}],
        max_tokens=20,
    )

    # print story
    print(content + response.choices[0].message.content)

Once upon a time,in a small village nestled between rolling green hills, there lived a young girl named Lily. She had
Once upon a time,in a small village nestled in the heart of a lush forest, lived a young girl named Evelyn.
Once upon a time,in a faraway kingdom, there lived a young princess named Aurora. She was known for her kind
Once upon a time,in a faraway kingdom called Enchantia, there lived a young girl named Ella. Ella was
Once upon a time,in a small village nestled among the rolling hills, lived a young woman named Lucy. Lucy was known
Once upon a time,in a small village nestled between rolling hills, there lived a young girl named Ava. Ava was a
Once upon a time,in a faraway kingdom, there lived a wise and just king named Arthur. King Arthur ruled over
Once upon a time,in a small village nestled among towering mountains, lived a young girl named Lily. She was known for
Once upon a time,in a small village nestled in the heart of a lush forest, there lived a young girl named Lily
Once upon a time,in a far-off kingdom, there lived a kind and beloved queen named Isabella. She ruled with

2. Example with batching

num_stories = 10
prompts = ["Once upon a time,"] * num_stories

# batched example, with 10 story completions per request
# (uses the legacy Completions endpoint, which accepts a list of prompts)
response = client.completions.create(
    model="curie",
    prompt=prompts,
    max_tokens=20,
)

# match completions to prompts by index
stories = [""] * len(prompts)
for choice in response.choices:
    stories[choice.index] = prompts[choice.index] + choice.text

# print stories
for story in stories:
    print(story)

Once upon a time, I lived in hope. I convinced myself I knew best, because, naive as it might sound,
Once upon a time, Thierry Henry was invited to have a type of frosty exchange with English fans, in which
Once upon a time, and a long time ago as well, PV was passively cooled because coils cooled by use of metal driving
Once upon a time, there was a land called Texas. It was about the size of Wisconsin. It contained, however,
Once upon a time, there was an old carpenter who had three sons. The locksmith never learned to read or write
Once upon a time, there was a small farming town called Moonridge Village, far West across the great vast plains that lay
Once upon a time, California’s shorelines, lakes, and valleys were host to expanses of untamed wilderness
Once upon a time, she said. It started with a simple question: Why don’t we know any stories?
Once upon a time, when I was a young woman, there was a movie named Wuthering Heights. Stand by alleges
Once upon a time, a very long time I mean, in the year 1713, died a beautiful Duchess called the young

Can you tell me some limitations of this method?

You have to wait for all stories in the batch to complete in that example.

One could do it all async, and as tasks get done, other tasks can be introduced (as a function of tokens, requests, etc.). Here's the skeleton. ymmv

import asyncio

async def main():
    task_queue = []
    entity_list = []

    # resource_watcher (not shown here) is a coroutine that tracks request/token
    # headroom and adds new tasks to task_queue as capacity frees up
    resource_watcher_task = asyncio.create_task(resource_watcher(task_queue=task_queue, entity_list=entity_list))
    task_queue.append(resource_watcher_task)
    await asyncio.gather(*task_queue)

def run_main():
    asyncio.run(main())


if __name__ == "__main__":
    run_main()