Batching prompts is not supported by the chat completions API. There are a number of posts in the community forums complaining about this.
But these two pages, which have been edited since Developer Day, still recommend batching prompts.
Good find: those documents have limited applicability, and they show a model that will be turned off in a matter of weeks. It's a solution answering no particular problem.
from openai import OpenAI

client = OpenAI()

num_stories = 10
prompts = ["Once upon a time,"] * num_stories

# batched example, with 10 story completions per request
response = client.completions.create(
    model="curie",  # legacy base model, already slated for shutdown
    prompt=prompts,
    max_tokens=20,
)
The information is useful if the endpoint and non-chat mode are made clear. It demonstrates completion: having the AI continue writing where the text is left open.
The example above is still poor, though, because the ten requests are independent and rely only on sampling variation to keep the stories from unfolding nearly identically. A confidently trained, low-perplexity model will produce even more uniform output. Batching is better demonstrated with a list of distinct inputs, as in the sketch below.
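A rough sketch of what I mean: the same batched call, but with different prompts per slot (the prompts here are made up, and gpt-3.5-turbo-instruct is just the assumed current completions model):

from openai import OpenAI

client = OpenAI()

# distinct prompts in one batched request, one completion per prompt
prompts = [
    "Once upon a time,",
    "The recipe calls for",
    "Dear hiring manager,",
]
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # assumed replacement for the retiring curie
    prompt=prompts,
    max_tokens=20,
)

# each choice carries an index that maps it back to its prompt
for choice in sorted(response.choices, key=lambda c: c.index):
    print(prompts[choice.index], choice.text)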
bump. Recommend that request batching capability be added to the chat completions API. It seems they want to steer the API towards chat completions over completions, and towards batching to ease high request volume, but they have not put the two together yet.
I implemented my own rate limiting in my code because I was occasionally hitting the RPM limit for GPT-3.5.
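Something like this minimal throttle (a sketch only; the class name and the rpm value are illustrative, not my exact code):

import time

class RateLimiter:
    """Client-side throttle that spaces out requests to stay under an RPM cap."""

    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm
        self.last_call = 0.0

    def wait(self) -> None:
        # sleep just long enough to keep requests at least min_interval apart
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(rpm=3500)  # illustrative RPM tier
# call limiter.wait() before each client.chat.completions.create(...) request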
But my thought had been that for the GPT-4 calls, if I could batch them, I could minimize the latency of just waiting for data to move back and forth.
The app I’m working on has 50k-200k tokens of data per run split into 3000-5000 token chunks, and takes about 20 minutes just for the GPT-4 calls. If I could batch, I could probably shave off a minute or two. Not a huge amount, but still something.
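For now, the closest I can get is firing those calls concurrently instead of one at a time, so the total wall time approaches the slowest single request rather than the sum. A rough sketch (the chunk list and model name are placeholders, not my actual code):

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def process_chunk(chunk: str) -> str:
    # one chat completion per chunk; chunks stand in for the 3000-5000 token pieces
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": chunk}],
    )
    return response.choices[0].message.content

async def process_all(chunks: list[str]) -> list[str]:
    # send the requests concurrently rather than sequentially
    return await asyncio.gather(*(process_chunk(c) for c in chunks))

# results = asyncio.run(process_all(chunks))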