Batch API vs Prompt caching

Hi,
I need to make around 500-550 requests to the OpenAI API, and my objective is to minimize the API cost as much as possible.
I have a fixed system prompt of around 436 tokens, and the user prompt would be approximately 1500-2100 tokens.

What’s the best approach I should follow, given that the cached_tokens feature is currently not available for the Batch API?

Also, could anyone help with a rough calculation to compare the cost savings in both cases? Most pages only mention that there’s a 50% cost reduction for each of them.

Thanks

The number of common starting tokens between your requests' system messages does not meet the 1024-token threshold for prompt caching, which would only give a 50% discount on the matching input tokens, and only when requests are sent within a relatively short cache window. Your user messages would also need to share common starting content for the prefix to reach the threshold.
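
As a quick sanity check, here is a minimal sketch of how you could estimate whether your shared prefix reaches that threshold. It assumes a gpt-4o-family model and uses tiktoken as an offline approximation of server-side tokenization; `shared_user_prefix` is a hypothetical placeholder for any text repeated at the start of every user message:

```python
# Rough check of prompt-caching eligibility (approximation via tiktoken).
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")  # assumption: gpt-4o-family model

system_prompt = "..."        # your fixed ~436-token system prompt
shared_user_prefix = "..."   # hypothetical: text repeated at the start of each user message

common_prefix_tokens = len(enc.encode(system_prompt)) + len(enc.encode(shared_user_prefix))

# Caching only kicks in once the identical leading prefix reaches 1024 tokens,
# and the discount applies only to that cached portion of the input.
if common_prefix_tokens >= 1024:
    print(f"~{common_prefix_tokens} common prefix tokens: eligible for prompt caching")
else:
    print(f"~{common_prefix_tokens} common prefix tokens: below the 1024-token threshold")
```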

Therefore the Batch API, which discounts the entire API request by 50%, is the better price choice, provided you can wait for fulfillment (batches complete within a 24-hour window).

The calculation there is original cost * 0.5
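
To put rough numbers on it, here is a sketch of the comparison. The per-token rates, the average output length, and the assumed model discount structure are placeholders, not current pricing; substitute the actual rates for whatever model you are using:

```python
# Rough cost comparison for ~525 requests. Rates are placeholders (USD per 1M tokens).
INPUT_RATE = 2.50    # assumed standard input rate per 1M tokens
OUTPUT_RATE = 10.00  # assumed standard output rate per 1M tokens

n_requests = 525
system_tokens = 436
user_tokens = 1800    # midpoint of the 1500-2100 range
output_tokens = 500   # assumed average completion length

input_total = n_requests * (system_tokens + user_tokens)
output_total = n_requests * output_tokens

standard = (input_total / 1e6) * INPUT_RATE + (output_total / 1e6) * OUTPUT_RATE

# Batch API: 50% off both input and output for every request.
batch = standard * 0.5

# Prompt caching: 50% off only the cached input prefix, and only when the
# common prefix reaches 1024 tokens. With a 436-token system prompt alone
# it never triggers, so the cached portion in this scenario is 0.
cached_prefix_tokens = 0
caching = standard - (n_requests * cached_prefix_tokens / 1e6) * INPUT_RATE * 0.5

print(f"standard: ${standard:.2f}  batch: ${batch:.2f}  caching: ${caching:.2f}")
```

With these placeholder numbers the batch discount applies to the full input and output of every request, while the caching discount would cover at most the shared prefix of each request even if you did cross the 1024-token threshold.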
