Batch API vs Prompt caching

Hi,
I need to make around 500-550 requests to the OpenAI API, and my objective is to minimize the API cost as much as possible.
I have a fixed system prompt of around 436 tokens, and the user prompt would be approximately 1500-2100 tokens.

What’s the best approach I should follow, given that the cached_tokens feature is currently not available for the Batch API?

Also, could anyone help with a rough calculation to compare the cost savings in both cases? Most pages only mention that there’s a 50% cost reduction for each of them.

Thanks

The number of common starting tokens between your requests' system messages does not meet the 1024-token threshold for prompt caching, which would only give a 50% discount on the matching input tokens, and only when requests are sent within a relatively short cache window. Your user messages would also need to share common starting content for the prefix to reach the threshold.
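
As a quick sanity check, here is a minimal sketch of how you could estimate whether your shared prefix reaches that threshold. It assumes a gpt-4o-family model and uses tiktoken as an offline approximation of server-side tokenization; `shared_user_prefix` is a hypothetical placeholder for any text repeated at the start of every user message:

```python
# Rough check of prompt-caching eligibility (approximation via tiktoken).
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")  # assumption: gpt-4o-family model

system_prompt = "..."        # your fixed ~436-token system prompt
shared_user_prefix = "..."   # hypothetical: text repeated at the start of each user message

common_prefix_tokens = len(enc.encode(system_prompt)) + len(enc.encode(shared_user_prefix))

# Caching only kicks in once the identical leading prefix reaches 1024 tokens,
# and the discount applies only to that cached portion of the input.
if common_prefix_tokens >= 1024:
    print(f"~{common_prefix_tokens} common prefix tokens: eligible for prompt caching")
else:
    print(f"~{common_prefix_tokens} common prefix tokens: below the 1024-token threshold")
```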

Therefore the Batch API, which discounts the entire API request by 50%, is the better price choice, provided you can wait for fulfillment (batches complete within a 24-hour window).

The calculation there is original cost * 0.5
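
To put rough numbers on it, here is a sketch of the comparison. The per-token rates, the average output length, and the assumed model discount structure are placeholders, not current pricing; substitute the actual rates for whatever model you are using:

```python
# Rough cost comparison for ~525 requests. Rates are placeholders (USD per 1M tokens).
INPUT_RATE = 2.50    # assumed standard input rate per 1M tokens
OUTPUT_RATE = 10.00  # assumed standard output rate per 1M tokens

n_requests = 525
system_tokens = 436
user_tokens = 1800    # midpoint of the 1500-2100 range
output_tokens = 500   # assumed average completion length

input_total = n_requests * (system_tokens + user_tokens)
output_total = n_requests * output_tokens

standard = (input_total / 1e6) * INPUT_RATE + (output_total / 1e6) * OUTPUT_RATE

# Batch API: 50% off both input and output for every request.
batch = standard * 0.5

# Prompt caching: 50% off only the cached input prefix, and only when the
# common prefix reaches 1024 tokens. With a 436-token system prompt alone
# it never triggers, so the cached portion in this scenario is 0.
cached_prefix_tokens = 0
caching = standard - (n_requests * cached_prefix_tokens / 1e6) * INPUT_RATE * 0.5

print(f"standard: ${standard:.2f}  batch: ${batch:.2f}  caching: ${caching:.2f}")
```

With these placeholder numbers the batch discount applies to the full input and output of every request, while the caching discount would cover at most the shared prefix of each request even if you did cross the 1024-token threshold.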
