Hi,
I need to make around 500-550 requests to the OpenAI API. My objective is to minimize the API cost as much as possible.
I have a fixed system prompt of around 436 tokens, and each user prompt would be roughly 1500-2100 tokens.
What's the best approach here, given that the cached_tokens feature (prompt caching) is currently not available for the Batch API?
Also, can anyone help with a rough cost calculation comparing the two options? Most pages only mention that each offers a 50% discount, without showing how that actually plays out for a workload shaped like mine.
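Here's my own rough attempt at the math, to show what I mean. The per-token prices are just placeholders I pulled from the gpt-4o pricing page (they may be outdated for your model), and the output token count is a guess since I don't know my response sizes yet:

```python
# Rough cost comparison: prompt caching vs. Batch API.
# All prices and the output-token count below are assumptions, not official figures.

N_REQUESTS = 525              # midpoint of my 500-550 requests
SYSTEM_TOKENS = 436           # fixed system prompt (the shared prefix)
USER_TOKENS = 1800            # midpoint of 1500-2100
OUTPUT_TOKENS = 500           # guess; I don't know my output size yet

INPUT_PRICE = 2.50 / 1e6      # assumed $/token for regular input
CACHED_PRICE = INPUT_PRICE / 2  # cached input billed at 50% of input price
OUTPUT_PRICE = 10.00 / 1e6    # assumed $/token for output

# Case 1: synchronous requests with prompt caching.
# Only the shared prefix is discounted (and only on cache hits);
# the user prompts differ per request, so they're billed at full price.
cache_cost = (
    N_REQUESTS * SYSTEM_TOKENS * CACHED_PRICE
    + N_REQUESTS * USER_TOKENS * INPUT_PRICE
    + N_REQUESTS * OUTPUT_TOKENS * OUTPUT_PRICE
)

# Case 2: Batch API -- 50% off ALL input and output tokens,
# but no prompt caching on top.
batch_cost = (
    N_REQUESTS * (SYSTEM_TOKENS + USER_TOKENS) * INPUT_PRICE * 0.5
    + N_REQUESTS * OUTPUT_TOKENS * OUTPUT_PRICE * 0.5
)

print(f"With prompt caching: ${cache_cost:.2f}")
print(f"With Batch API:      ${batch_cost:.2f}")
```

If I've set this up right, Batch should win easily for my shape, since caching only discounts the 436-token shared prefix while Batch discounts everything. (I also think caching requires a matching prefix of at least 1024 tokens, which my system prompt alone wouldn't meet, but I may be misreading the docs.) Please correct me if my assumptions are off.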
Thanks