I’m using GPT-4o for image analysis, and my prompt is quite large (approximately 3,000 tokens). Can I cache this prompt when using Batch deployment mode, rather than sending the same prompt with every task in the input file?
Hi @likhitha,
AFAIK, caching isn’t available for batch API requests as of writing this post.
In the batch input file, every line must be a valid JSON request object and is treated as a standalone request. Each request is therefore stateless and must contain the full prompt and context needed for that individual request to produce the desired output.
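For illustration, here is a minimal sketch of what that JSONL batch input file can look like, with the full ~3,000-token prompt repeated on every line. The file name, image URLs, and prompt text are placeholders, not anything from your setup:

```python
import json

# Placeholder for the shared ~3,000-token prompt from the question.
SYSTEM_PROMPT = "You are an image-analysis assistant. <long instructions...>"

# Hypothetical inputs, one per batch task.
image_urls = [
    "https://example.com/img1.png",
    "https://example.com/img2.png",
]

with open("batch_input.jsonl", "w") as f:
    for i, url in enumerate(image_urls):
        request = {
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o",
                "messages": [
                    # The full prompt is repeated in every request line.
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": [
                        {"type": "image_url", "image_url": {"url": url}},
                    ]},
                ],
            },
        }
        f.write(json.dumps(request) + "\n")
```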
Additionally, the benefit of prompt caching on the standard API is currently a 50% discount on input tokens that match a previously cached prefix. It does not remove the need to re-send the prompt with every request.
The batch API already has a 50% discount on everything.
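As a back-of-the-envelope comparison (the per-token price and batch size below are illustrative assumptions, not current pricing; check the pricing page before relying on the numbers):

```python
# Rough input-cost sketch for a 3,000-token prompt reused across many requests.
PROMPT_TOKENS = 3_000          # the shared prompt from the question
NUM_REQUESTS = 1_000           # hypothetical number of tasks
INPUT_PRICE_PER_1M = 2.50      # assumed standard input price, USD per 1M tokens

standard = PROMPT_TOKENS * NUM_REQUESTS / 1e6 * INPUT_PRICE_PER_1M
cached = standard * 0.5        # prompt caching: ~50% off, assuming the whole prompt is the cached prefix
batch = standard * 0.5         # Batch API: 50% off everything, no caching involved

print(f"standard: ${standard:.2f}  cached: ${cached:.2f}  batch: ${batch:.2f}")
```

In other words, the batch discount already gets you roughly what prompt caching would save on the input side.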
OpenAI could, in principle, detect common prefixes across batch requests and optimize them behind the scenes, but no such mechanism is exposed to you.