I’m using GPT-4o for image analysis, and my prompt is quite large (approximately 3,000 tokens). Can I cache this prompt and use it in Batch deployment mode rather than sending the same prompt against each task in the input file?
Hi @likhitha,
AFAIK, caching isn’t available for batch API requests as of writing this post.
In the batch input file, every line, which must be a valid JSON object request, is considered a standalone request. Hence, each request must be treated as stateless and contain the prompt and context needed for that individual request to work as desired.
Additionally, the benefit to you for activating context window caching on the API currently is a 50% discount on input lengths that match a server’s cache of a similar input.
Not any ability to not need to re-send.
The batch API already has a 50% discount on everything.
The clever OpenAI could identify commonality and run optimized batches behind the scenes, but that is not exposed to you.