The cost difference between 4o and 4 is nearly 30x. Sadly, we must use 4 since it produces much better results for our use case. We resend the same system instructions on every request, so caching them would save us a lot of money.
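For a rough sense of the money at stake, here is a back-of-the-envelope sketch. The prompt size, request volume, and the 50% cached-input discount (what gpt-4o caching offers today) are illustrative assumptions, not our actual numbers:

```python
# Back-of-the-envelope: what caching the repeated system prompt would save.
# All numbers below are illustrative assumptions, not measured figures.
SYSTEM_PROMPT_TOKENS = 2_000   # instructions resent verbatim on every call
REQUESTS_PER_MONTH = 100_000
GPT4_INPUT_PRICE = 30.00       # USD per 1M input tokens (gpt-4 list price)
CACHE_DISCOUNT = 0.50          # cached input billed at half price, as on gpt-4o

full_cost = SYSTEM_PROMPT_TOKENS * REQUESTS_PER_MONTH / 1e6 * GPT4_INPUT_PRICE
with_cache = full_cost * (1 - CACHE_DISCOUNT)
print(f"system prompt alone: ${full_cost:,.0f}/mo -> ${with_cache:,.0f}/mo cached")
```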
To make matters worse, we cannot use prompt caching, since it appears this feature is NOT available for gpt-4.
Can someone confirm whether prompt caching will eventually make its way to gpt-4?
What will eventually make its way to the gpt-4-32k models is a shutoff, six months out. That's scorched earth, considering there is nothing comparable to them, except in pricing and wait time once you multiply by reasoning tokens, or pay the premium price for voice tokens.
The biggest improvement would be restoring a snapshot close to the original release date, instead of the one we get now, which hates writing more than 800 tokens.
Input context caching is less of a need, I think, because typical use of this model won't involve at least 1,024 tokens of repeated input being sent again within the 5-15 minute cache lifetime. Or they already do it internally at the price given.
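For anyone on a model where caching is live (gpt-4o and friends), you can check whether the prefix was actually reused by reading the usage details on the response. A minimal sketch using the openai Python SDK; the system text is a placeholder for your real instructions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Keep the static instructions first: caching matches on the exact prompt
# prefix, and only kicks in once that prefix reaches 1,024 tokens.
STATIC_SYSTEM = "<your long, repeated system instructions>"

resp = client.chat.completions.create(
    model="gpt-4o",  # a model that supports prompt caching; gpt-4 does not
    messages=[
        {"role": "system", "content": STATIC_SYSTEM},
        {"role": "user", "content": "whatever varies per request"},
    ],
)

details = resp.usage.prompt_tokens_details
print("cached input tokens:", details.cached_tokens if details else 0)
```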