The cost difference between 4o and 4 is nearly 30x. Sadly, we must use 4 since it produces much better results for our use case. We resend the same system instructions on every request, so caching them would save us a lot of money.
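For a rough sense of the money at stake, here is a back-of-the-envelope sketch. The prompt size, request volume, and the 50% cached-input discount (what gpt-4o caching offers today) are illustrative assumptions, not our actual numbers:

```python
# Back-of-the-envelope: what caching the repeated system prompt would save.
# All numbers below are illustrative assumptions, not measured figures.
SYSTEM_PROMPT_TOKENS = 2_000   # instructions resent verbatim on every call
REQUESTS_PER_MONTH = 100_000
GPT4_INPUT_PRICE = 30.00       # USD per 1M input tokens (gpt-4 list price)
CACHE_DISCOUNT = 0.50          # cached input billed at half price, as on gpt-4o

full_cost = SYSTEM_PROMPT_TOKENS * REQUESTS_PER_MONTH / 1e6 * GPT4_INPUT_PRICE
with_cache = full_cost * (1 - CACHE_DISCOUNT)
print(f"system prompt alone: ${full_cost:,.0f}/mo -> ${with_cache:,.0f}/mo cached")
```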
To make matters worse, we cannot use prompt caching, since it appears this feature is NOT available for gpt-4.
Can someone confirm whether prompt caching will eventually make its way to gpt-4?
What will eventually make its way to the gpt-4-32k models is a shutoff, six months out. That's scorched earth, considering there is nothing comparable to them, except in pricing and wait time once you multiply by reasoning tokens, or pay the premium price for voice tokens.
The biggest improvement would be restoring a snapshot close to the original release date, instead of the one we get now, which hates writing more than 800 tokens.
Input context caching is less of a need, I think, because typical use of this model won't involve at least 1,024 tokens of repeated input being sent again within the 5-15 minute cache lifetime. Or they already do it internally at the price given.
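For anyone on a model where caching is live (gpt-4o and friends), you can check whether the prefix was actually reused by reading the usage details on the response. A minimal sketch using the openai Python SDK; the system text is a placeholder for your real instructions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Keep the static instructions first: caching matches on the exact prompt
# prefix, and only kicks in once that prefix reaches 1,024 tokens.
STATIC_SYSTEM = "<your long, repeated system instructions>"

resp = client.chat.completions.create(
    model="gpt-4o",  # a model that supports prompt caching; gpt-4 does not
    messages=[
        {"role": "system", "content": STATIC_SYSTEM},
        {"role": "user", "content": "whatever varies per request"},
    ],
)

details = resp.usage.prompt_tokens_details
print("cached input tokens:", details.cached_tokens if details else 0)
```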