I checked the documentation, but did not see any mention regarding prompt caching with streaming. Is it available?
I’m not sure. I’m only testing caching again tomorrow, as I think it didn’t work when I went to test it (maybe I was getting something wrong, but I’m not sure). I think one person got it working yesterday. You are more than welcome to test it and let us know how it goes.
That way, if this question comes up again we can point people to this topic. Up to you; I’m not sure if someone will be around to answer your question. Sometimes there is, sometimes there isn’t.
Hi, welcome:
Prompt Caching only works on the static parts of your prompt. If the initial portion of your prompt is over the 1024-token minimum, Prompt Caching automatically matches that prefix and kicks in.
Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end.
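To make that concrete, here’s a minimal sketch of structuring a request so the cacheable prefix comes first. The constant names and placeholder strings are my own illustration, not anything from the docs; the point is just the ordering: static instructions and examples up front, the per-user part last.

```python
# Hypothetical placeholders for your own static content. For caching to
# help, this prefix must be identical on every request and over the
# 1024-token minimum.
SYSTEM_INSTRUCTIONS = "...your long, unchanging instructions..."
FEW_SHOT_EXAMPLES = "...static examples, identical on every request..."

def build_messages(user_input: str) -> list[dict]:
    """Static content first (the cacheable prefix), variable content last."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTIONS + "\n" + FEW_SHOT_EXAMPLES},
        {"role": "user", "content": user_input},
    ]
```

If the user message were placed before the instructions instead, the prefix would differ on every request and you’d never get a cache hit.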
So, for the completions endpoint, your initial prompt is cached, and possibly any additional prompts you use regularly (I’m not sure of that) that it can make a cache of.
It shouldn’t matter whether you’re using streaming or not.