Gemini context caching feature


Writing this post just to ask OpenAI’s team to add the context caching feature that Google is offering now. It’s really nice and has the potential to save a lot on token charges.

Many of us are building chat support backed by RAG, and it shouldn’t be necessary to recompute attention over the same context every time a new chat turn is submitted.


This would be wonderful, as it is very expensive to use the API at scale with large contexts and threads.

Maybe they have added it - and you just pay anyway…


As a non-developer, my 2-cent suggestion would be to use the Assistants v2 embeddings as a cache. The v2 upgrade stood out to me because of its 10,000-item limit. One of GPT’s limitations is its lack of memory; by embedding each thread, GPT can effectively “remember” every chat you’ve had with it.
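A rough sketch of that idea: store a vector for each past thread, then retrieve the most similar threads when a new question arrives and attach them as context. Everything here is a toy assumption — `embed` uses simple word overlap in place of a real embedding model, and `similarity` uses Jaccard overlap in place of cosine distance on real vectors:

```python
def embed(text: str) -> set[str]:
    """Toy stand-in for a real embedding: a bag of lowercase words."""
    return {w.strip(".,?!") for w in text.lower().split()}

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap, standing in for cosine similarity of real vectors."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Past threads we want the assistant to "remember".
threads = {
    "thread-1": "we discussed pricing for the API at scale",
    "thread-2": "debugging a Python import error together",
}
index = {tid: embed(text) for tid, text in threads.items()}

def recall(question: str, k: int = 1) -> list[str]:
    """Return the k past threads most similar to the new question."""
    q = embed(question)
    ranked = sorted(index, key=lambda t: similarity(q, index[t]), reverse=True)
    return ranked[:k]

print(recall("what did we say about API pricing?"))  # → ['thread-1']
```

A real version would replace `embed` with an embedding API call and `index` with a vector store, but the retrieve-then-attach flow is the same.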

For more details on context management and truncation, refer to the following link: Managing Threads and Messages.