Gemini context caching feature


Writing this post just to ask OpenAI’s team to add the context caching feature that Google is offering now. It’s really nice and has the potential to save a lot on token charges.

Many of us are building chat support backed by RAG, and it shouldn’t be necessary to recompute attention over the same context every time a new chat turn is submitted.


This would be wonderful, as it is very expensive to use the API at scale with large contexts and threads.

Maybe they have added it - and you just pay anyway…


As a non-developer, my 2-cent suggestion would be to use the Assistants v2 embeddings as a cache. The v2 upgrade stood out to me because of its 10,000-item limit. One of GPT’s limitations is its lack of memory; by embedding each thread, GPT can effectively “remember” every chat you’ve had with it.
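A rough sketch of that idea: store a vector for each past thread, then retrieve the most similar threads when a new question arrives and attach them as context. Everything here is a toy assumption — `embed` uses simple word overlap in place of a real embedding model, and `similarity` uses Jaccard overlap in place of cosine distance on real vectors:

```python
def embed(text: str) -> set[str]:
    """Toy stand-in for a real embedding: a bag of lowercase words."""
    return {w.strip(".,?!") for w in text.lower().split()}

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap, standing in for cosine similarity of real vectors."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Past threads we want the assistant to "remember".
threads = {
    "thread-1": "we discussed pricing for the API at scale",
    "thread-2": "debugging a Python import error together",
}
index = {tid: embed(text) for tid, text in threads.items()}

def recall(question: str, k: int = 1) -> list[str]:
    """Return the k past threads most similar to the new question."""
    q = embed(question)
    ranked = sorted(index, key=lambda t: similarity(q, index[t]), reverse=True)
    return ranked[:k]

print(recall("what did we say about API pricing?"))  # → ['thread-1']
```

A real version would replace `embed` with an embedding API call and `index` with a vector store, but the retrieve-then-attach flow is the same.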

For more details on context management and truncation, refer to the following link: Managing Threads and Messages.