We have a multi-tenant app built on top of the Assistants API, with RAG performed through tool calls.
One of the assistants was created with the gpt-4.1 model. A few days back, we created a thread on behalf of a user of Tenant-A. Tool calls were made, and based on the RAG results, OpenAI produced responses. The RAG data also included citations referring to specific Tenant-A documents (not generic names).
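For context, the per-tenant flow looks roughly like the sketch below. This is a simplified illustration assuming the official OpenAI Python SDK and its `*_and_poll` helpers; `fetch_rag_results` and the `"query"` argument name are placeholders for our tenant-scoped retrieval layer, not real SDK calls.

```python
import json
from openai import OpenAI

client = OpenAI()

def fetch_rag_results(tenant_id: str, query: str) -> list[dict]:
    # Placeholder: in the real app this queries the tenant's own document
    # index and returns chunks together with tenant-specific citations.
    return []

def ask_for_tenant(assistant_id: str, tenant_id: str, question: str) -> str:
    # Each tenant gets its own thread; tenant identity lives in thread metadata.
    thread = client.beta.threads.create(metadata={"tenant_id": tenant_id})
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=question
    )
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant_id
    )

    # When the model requests our RAG tool, resolve it against this tenant's
    # documents only and submit the results back to the run.
    while run.status == "requires_action":
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            args = json.loads(call.function.arguments)
            outputs.append({
                "tool_call_id": call.id,
                "output": json.dumps(fetch_rag_results(tenant_id, args["query"])),
            })
        run = client.beta.threads.runs.submit_tool_outputs_and_poll(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )

    # Most recent message first; return the assistant's reply text.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value
```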
A few days later, we created a different thread with different metadata for Tenant-B and asked a question similar to the one we had asked before. In this thread, the response from OpenAI referred to Tenant-A's data. The question is not generic and can only be answered from RAG results. When we inspected the thread in the OpenAI dashboard, it turned out that no tool calls had been made. This would still have been acceptable if the response content had been generic, but it clearly included citations (specific document names, etc.) from Tenant-A's data.
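Besides the dashboard, this is roughly how we can double-check from the API that a given run made no tool calls, by listing its run steps (the thread and run IDs here are the ones from the Tenant-B thread):

```python
from openai import OpenAI

client = OpenAI()

def run_made_tool_calls(thread_id: str, run_id: str) -> bool:
    # A run step is either a message creation or a tool call; if no step has
    # type "tool_calls", the run never invoked our RAG tool.
    steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run_id)
    return any(step.type == "tool_calls" for step in steps.data)
```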
- Could there be an assistant-level “cache”?
- If threads are isolated and have their own context, how could this have happened?
We created a different assistant and the problem stopped, which makes us suspect that some assistant-level cache is involved.