We have a multi-tenant app built on top of the Assistants API, with RAG performed through tool calls.
One of the assistants was created with the gpt-4.1 model. A few days back, we created a thread on behalf of a user of Tenant-A. Tool calls were made, and based on the RAG results, OpenAI produced responses. The RAG data also included citations referring to specific Tenant-A documents (not generic names).
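For context, the per-tenant flow looks roughly like the sketch below. This is a simplified illustration assuming the official OpenAI Python SDK and its `*_and_poll` helpers; `fetch_rag_results` and the `"query"` argument name are placeholders for our tenant-scoped retrieval layer, not real SDK calls.

```python
import json
from openai import OpenAI

client = OpenAI()

def fetch_rag_results(tenant_id: str, query: str) -> list[dict]:
    # Placeholder: in the real app this queries the tenant's own document
    # index and returns chunks together with tenant-specific citations.
    return []

def ask_for_tenant(assistant_id: str, tenant_id: str, question: str) -> str:
    # Each tenant gets its own thread; tenant identity lives in thread metadata.
    thread = client.beta.threads.create(metadata={"tenant_id": tenant_id})
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=question
    )
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant_id
    )

    # When the model requests our RAG tool, resolve it against this tenant's
    # documents only and submit the results back to the run.
    while run.status == "requires_action":
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            args = json.loads(call.function.arguments)
            outputs.append({
                "tool_call_id": call.id,
                "output": json.dumps(fetch_rag_results(tenant_id, args["query"])),
            })
        run = client.beta.threads.runs.submit_tool_outputs_and_poll(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )

    # Most recent message first; return the assistant's reply text.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value
```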
A few days later, we created a different thread with different metadata for Tenant-B and asked a question similar to the one we had asked before. In this thread, the response from OpenAI referred to Tenant-A's data. The question is not generic and can only be answered from RAG results. When we inspected the thread in the OpenAI dashboard, it turned out that no tool calls had been made. This would still have been acceptable if the response content had been generic, but it clearly included citations (specific document names, etc.) from Tenant-A's data.
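Besides the dashboard, this is roughly how we can double-check from the API that a given run made no tool calls, by listing its run steps (the thread and run IDs here are the ones from the Tenant-B thread):

```python
from openai import OpenAI

client = OpenAI()

def run_made_tool_calls(thread_id: str, run_id: str) -> bool:
    # A run step is either a message creation or a tool call; if no step has
    # type "tool_calls", the run never invoked our RAG tool.
    steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run_id)
    return any(step.type == "tool_calls" for step in steps.data)
```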
- Could there be an assistant-level “cache”?
- If threads are isolated and have their own context, how could this have happened?
We created a different assistant and the problem stopped, which makes us suspect that some assistant-level cache is involved.