Retrieve context from the Assistants API using the retrieve method call

I created an assistant using OpenAI’s Assistants API. It works fine and generates answers based on the files (knowledge base) whose IDs I associated with the assistant when I created it. What I want now is the context the assistant uses when it generates an answer. Right now I ask a question and the Assistants API returns an answer. I also want the context the assistant retrieved to produce that answer.

The reason I want this is to use that context to benchmark the assistant with tools like Ragas and Databricks. These benchmarking tools need the retrieval context in order to compute several of their metrics.
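For reference, here is a minimal sketch of the dataset shape Ragas expects (using the ragas 0.1-style `evaluate` API; the question, answer, and ground-truth strings are placeholders). The `contexts` column is exactly the retrieval context I am trying to get out of the assistant:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Placeholder rows; "contexts" is the retrieval context the assistant used,
# which is the piece I currently cannot get from the Assistants API.
data = {
    "question": ["What does the refund policy say?"],
    "answer": ["Refunds are issued within 30 days."],
    "contexts": [["Our policy: refunds are issued within 30 days of purchase."]],
    "ground_truth": ["Refunds are available within 30 days of purchase."],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)
```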

The reason I want to benchmark the assistant is to compare it to the assistant that I have created locally.


I have exactly the same problem, and the lack of replies does not bode well. Maybe the feature does not exist? In that case, could we run the same retriever the assistant uses, on the same files, and reconstruct the context that was probably generated?
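For what it’s worth, here is a rough sketch of that idea: chunk the same files yourself, embed the chunks, and take the top matches for each question as an approximation of the context. The embedding model and the top-k value here are assumptions; OpenAI has not published the assistant’s internal chunking or retrieval parameters, so this is only an approximation:

```python
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    # text-embedding-3-small is an assumption; any embedding model works here,
    # but none is guaranteed to match what the assistant uses internally.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

def approximate_context(question: str, chunks: list[str], k: int = 5) -> list[str]:
    chunk_vectors = embed(chunks)
    question_vector = embed([question])[0]
    # OpenAI embeddings are unit-normalized, so a dot product is cosine similarity.
    scores = chunk_vectors @ question_vector
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```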

The best way to do this that I have seen so far, if you are using the Assistants API, is to retrieve the FileCitation object from a message, as referenced in the docs here (https://platform.openai.com/docs/assistants/deep-dive/message-annotations). Keep in mind, however, that you will only be able to access the filename from the FileCitation object: as of right now there is no quote field on that object, even though it is listed in the docs I mentioned. Hope that helps!
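Something like this, a minimal sketch using the openai Python SDK and assuming you have the thread ID of a completed run (the `thread_id` value below is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
thread_id = "thread_abc123"  # the thread from your completed run

# List the assistant's messages in the thread after the run has completed.
messages = client.beta.threads.messages.list(thread_id=thread_id)

for message in messages.data:
    for content in message.content:
        if content.type != "text":
            continue
        for annotation in content.text.annotations:
            if annotation.type == "file_citation":
                # In practice only the file reference is populated; the quote
                # field listed in the docs is not returned, so the filename is
                # as close to the source context as you can get this way.
                cited_file = client.files.retrieve(annotation.file_citation.file_id)
                print(annotation.text, "->", cited_file.filename)
```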


Thank you very much. I will check it out and report back with what I find. While we are on the topic, do you know of any tools or methodologies for comparing RAG systems? I found several benchmarking tools online that compare different RAG systems.

I want to run the same tests and find out where the RAG system I created ranks on that same benchmark.

So I guess my question is: how do I recreate a benchmark’s tests, its training data, and its testing mechanism, so that I can evaluate my own RAG system and compare it with the RAG systems already ranked on that benchmark?

Or maybe my whole approach of benchmarking my RAG system against other RAG systems is wrong, and there is an easier way to do the comparison. If so, please let me know.