Evaluations for Assistants (with file_search)

I am looking for a tool to run end-to-end evaluations on my OpenAI Assistants (which use file_search/RAG). Ideally something hosted and low/no-code.

Two questions:

  1. Will OpenAI’s beta Evals (https://platform.openai.com/docs/guides/evals) eventually support Assistants, and if so, when?
  2. What 3rd-party tools exist for this (i.e. end-to-end RAG evals against OpenAI Assistants)?

Here’s what I have found so far:

  1. OpenAI’s Evaluations (on-platform, in beta) https://platform.openai.com/docs/guides/evals are nice but don’t connect to Assistants AFAICT
  2. The OpenAI Evals library has one solver that talks to Assistants: `evals/evals/solvers/providers/openai/openai_assistants_solver.py` in openai/evals (commit a32c9826cd7d5d33d60a39b54fb96d1085498d9a). I could build my own eval on top of it, but I'd prefer a no/low-code solution (rough sketch of that DIY route after this list).
  3. The LangChain / LlamaIndex route (build my own evaluation pipeline).
  4. Third-party tools: Confident, Scorecard, Promptfoo, Vellum, Ragas.
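
In case it helps frame the question, here is roughly what the "roll my own" route (options 2/3 above) looks like for me today. This is only a minimal sketch: `ASSISTANT_ID` and the golden set are placeholders, the pass/fail check is a naive substring match rather than a real metric, and it assumes a recent openai-python with the beta Assistants v2 helpers (`create_and_poll`).

```python
# Minimal sketch of a DIY E2E eval loop against an Assistant with file_search.
# Assumes: openai-python with the beta Assistants v2 helpers (create_and_poll)
# and OPENAI_API_KEY set in the environment. ASSISTANT_ID and GOLDEN_SET are
# placeholders; the "grading" is a naive substring check, not a real metric.
from openai import OpenAI

client = OpenAI()

ASSISTANT_ID = "asst_..."  # placeholder: your Assistant configured with file_search

# Placeholder golden set; in practice this would come from a CSV/JSONL file.
GOLDEN_SET = [
    {"question": "What is our refund window?", "expected": "30 days"},
    {"question": "Which plan includes SSO?", "expected": "Enterprise"},
]


def ask_assistant(question: str) -> str:
    """Send one question through a fresh thread and return the answer text."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=question
    )
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=ASSISTANT_ID
    )
    if run.status != "completed":
        return f"<run ended with status {run.status}>"
    # Messages come back newest-first, so the first one is the assistant's reply.
    messages = client.beta.threads.messages.list(thread_id=thread.id, limit=1)
    return messages.data[0].content[0].text.value


if __name__ == "__main__":
    passed = 0
    for case in GOLDEN_SET:
        answer = ask_assistant(case["question"])
        ok = case["expected"].lower() in answer.lower()
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}  {case['question']!r} -> {answer[:80]!r}")
    print(f"{passed}/{len(GOLDEN_SET)} passed")
```

From there the answers (and, if you dig the retrieved chunks out of the run steps, the contexts) could presumably be pushed into something like Ragas or Promptfoo for scoring, but that is exactly the kind of glue code I'd rather a hosted tool handled.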

Similar Community post: Evaluation Tools for Assistants