I’m searching for evaluation tools that could help make my OpenAI Assistant better.
I have an Assistant running in production using the Assistants API, but I’d like to create an evaluation suite for continuous eval of the Assistant’s responses.
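For context, here’s a minimal sketch of the kind of harness I have in mind — the keyword-based grader is just a placeholder (real grading would likely be model-based), and names like `EvalCase` and `run_suite` are mine, not from any library:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected_keywords: list  # terms a good answer should mention

def grade(response: str, case: EvalCase) -> float:
    """Fraction of expected keywords present in the response (0.0-1.0)."""
    hits = sum(1 for kw in case.expected_keywords
               if kw.lower() in response.lower())
    return hits / len(case.expected_keywords)

def run_suite(ask, cases):
    """ask: callable that sends a prompt to the Assistant and returns its reply.
    In production this would wrap the Assistants API (create a thread, add the
    message, poll the run); it's injected here so the suite is testable."""
    return {c.prompt: grade(ask(c.prompt), c) for c in cases}
```

The idea would be to run something like this on a schedule against a fixed set of prompts and track the scores over time.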
I’ve tried Tonic.AI, but I’m looking for other tools or libraries that others have found work well with their assistants.
Any and all feedback is welcome!