Hi Community,
I am looking for a prompt evaluation tool that can compare different versions of a prompt side by side, automate chat scenarios, and support human annotation and feedback.
I found a few tools with these features, but none of them supported gpt-4-1106-vision-preview.
Is there a tool that offers all of these features and also supports gpt-4-1106-vision-preview?