Hello, I am currently using LangChain with GPT-4 to build an SQL agent.
I have prepared 100 questions to test the SQL agent, and I want to evaluate the correctness of its answers.
However, I want to avoid manually verifying the accuracy of these 100 answers.
Are there any automated methods to assess the accuracy of the SQL agent’s responses?
Thank you!