How to effectively validate the answers generated by LLMs?

Hello, I am currently using LangChain with GPT-4 to build an SQL agent.
I have prepared 100 questions to test the SQL agent, and I want to evaluate the correctness of its answers.
However, I want to avoid manually verifying the accuracy of these 100 answers.

Are there any other methods to assess the SQL agent’s response accuracy?

Thank you!

One option is to create an adversarial "verifier" agent whose job is to check the generated SQL queries. This agent would take the user's question together with the SQL query produced by your system and cross-check them against the database schema, deciding whether the query is valid for that schema and actually answers the question. A minimal sketch of this idea is shown below.
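
This is only a rough sketch, assuming the `langchain-openai` package and an `OPENAI_API_KEY` in the environment; the helper name `verify_sql` and the prompt wording are illustrative, not part of any LangChain API.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Verifier model (assumes OPENAI_API_KEY is set in the environment)
llm = ChatOpenAI(model="gpt-4", temperature=0)

verifier_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a strict SQL reviewer. Given a database schema, a user question, "
     "and a candidate SQL query, decide whether the query correctly answers the "
     "question. Reply with PASS or FAIL followed by a one-sentence reason."),
    ("human",
     "Schema:\n{schema}\n\nQuestion:\n{question}\n\nCandidate SQL:\n{query}"),
])

verifier_chain = verifier_prompt | llm


def verify_sql(schema: str, question: str, query: str) -> str:
    """Return the verifier's PASS/FAIL verdict for one generated query."""
    return verifier_chain.invoke(
        {"schema": schema, "question": question, "query": query}
    ).content


# Hypothetical usage over your 100 test cases:
# results = [verify_sql(schema, q, sql) for q, sql in zip(questions, generated_queries)]
```

The verifier's verdicts are themselves LLM output, so they are not ground truth, but they let you triage the 100 answers and only manually review the ones flagged as FAIL (or spot-check a sample of the PASSes).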