Validating GenAI responses

Hello OpenAI community!

This is my first post so try to be gentle with me :blush:

I am a QA Engineer working for a company that is integrating with ChatGPT to implement AI-driven features in our products.

One of the features we are working on is summarization.

Our users will press the ‘Summarize’ button, and the AI will summarize the information for them.

The thing is, I am not sure what the best way is to validate the responses returned by ChatGPT.

After a bit of research I came up with an idea that I think should work.
I will manually pick a baseline of “good” responses, then compare each response returned by the AI against that baseline.

The main question is: what is the best way to compare the two summaries automatically, in a way that compares not only how close the words are but also how close their meaning is?

I came up with two ideas:

  1. Compute embedding vectors for both strings and calculate the Euclidean distance between them.
  2. Send both summaries back to ChatGPT with a prompt asking it to compare them and rate their similarity on a scale of 1-10.
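To make the first idea concrete, here is a minimal sketch of the comparison step. It assumes the embedding itself comes from some function `embed` that turns text into a list of floats; that name and the toy bag-of-words embedder below are placeholders for illustration only, not a real embeddings API. Cosine similarity is not part of the original idea; it is included only as a commonly used alternative to Euclidean distance.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two vectors (0.0 means identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty/zero vectors
    return dot / (norm_a * norm_b)


def make_toy_embedder(vocabulary: List[str]) -> Callable[[str], List[float]]:
    """Hypothetical stand-in for a real embedding model: word counts only.

    It captures word overlap, not meaning, so it is only useful for
    exercising the comparison code.
    """
    def embed(text: str) -> List[float]:
        counts = Counter(text.lower().split())
        return [float(counts[word]) for word in vocabulary]
    return embed


def compare_summaries(
    baseline: str,
    candidate: str,
    embed: Callable[[str], Sequence[float]],
) -> Dict[str, float]:
    """Embed both summaries and report both metrics."""
    a, b = embed(baseline), embed(candidate)
    return {
        "euclidean": euclidean_distance(a, b),
        "cosine": cosine_similarity(a, b),
    }


if __name__ == "__main__":
    embed = make_toy_embedder(["the", "cat", "sat", "dog"])
    print(compare_summaries("the cat sat", "the dog sat", embed))
```

In a real test, `embed` would be replaced by a call to an actual embedding model, and the pass/fail threshold on the distance would have to be tuned against summaries already judged good or bad by hand.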

Both approaches seem like they could give good results, but I was wondering whether anyone here has done this before, and whether both of my ideas might actually be wrong.
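For the second idea, this is roughly the shape I have in mind. It is only a sketch: `ask_model` is a hypothetical callable that sends a prompt to the chat model and returns its reply as a string, and the prompt wording is just a first guess. The parsing step is there because the model may not reply with a bare number.

```python
import re
from typing import Callable, Optional


def build_judge_prompt(baseline: str, candidate: str) -> str:
    """Build the rating prompt; the wording is an assumption, not a tested prompt."""
    return (
        "You are comparing two summaries of the same source text.\n"
        "Rate how similar they are in meaning on a scale of 1 to 10, "
        "where 1 means unrelated and 10 means equivalent.\n"
        "Reply with the number only.\n\n"
        f"Summary A:\n{baseline}\n\n"
        f"Summary B:\n{candidate}\n"
    )


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone integer from 1 to 10 in the reply, else None."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None


def judge_similarity(
    baseline: str,
    candidate: str,
    ask_model: Callable[[str], str],  # hypothetical wrapper around the chat API
) -> Optional[int]:
    """Ask the model for a rating and parse it; None means the reply was unusable."""
    reply = ask_model(build_judge_prompt(baseline, candidate))
    return parse_score(reply)


if __name__ == "__main__":
    # Fake model used only to show the flow without calling any API.
    fake_model = lambda prompt: "8"
    print(judge_similarity("Sales rose in Q2.", "Q2 sales increased.", fake_model))
```

Since the same pair may get different ratings on different runs, the test would probably need to call this several times and look at the spread rather than trust a single score.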

Thank you so much for reading.