Validating and measuring the quality of an AI-generated summary of JSON data

I am using GPT-4 to create a summary of JSON data. The JSON data contains information about a site's users and their activity. The summaries I generate using prompting and few-shot examples look good. However, there are occasional instances of data omission, mis-reporting, and some cases of hallucination. For this reason (among others), I am building a validation step that uses an LLM to extract information in JSON format from the summary and then compares that extracted info to the original JSON using another prompt. I tried to look for standard quality metrics, but the ones I found are for text-to-text comparison rather than JSON-to-text comparison, and none of them are "unsupervised" in nature.
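Concretely, the comparison side of that validation step looks roughly like the sketch below. It is only a sketch: it assumes the extraction prompt has already produced a Python dict, flattens both sides to dotted key paths, and treats omissions and hallucinations as set differences. The function names are placeholders, and the exact string matching would need normalization (dates, numbers, casing) in practice.

```python
# Minimal sketch of the comparison step, assuming the extraction prompt has
# already turned the summary back into a dict. Flattening and exact matching
# are simplifications; real values may need normalization (dates, numbers).

def flatten(d, prefix=""):
    """Flatten nested dicts into dotted key paths -> scalar values."""
    items = {}
    for k, v in d.items():
        path = f"{prefix}.{k}" if prefix else k
        if isinstance(v, dict):
            items.update(flatten(v, path))
        else:
            items[path] = v
    return items

def compare(source_json: dict, extracted_json: dict) -> dict:
    src, ext = flatten(source_json), flatten(extracted_json)
    matched = {k for k in src if k in ext and str(ext[k]) == str(src[k])}
    omitted = set(src) - set(ext)        # facts missing from the summary
    hallucinated = set(ext) - set(src)   # facts not present in the source
    return {
        "recall": len(matched) / max(len(src), 1),      # coverage of source facts
        "precision": len(matched) / max(len(ext), 1),   # share of extracted facts that are real
        "omitted": sorted(omitted),
        "hallucinated": sorted(hallucinated),
    }
```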

Looking for answers from people who have come across a similar need, and what their approach was to solving it.

I tried a similar approach a few months back: with a couple of samples of JSON and extracted values, I was able to use GPT to extract key:value pairs from natural language. However, it was very inconsistent; often, the variability in how the language expressed a value would cause it to miss critical key:value pairs.

With such tasks, I found that human eval is the best possible metric. Nowadays, as part of my daily pipeline testing, I compare the outputs against gold-standard outputs that I keep, just to make sure performance hasn't regressed. Time-consuming, yes, but worth it.
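The gold-standard check itself doesn't need to be fancy. A rough sketch (the tests/gold layout and the run_pipeline call are placeholders for whatever your pipeline actually exposes) could look like:

```python
# Rough sketch of a gold-standard regression check, assuming one stored
# "gold" JSON per test case under tests/gold/. File layout and the
# run_pipeline() call are placeholders, not a specific framework.
import json
from pathlib import Path

GOLD_DIR = Path("tests/gold")  # hypothetical location of gold outputs

def check_against_gold(run_pipeline):
    failures = []
    for gold_path in GOLD_DIR.glob("*.json"):
        gold = json.loads(gold_path.read_text())
        actual = run_pipeline(gold["input"])          # your pipeline under test
        for key, expected in gold["expected"].items():
            if actual.get(key) != expected:
                failures.append((gold_path.name, key, expected, actual.get(key)))
    return failures  # an empty list means the run matches the gold standard
```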

Hey!

IIUC, I have encountered the exact same problem, which actually led me to develop a whole platform for testing structured responses.

I’m not sure it fits your whole use case (tbh I’m not sure I understood all of it), but the platform can help you validate the responses’ JSON schema, as well as exact expected values (in case some of it is deterministic).
You can use it both for development and for periodic testing.
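This isn’t the platform’s actual API, just a generic illustration of those same two checks (schema validity plus exact expected values) using the jsonschema package, with a made-up schema for a users/activity summary:

```python
# Generic illustration only: structural validation with jsonschema, plus a
# check of deterministic expected values. The schema and keys are invented.
from jsonschema import validate, ValidationError

response_schema = {
    "type": "object",
    "properties": {
        "active_users": {"type": "integer"},
        "top_page": {"type": "string"},
    },
    "required": ["active_users", "top_page"],
}

def validate_response(response: dict, expected: dict | None = None) -> list[str]:
    errors = []
    try:
        validate(instance=response, schema=response_schema)   # structural check
    except ValidationError as e:
        errors.append(f"schema: {e.message}")
    for key, value in (expected or {}).items():               # deterministic values
        if response.get(key) != value:
            errors.append(f"value mismatch on '{key}': {response.get(key)!r} != {value!r}")
    return errors
```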

You’re more than welcome to check it out if relevant: Promptotype. Lmk if you have any questions or feedback.