Evaluate/Verify Extracted Structured Data

I use GPT-4o to extract data from HTML tables into a JSON list of objects. The extraction is accurate most of the time, but for some input files the values end up shifted to the right or left because of empty cells used for formatting, alignment, or missing values. So far I have verified the output manually, but I would like to automate this. How can I verify the output, either programmatically or with an LLM? I am happy to make one API call to extract the data and a second one to verify it, but I doubt the model can give an accurate verification if it failed to extract the data accurately in the first place.
Is it better to verify programmatically or with an LLM? Are there other options?
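For reference, this is roughly the kind of programmatic check I have in mind. The field names and accepted value patterns here are just examples, not my real schema; the idea is to catch shifted values by type/pattern rather than by re-reading the table:

```python
import re

# Hypothetical schema: each extracted row should have these fields,
# and each value should pass the corresponding check.
EXPECTED_FIELDS = {
    "name": lambda v: isinstance(v, str) and bool(v.strip()),
    "quantity": lambda v: isinstance(v, (int, float)) or re.fullmatch(r"\d+", str(v)),
    "unit_price": lambda v: re.fullmatch(r"\$?\d+(\.\d{2})?", str(v)),
}

def validate_rows(rows):
    """Return a list of (row_index, field, value) for every suspicious cell."""
    problems = []
    for i, row in enumerate(rows):
        for field, check in EXPECTED_FIELDS.items():
            value = row.get(field)
            if value is None or not check(value):
                problems.append((i, field, value))
    return problems

if __name__ == "__main__":
    extracted = [
        {"name": "Widget", "quantity": 3, "unit_price": "$4.50"},
        {"name": "", "quantity": "Gadget", "unit_price": 2},  # a shifted row
    ]
    for idx, field, value in validate_rows(extracted):
        print(f"row {idx}: field '{field}' has suspicious value {value!r}")
```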


You need to code your web crawling to keep the "source" and the raw text, then use that metadata to catalogue each entry. Hope this helps.

If you already have the data as HTML tables, I would just transform them to JSON in plain old code, without making a call to an LLM. This would be cheaper and 100% correct every time. If writing the code is too hard, ask the LLM to write the code for you and tweak it until it works 100% correctly. To verify the LLM output, you’d probably end up writing similar code anyway - with the added problem that the few-percent chance of the LLM producing wrong JSON will never go away.
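Something along these lines would be a starting point. This is only a sketch: it assumes the first `<tr>` holds the headers, uses BeautifulSoup (pandas.read_html would also work), and deliberately raises on rows whose cell count doesn't match the header, so shifted or missing values surface instead of being silently accepted. Handling colspan/rowspan or empty formatting cells would take extra work:

```python
import json
from bs4 import BeautifulSoup

def table_to_json(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    headers = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [c.get_text(strip=True) for c in row.find_all(["th", "td"])]
        # Keep the cell count honest so misaligned rows are visible, not hidden.
        if len(cells) != len(headers):
            raise ValueError(f"row has {len(cells)} cells, expected {len(headers)}: {cells}")
        records.append(dict(zip(headers, cells)))
    return records

if __name__ == "__main__":
    html = """
    <table>
      <tr><th>Item</th><th>Qty</th><th>Price</th></tr>
      <tr><td>Widget</td><td>3</td><td>4.50</td></tr>
    </table>
    """
    print(json.dumps(table_to_json(html), indent=2))
```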
