This is advice to help you win some battles; I still think you will lose the war, but at least it will keep you on what I consider the best next step.
Look very closely at the o1 examples noted earlier. They were created less than a month ago and touch on many of the points raised. However, the examples are high level; for your problem you should have many more LLM agents, with the fact-checking agents grounded in code that does not use AI to validate the facts.
https://platform.openai.com/docs/guides/prompt-engineering
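To make the fact-checking idea concrete, here is a minimal sketch of a non-AI validator that an LLM agent's output could be gated through. The citation format, the `KNOWN_CITATIONS` table, and the `validate_citation` function are all hypothetical illustrations; the point is only that the validator is plain deterministic code, not another LLM.

```python
import re

# Hypothetical table of verified citations; in practice this would be a
# real database, not a hard-coded set.
KNOWN_CITATIONS = {"Smith v. Jones, 530 U.S. 914 (2000)"}

def validate_citation(claim: str) -> bool:
    # Deterministic check: extract a U.S. Reports citation and require it
    # to match the known-citation table exactly. No AI involved.
    match = re.search(
        r"[A-Z][\w.]* v\. [A-Z][\w.]*, \d+ U\.S\. \d+ \(\d{4}\)", claim
    )
    return bool(match) and match.group(0) in KNOWN_CITATIONS

print(validate_citation("See Smith v. Jones, 530 U.S. 914 (2000)."))  # True
print(validate_citation("See Roe v. Doe, 999 U.S. 1 (2030)."))        # False
```

An agent pipeline would reject (or send back for revision) any LLM output whose citations fail this check, rather than asking a second LLM whether the first one was right.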
Also see: a search for papers with just "legal" and "agent" in the title.
Side note:
While this is not a direct comparison or prediction, it points to possible quagmires to avoid.
How IBM’s Watson Went From the Future of Health Care to Sold Off for Parts
IBM Resurrects Watson to Ride the AI Hype Train
One of the more successful uses of LLMs is generating programming source code (often with bugs) and feeding the compilation errors back until the code is correct. The programming language also has to be one on which the LLMs have had much accurate training, e.g. JavaScript or Python, and not a language like Prolog for which there is relatively little training data. I note this because compilers provide the feedback on what is valid or not valid. This critical step is missing from many who apply LLMs and then fail.