DeepLLM is now out on GitHub at https://github.com/ptarau/recursors (Full Automation of Goal-driven LLM Dialog Threads with And-Or Recursors and Refiner Oracles).
You can play with it at https://deepllm.streamlit.app/ and steer an LLM to explore a topic in depth, quickly, truthfully, and hallucination-free.
I have also made a quick YouTube intro to DeepLLM showing how it works; see DeepLLM - YouTube.
For the academically inclined, there's a draft paper about it at https://arxiv.org/abs/2306.14077 (Full Automation of Goal-driven LLM Dialog Threads with And-Or Recursors and Refiner Oracles).
First, welcome to the forum; we're glad to have you and appreciate your original contribution here.
I read through your paper and while it is certainly intriguing, it falls somewhat short of compelling for me due to the lack of quantifiable metrics.
What I would love to see with this is a proposed benchmark by which you could compare your approach to others—some kind of headline numbers which demonstrate the value of your contribution.
In any event, after a quick read I think you’ve got something interesting here.
Thanks for the suggestion on metrics; I'm planning for that in the next iteration of the paper.
BTW, finding the right metrics to compare with other work is a bit less obvious than usual as the task covered by DeepLLM is new, but any ideas on what benchmarks might be relevant are welcome.
In the meantime, trying it out at deepllm.streamlit.app on your favorite technical topic might help you find out quickly whether the generated model collects salient information about it.
For sure! That’s the beauty though of being among the first movers—you get to choose/design the benchmark.
One thing you might consider is to identify what you think your absolute best use-cases are, come up with maybe 30–50 exemplars, then compare your results against base GPT-4 results. Ideally you’d include some particularly tough tasks so your approach doesn’t score 100%, leaving you and others somewhere to go with it.
One would expect base GPT-4 to perform poorly, but that’s the point, right? You want to demonstrate how your approach surfaces new abilities. So you might start there, then maybe look at things like AutoGPT and AgentGPT if you think they’re comparable, or possibly some other prompting techniques.
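To make the suggestion concrete, the exemplar-based comparison could be sketched roughly as below. Everything here is hypothetical scaffolding: `score` is a toy key-fact grader, and the two systems are stand-in lambdas where real DeepLLM and base GPT-4 calls would go.

```python
def score(answer: str, key_facts: list[str]) -> float:
    """Toy grader: fraction of expected key facts mentioned in the answer."""
    hits = sum(1 for fact in key_facts if fact.lower() in answer.lower())
    return hits / len(key_facts) if key_facts else 0.0

def evaluate(system, exemplars) -> float:
    """Average score of a system over (prompt, key_facts) exemplar pairs."""
    scores = [score(system(prompt), facts) for prompt, facts in exemplars]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Hypothetical exemplars; a real benchmark would use 30-50 curated ones.
    exemplars = [
        ("Explain Horn clauses", ["head", "body"]),
        ("What is an and-or tree?", ["and", "or", "tree"]),
    ]
    # Placeholders for the two systems under comparison.
    deepllm = lambda p: "A Horn clause has a head and a body; and-or trees..."
    baseline = lambda p: "It is a logic thing."
    print("DeepLLM:", evaluate(deepllm, exemplars))
    print("Baseline:", evaluate(baseline, exemplars))
```

A real harness would of course swap the substring check for a stronger grader (human raters or an LLM judge), but the headline number stays the same shape: one average per system over a shared exemplar set.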
You’re right when you say it’s not easy, but you presumably decided on this approach with an unsolved problem in mind, so I would start there.
If I’ve got any extra time this week I’ll try to give your paper a second read and let you know if anything springs to mind.