From experience, I’d say that unfortunately it’s not quite that easy.
Every new token (or word) that ChatGPT spits out comes with its own probability - you can call it certainty if you want. Unfortunately, that probability only refers to that one token, and isn’t indicative of any factual certainty. What we’d all like is a sort of “groundedness” predictor (how grounded the response is in reality) - but current bleeding-edge approaches make the response roughly 5-10x more expensive.
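If you want to see what those per-token probabilities actually look like, here’s a minimal sketch using the Python SDK’s `logprobs` option (the model name and prompt are just placeholders, not anything specific to this thread):

```python
# Minimal sketch (Python, openai >= 1.x SDK): each generated token comes with
# its own log probability. These are next-token probabilities, NOT a measure
# of how factually correct the answer is.
import math
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Who wrote 'Dune'?"}],
    logprobs=True,        # return a log probability for every generated token
    max_tokens=20,
)

for tok in response.choices[0].logprobs.content:
    print(f"{tok.token!r}: {math.exp(tok.logprob):.2%}")
```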
The percentage match you’re seeing with a lot of search engines works a little differently - there are multiple methods, but for example with LLM-based search (embeddings) you calculate a match angle: 0° is a perfect match, 90° is something completely unrelated. You can map that onto a percentage, but it only applies to raw (document) retrievals, not to LLM outputs/generations.
If you find this interesting and want to play with this stuff I do recommend you check out the APIs on platform.openai.com!
Thanks for sharing your thoughts on this. The problem I am addressing is that, at the moment, ChatGPT sometimes gives incorrect answers in an authoritative manner - in other words, with 100% confidence. That’s not only misleading, it can be dangerous.
I gave the percentage match that search engines use simply as an illustration. To address the problem, “groundedness” is fine. But why not just “% Confidence”, “% Reality”, or “% Accuracy”?
Oh yeah, I agree with you. What it’s ultimately gonna be called won’t matter - but the probabilities we currently get out don’t mean what some people think they mean.
I’m just saying it’s a hard problem that can’t be solved that easily at the moment. But it probably will be at some point!