Alright, I tried the actual API, and it does return the list of logprobs for each token in both the input and output.
Thus, you could sum those logprobs to get the log probability of this particular sequence being chosen, and use that as some form of “confidence.” But in general I wouldn’t put too much stock in that value, because it’s poorly behaved: probabilities multiply, so longer outputs always end up with a lower overall probability, and if the model picks even one low-probability token (which could be something unimportant, like “and” instead of “the”) that single token drags the whole product down.
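To make that concrete, here’s a minimal sketch of the arithmetic. The function name and the sample logprob values are made up for illustration; the input is just a list of per-token log probabilities like the ones the API hands back. Summing logprobs and exponentiating gives the joint probability (the product of the per-token probabilities), and dividing the sum by the token count first gives a length-normalized score, which is the usual partial fix for the “longer outputs always score lower” problem:

```python
import math

def sequence_confidence(token_logprobs: list[float]) -> dict[str, float]:
    """Turn per-token log probabilities into sequence-level scores.

    (Illustrative helper, not part of any API client.)
    """
    total_logprob = sum(token_logprobs)
    n = len(token_logprobs)
    return {
        # exp(sum of logprobs) == product of per-token probabilities
        "joint_probability": math.exp(total_logprob),
        # geometric mean of per-token probabilities: length-normalized,
        # so it doesn't automatically shrink as outputs get longer
        "per_token_geomean": math.exp(total_logprob / n),
    }

# A short, confident sequence vs. the same thing with one "weak" token:
confident = [-0.05, -0.02, -0.10]        # all near-certain tokens
one_weak = [-0.05, -0.02, -4.6, -0.10]   # one ~1% token mixed in

print(sequence_confidence(confident))  # joint ≈ 0.84
print(sequence_confidence(one_weak))   # joint ≈ 0.008 -- one token tanks it
```

Even the length-normalized version only tells you how “surprising” the token sequence was to the model, which is not the same thing as how likely the answer is to be correct.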
Anyway: if what you want is “confidence in the overall answer,” you can’t really construct that from the per-token probabilities of individual word-fragment tokens.
This actually gives a pretty neat insight into why I think these models aren’t really “thinking,” too: they just predict one token after the next, with no “overall” model of what they’re doing.