Evaluate confidence on GPT-5 response

Recently, I’ve been working on a project using the OpenAI API with the GPT-4 model, where I used the logprobs option to calculate perplexity and estimate the model’s confidence in its responses. I noticed that for a specific task, the GPT-5 model performed better; however, the new model does not return logprobs. Is there any other way to estimate the model’s confidence in its responses without using logprobs?
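
For context, my current approach looks roughly like this. A minimal sketch, assuming the Chat Completions API with `logprobs=True`; the prompt is just a placeholder:

```python
import math

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Name the capital of Australia."}],
    logprobs=True,
)

# Collect the per-token log probabilities of the sampled output.
token_logprobs = [t.logprob for t in completion.choices[0].logprobs.content]

# Perplexity = exp of the negative mean token log probability;
# lower values mean the model was more "confident" in its own wording.
perplexity = math.exp(-sum(token_logprobs) / len(token_logprobs))
print(f"perplexity: {perplexity:.3f}")
```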

A replacement for logprobs, and for sampling controls such as top_p? Not really, mathematically. The best you can do is further language-level analysis, and whether that actually succeeds is speculative.

  1. Capture the reasoning summaries: have an AI grader, with a substantial prompt and few-shot examples of its own, judge how much waffling, uncertainty, and deliberation shows up in that summarized text (a sketch of such a grader follows this list).
  2. Trials: run the same input several times and look for consistency, or departures, in the delivered outputs, with a similar analysis of whether the reasoning summaries or internal choices also diverge (see the consistency sketch after this list).
  3. gpt-5-pro (available only on the Responses API) uses “parallel test-time compute”: essentially it generates several candidate runs and picks among them by something like sequence perplexity, with an unseen and uncontrollable amount of extra generation that you simply have to trust is worth the 12x cost.
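
Here is a hedged sketch of the grader idea from item 1. It assumes you have already captured a reasoning summary as plain text; the grading rubric, prompt wording, and model choice are illustrative placeholders, not a recommended configuration:

```python
import re

from openai import OpenAI

client = OpenAI()

GRADER_INSTRUCTIONS = (
    "You grade reasoning summaries for uncertainty. "
    "Count hedging, self-correction, waffling between alternatives, and "
    "unresolved deliberation. Reply with a single integer from 0 (fully "
    "confident) to 10 (highly uncertain), and nothing else."
)

def grade_uncertainty(reasoning_summary: str) -> int | None:
    """Second model call that scores how much uncertainty the summary shows."""
    resp = client.responses.create(
        model="gpt-5",
        instructions=GRADER_INSTRUCTIONS,
        input=reasoning_summary,
    )
    match = re.search(r"\d+", resp.output_text)
    return int(match.group()) if match else None

score = grade_uncertainty(
    "The model first considered 1968, then reconsidered and settled on 1969, "
    "noting it was fairly sure but wanted to double-check the month."
)
print(f"uncertainty score (0-10): {score}")
```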
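And a sketch of the trials idea from item 2: repeat the same request, normalize the answers, and treat agreement across runs as a rough stand-in for confidence. This assumes the Responses API and a short, directly comparable answer; the prompt, trial count, and normalization are placeholders for your own task:

```python
from collections import Counter

from openai import OpenAI

client = OpenAI()

PROMPT = "In what year did Apollo 11 land on the Moon? Answer with the year only."
N_TRIALS = 5

def ask_once(prompt: str) -> str:
    """One independent trial; output_text is the model's concatenated text output."""
    resp = client.responses.create(model="gpt-5", input=prompt)
    return resp.output_text.strip().lower()

answers = [ask_once(PROMPT) for _ in range(N_TRIALS)]
counts = Counter(answers)
top_answer, top_count = counts.most_common(1)[0]

# Agreement rate across trials: 1.0 means every run gave the same normalized answer.
agreement = top_count / N_TRIALS
print(f"answers: {answers}")
print(f"most common: {top_answer!r}, agreement: {agreement:.2f}")
```

For open-ended outputs, exact-match agreement is too blunt; you would swap in embedding similarity or another grader call to compare the runs.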

Note that gpt-5 itself makes for a poor “judge”. It seems to have knowledge-based intelligence but lack holistic understanding, and every token position appears to be sampled quite randomly between runs.
