Why will gpt-3.5-turbo-instruct no longer support echo=true and logprobs=1?

Yesterday I got this email:

Why the extremely sudden deprecation? Will there be an endpoint that returns the log-probabilities of a provided completion given a provided prompt?

Maybe also see this tweet by the guy who created GitHub Copilot


Probably because it is too useful…

Same reason they thought they would take away the edit button of ChatGPT conversations?

It lets you explore training and quality. For example, something I just ran:

The most preeminent and notable scientist of the 20th century is

I can traverse the tokens and see that I probably phrased it wrong. So I change it:

The most preeminent and notable scientist of the 20th century was

The effect is to take the next word “Albert” from 26.1% to 63.2%.
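For anyone who hasn't used these parameters: a minimal sketch of the kind of call that produces those numbers. The pre-1.0 openai SDK shape shown is my assumption, and the call itself is commented out since it needs an API key; the point is the parameter combination.

```python
# Sketch of the legacy Completions request: echo the prompt back and score
# every token of it. The SDK call style is an assumption (pre-1.0 openai).
params = dict(
    model="gpt-3.5-turbo-instruct",
    prompt="The most preeminent and notable scientist of the 20th century was",
    max_tokens=1,    # generate (almost) nothing new...
    echo=True,       # ...but echo the prompt back in the response
    logprobs=1,      # with a log-probability for each echoed token
)
# response = openai.Completion.create(**params)  # needs an API key; not run here
# response["choices"][0]["logprobs"]["token_logprobs"] would then hold one
# log-probability per prompt token.
```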

Likewise, you can refine your own prompts on the actual model you would use to get ideal results, for example, in making classifiers and finding out where your word use will agree more with training sequences, or reduce the appearance of unwanted output such as a linefeed instead of a word.

You can also see exactly how trained the AI is on copyrighted text you input:

[screenshot: token logprobs over an echoed copyrighted passage]

Or it can show that those, like Silverman, who want to sue over AI's use of their works are BS artists with zero evidence:

It may be that one use OpenAI wants to discourage is getting back echoed student assignments, along with logprobs, to compute an AI detection score.

Far too useful, it must go.

I can traverse the tokens and see that I probably phrased it wrong. So I change it

Very cool example!

It may be that one use OpenAI wants to discourage is getting back echoed student assignments, along with logprobs, to compute an AI detection score.

Interesting theory haha. Though, entertaining your example, you'd almost certainly need a student detection score to put the AI detection score in context. (The probability of observing a text given that it's AI-written is not equal to the probability of a given text being AI-written.) If we had a pool of verified student-written assignments, perhaps there's a threshold score that works well. Also, it seems like detecting AI-written text without logprobs is pretty easy if you have a big pool of student-written assignments.
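To make that Bayes point concrete, here is a minimal sketch. The function name and every number are hypothetical; it only illustrates how the same "looks AI-like" likelihood gives very different posteriors under different priors.

```python
# P(AI | score) is not P(score | AI): Bayes' rule needs both likelihoods and a prior.
def p_ai_given_score(p_score_if_ai, p_score_if_student, prior_ai):
    """Posterior P(AI-written | score) = P(score | AI) P(AI) / P(score)."""
    numerator = p_score_if_ai * prior_ai
    denominator = numerator + p_score_if_student * (1 - prior_ai)
    return numerator / denominator

# Same 9:1 likelihood ratio, different base rates of AI use in the class:
print(round(p_ai_given_score(0.9, 0.1, 0.5), 3))   # half the class uses AI -> 0.9
print(round(p_ai_given_score(0.9, 0.1, 0.05), 3))  # almost nobody does -> 0.321
```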

Another reason to hide it out of shame: one can track model perplexity. With logprobs, we could dump the same text in daily and see exactly when the model becomes more uncertain.
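A minimal sketch of that tracking, assuming you already have the per-token logprobs an echoed request would return (the daily numbers below are invented):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean logprob). Lower means the model finds the text
    more predictable; a jump over time signals the model got more uncertain."""
    avg = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-avg)

# Hypothetical per-token logprobs for the same text on two different days
day1 = [-0.9, -1.2, -0.4, -2.1]
day2 = [-1.8, -2.5, -1.1, -3.0]
print(perplexity(day1) < perplexity(day2))  # True: the model is more "surprised" on day 2
```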