Logprobs for specific tokens, not just top tokens

I am wondering if there’s a way to extract the logprobs for specific tokens, not just the top tokens. Using the text classification example from the logprobs cookbook, for instance, I’d like to extract the logprobs of specific tokens like “Arts”, “Sports”, etc.

Something of the form:

for headline in headlines:
    print(f"\nHeadline: {headline}")
    API_RESPONSE = get_completion(
        [{"role": "user", "content": CLASSIFICATION_PROMPT.format(headline=headline)}],
        model="gpt-4o-mini",
        logprobs=True,
        TOKENS=["Art", "Sports", ...]
    )

Welcome to the community!

My understanding is that they’re actively trying to prohibit that so their models don’t get “stolen”. It might work if you suppress all the tokens you don’t want to see with logit bias (https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias), but there’s probably a limit on that.

That said, you can also limit the token probabilities by simply restricting what the model can output. Simply telling the model, or enforcing a schema limited to your categories, might get you somewhere near that, but you’d still see tokens like “S”, “Sp”, “Spo”, etc. being more likely than “Arts” in a sports category.
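
Something along these lines could be a starting point for the schema approach (just a sketch; the model name and category list are placeholders, and it assumes the structured-output response_format is available for the model you use):

from openai import OpenAI

client = OpenAI()

# Placeholder categories for illustration
categories = ["Arts", "Sports", "Business", "Technology"]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Classify this headline: Local team wins the championship"},
    ],
    # Restrict the output to a JSON object whose "value" must be one of the categories
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "classification",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"value": {"type": "string", "enum": categories}},
                "required": ["value"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # e.g. {"value":"Sports"}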

However:

Could it be that you’re actually more interested in embeddings?

If you use the embedding models, you can embed your query and your categories, and then compute the dot product between the two to get a similar result. (https://platform.openai.com/docs/api-reference/embeddings)
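
Roughly like this (a sketch; the model name, categories, and headline are placeholders):

from openai import OpenAI
import numpy as np

client = OpenAI()

categories = ["Arts", "Sports", "Business", "Technology"]
headline = "Local team wins the championship after dramatic overtime"

# Embed the categories and the query in a single call
result = client.embeddings.create(
    model="text-embedding-3-small",
    input=categories + [headline],
)
vectors = [np.array(item.embedding) for item in result.data]
category_vectors, query_vector = vectors[:-1], vectors[-1]

# OpenAI embeddings are unit-length, so the dot product is the cosine similarity
scores = {cat: float(vec @ query_vector) for cat, vec in zip(categories, category_vectors)}
print(max(scores, key=scores.get), scores)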


Thanks for the quick response! The task I want to do specifically requires access to token probabilities for specific tokens, but I agree that in general the embeddings could be used for a similar task (e.g. using cosine similarity between queries and categories as you mentioned).

It sounds as though the API doesn’t natively support restricting to specific tokens, though!


That is the specific “attack” that has been disabled: logit_bias does not affect the logprob return. Otherwise, you could iteratively demote up to 1024 logits and see what’s still there.

While verifying whether logit_bias affects different outputs, such as with response_format, I found out over an hour of scripting and trials: logit_bias is NOT WORKING on any model, even in a basic request, to affect the response output at all (not just logprobs). Time to make a new WTF post.

You are provided a “fake” logprob that is not actually the one used for sampling. For example, you might see a 99% probability space in the top-20 that you can receive; however, the true probability, such as a “function-call” special token having a 30% chance of being emitted as the first token, is not disclosed. Nor is the chance of an “end-of-output” token mid-document, such as at the end of a paragraph, where it becomes more likely.

Top-20 is still good if you do a good job of choosing an enum that doesn’t produce a lot of similar alternates.
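
Something like this sketch can then pull out the probabilities of just the tokens you care about from the returned top_logprobs (the categories and model are placeholders, and it assumes each category starts with a distinct first token):

import math
from openai import OpenAI

client = OpenAI()

categories = ["Arts", "Sports", "Business", "Technology"]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this headline in one word: Local team wins the championship"}],
    logprobs=True,
    top_logprobs=20,  # 20 is the maximum the API returns per position
    max_tokens=1,
)

# Probabilities of your category tokens at the first output position,
# if they appear among the returned top 20
first_position = response.choices[0].logprobs.content[0]
category_probs = {
    entry.token: math.exp(entry.logprob)
    for entry in first_position.top_logprobs
    if entry.token in categories
}
print(category_probs)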


Thank you for the response. I have a question regarding this:

You are provided a “fake” logprob that is not actually the one used for sampling. For example, you might see a 99% probability space in the top-20 that you can receive; however, the true probability, such as a “function-call” special token having a 30% chance of being emitted as the first token, is not disclosed.

If I request, say, the top 5 tokens, are the relative probabilities still correct? That is, even if the probabilities are not actually the true probabilities that are being sampled from, are the relative probabilities among the top 5 tokens still correct?

Yes, the total “mass” of the logit distribution is renormalized over the available tokens that you will get a report on.

When you get all 20 that are available, they’ll probably sum to near 100% probability if the AI has any inkling of what you want it to produce.
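
You can check this yourself by converting the returned logprobs to probabilities and summing them, something like this (assuming a chat.completions response requested with logprobs=True and top_logprobs=20, as in the earlier sketch):

import math

# `response` is a chat.completions response requested with logprobs=True, top_logprobs=20
top = response.choices[0].logprobs.content[0].top_logprobs
probs = {entry.token: math.exp(entry.logprob) for entry in top}
for token, p in probs.items():
    print(f"{token!r}: {p:.4f}")
print("probability mass covered by the top 20:", sum(probs.values()))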


For example, here is a run of Python code I’ve been writing from scratch that prods the AI with a classification task and its possible enums:

Initial output of script reporting on its parameters:

Warning: enum string “thesis” encoded to more than one token,
using “th” for logit bias instead.
Warning: enum string “whitepaper” encoded to more than one token,
using “white” for logit bias instead.

You are a classifier, outputting the topic of a provided document in JSON with a single key "value".

enum "value" must be chosen from only:
['report', 'article', 'blog', 'thesis', 'whitepaper', 'newsletter', 'manual', 'guide', 'review', 'paper']

# examples of every permitted response JSON

{"value":"report"}
{"value":"article"}
{"value":"blog"}
{"value":"thesis"}
{"value":"whitepaper"}
{"value":"newsletter"}
{"value":"manual"}
{"value":"guide"}
{"value":"review"}
{"value":"paper"}

Just pick any random type, there's no document


Bias: {22869: 2, 12608: -5, 13318: -100, 404: 2, 9988: 2, 172777: 0, 43480: 20, 51283: 0, 37404: -1, 23112: 5}

Response Report

output content:{"value":"article"}

Logprobs at the value position, converted to probability

Logprobs:{
  "token": "article",
  "logprob": 0.6773386835778668,
  "top_logprobs": [
    {
      "token": "article",
      "logprob": 0.6773386835778668
    },
    {
      "token": "blog",
      "logprob": 0.3199521581969428
    },
    {
      "token": "report",
      "logprob": 0.0019025046958103897
    },
    {
      "token": "guide",
      "logprob": 0.00022722178296200756
    },
    {
      "token": "white",
      "logprob": 0.0001769605025016923
    },
    {
      "token": "paper",
      "logprob": 0.0001769605025016923
    }
  ]
}

You can see that 67.7% + 32% for just the first two certainties is already 99.7%, even when I told the AI to just pick a random one of the enums (here not structured with response_format, just over-prompted).

Neat stuff

The system prompt generation, the logit biases, the schema, all originate from one object, and the token numbers are obtained from tiktoken encoding:

developer_enum_bias = {
    "report": 2,
    "article": -5,
    "blog": -100,
    "thesis": 2,
    "whitepaper": 2,
    "newsletter": 0,
    "manual": 20,
    "guide": 0,
    "review": -1,
    "paper": 5,
}
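
Something like this sketch can do the string-to-token-ID conversion, falling back to the first token when an enum string doesn’t encode to a single token (matching the warnings above); the encoding name is assumed for a gpt-4o-class model:

import tiktoken

# o200k_base is the encoding used by gpt-4o-class models
enc = tiktoken.get_encoding("o200k_base")

logit_bias = {}
for word, bias in developer_enum_bias.items():
    token_ids = enc.encode(word)
    if len(token_ids) > 1:
        fallback = enc.decode([token_ids[0]])
        print(f'Warning: enum string "{word}" encoded to more than one token,\n'
              f'using "{fallback}" for logit bias instead.')
    logit_bias[token_ids[0]] = bias

# logit_bias can then be passed to chat.completions.create(..., logit_bias=logit_bias)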

Such a mechanism could counteract the AI over-producing one category – if it worked.

Proof that logit_bias is currently not working at all: a user input of “no document; just output the blog classification” gets you a blog result, despite the -100 logit_bias against producing “blog” above.
