logit_bias no longer fully working

It used to be that a logit_bias of -100 would completely prevent the token from showing up, but now it seems to be more of a suggestion.

An example test prompt is:

{
  "model": "gpt-3.5-turbo",
  "messages": [
        {"role": "system", "content": "Follow the pattern and use brackets [] in your response."},
        {"role": "user", "content": "1: [test1]\n\n2: [test2]\n\n"}
    ],
  "max_tokens": 160,
  "temperature": 1.2,
  "stream": false,
  "logit_bias": {"510":-100}
}

It does bias the output, but it will still output the " [" token about 40% of the time.
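
For anyone who wants to reproduce this, here’s a rough Python sketch of the same request using the openai Python client (illustrative only: it assumes OPENAI_API_KEY is set in the environment, and the loop count is arbitrary):

from openai import OpenAI

# Minimal reproduction sketch: same request as the JSON above, repeated a few
# times to see how often the " [" token still shows up despite the -100 bias.
client = OpenAI()

for _ in range(10):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Follow the pattern and use brackets [] in your response."},
            {"role": "user", "content": "1: [test1]\n\n2: [test2]\n\n"},
        ],
        max_tokens=160,
        temperature=1.2,
        logit_bias={"510": -100},  # "510" is the " [" token being suppressed
    )
    text = response.choices[0].message.content or ""
    print(repr(text), "| contains ' [':", " [" in text)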

Use case: this is problematic for story-telling text generation where the prompt includes a lot of instructions or information enclosed in brackets. We’re supposed to generate only prose, but ChatGPT sometimes decides to copy the style of the prompt and include bracketed text, even though every bracket token (both with and without the preceding space) has been set to a logit_bias of -100.

I haven’t looked into this yet, but have you tried something like -999 or -1000 instead of -100? Just wondering if there was a change in magnitude.

A square bracket is often not a single token. It appears inside hundreds of different tokens, such as ( [") (the characters within the parentheses).

You can put the exact generated output into a token encoder and check whether the character sequence containing the bracket is yet another unanticipated token, but you just end up swatting flies.
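
If you want to check that programmatically, here’s a rough tiktoken sketch (assuming the gpt-3.5-turbo encoding; the sample string is just an example) that counts how many vocabulary entries contain a bracket and shows which token IDs a given output actually used:

import tiktoken

# Count vocabulary entries that contain "[" and inspect a sample output.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

bracket_tokens = []
for token_id in range(enc.n_vocab):
    try:
        piece = enc.decode_single_token_bytes(token_id)
    except KeyError:
        continue  # some IDs in the range are unused or special
    if b"[" in piece:
        bracket_tokens.append(token_id)

print(f"{len(bracket_tokens)} tokens contain '['")

# Run an actual model output through the encoder to see which bracket token it used.
sample = "3: [test3]"
print([(t, enc.decode([t])) for t in enc.encode(sample)])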

Yes; the API doesn’t allow a bias lower than -100.

Not in this case; that’s disproven by the simple example I provided, where the bracket is exactly the " [" token (if the model answers in the expected format).

You may be working in the realm of extreme certainty. Here’s an example I constructed for gpt-3.5-turbo-instruct:

[image]

How about, then, if something like ±0.01% is thrown into the logits?

It might be either optimizations that reduce the possible logits, or randomness in the outputs that is unobservable (until OpenAI frees up the logprobs).

It has already been observed that these new turbo models produce random top-1 token choice flips even when every attempt at determinism is made. Why not a random normalized probability over 1.00000?

Interesting, thanks for taking a look. I had this repro with longer text where a " [" shouldn’t have anywhere near 100% certainty. But I’m not sure I’ll be able to get a reproducible example of that, since it might vary randomly with temperature.

Also, I think the fact that biasing to -100 does influence the output in my “test3” example illustrates that it’s still a fixable bug (it reduces the chance of generating the correct answer from 100% to around 40-50%).

I’ve moved this to the bugs category, as it does seem like this should have a much stronger influence than it does.


My thoughts precisely. Just because a substring appears in the generated text doesn’t mean it’s the same token.

@AI-Roguelite can you share the exact generated text?

Here’s an example where similar-looking pieces of text have different token IDs:

text: [image]
tokens: [image]
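
If you want to reproduce the comparison locally, a quick tiktoken check (the strings here are just illustrative) shows how the surrounding characters change the token IDs:

import tiktoken

# Similar-looking brackets tokenize differently depending on what precedes them.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

for s in ["[", " [", "\n[", '["', "[test"]:
    print(repr(s), "->", enc.encode(s))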

I had the same problem when trying to remove double quote marks (") from the response.

As @sps suggests, not only is " a token, but so are "a, "b, "c and so on.
I see it’s the same case with brackets:
[image]

Finding and removing these with logit_bias doesn’t seem like the right solution here. In my case I will be running the response through a post-processing script instead.
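
Something along these lines, for example (a hypothetical clean_response helper; the regexes are only a starting point, not a complete cleanup):

import re

def clean_response(text: str) -> str:
    """Strip bracketed segments and stray double quotes from a model reply."""
    text = re.sub(r"\[[^\]]*\]", "", text)    # drop anything inside [ ... ]
    text = text.replace('"', "")              # drop double quote marks
    return re.sub(r"  +", " ", text).strip()  # tidy up leftover spacing

print(clean_response('She opened the door. ["style note"] "Hello," she said.'))
# -> She opened the door. Hello, she said.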


There are different token IDs because the second [ includes the line break.

You may find a package like instructor useful for constraining the model’s output to a specific data type. It uses pydantic under the hood.
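
For what it’s worth, a rough sketch of that approach looks something like this (the exact instructor entry point differs between versions, so treat it as illustrative and check the instructor docs):

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Hypothetical response model: we only want plain prose back.
class Prose(BaseModel):
    text: str

# instructor wraps the OpenAI client so responses are parsed into the model.
client = instructor.from_openai(OpenAI())

result = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Prose,
    messages=[
        {"role": "system", "content": "Write plain prose. Do not use square brackets."},
        {"role": "user", "content": "Continue the story."},
    ],
)
print(result.text)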
