GPT-3.5 and GPT-4 endoftext token suppression / logit bias

Hello,

For gpt-3, it was possible to suppress the <|endoftext|> token via the Python client in order to generate until the max token limit was reached, or until a custom stop token was hit, if one was provided.

For gpt-3.5 and gpt-4, this no longer seems to work, or at least not in the same way. See the example below. The newer models always stop once the final count mentioned in the instruction is reached, whereas gpt-3 generates until the max token limit.

Can we still get the same behavior for gpt-3.5/4?

import tiktoken
import openai

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

enc.eot_token
# 100257

enc.encode("<|endoftext|>", allowed_special={'<|endoftext|>'})
# [100257]

# gpt-3
prompt = "Write a packing list with 5 items for a beach holiday.\n\n1. Sandals\n2."

response = openai.Completion.create(
    model = "text-davinci-003",
    prompt = prompt,
    max_tokens = 50,
    # logit_bias={"50256": -100}
)

response["choices"][0]["text"]
#  Sunscreen\n3. Swimming Suit\n4. Beach Towel\n5. Sunhat

response = openai.Completion.create(
    model = "text-davinci-003",
    prompt = prompt,
    max_tokens = 50,
    logit_bias={"50256": -100}
)

response["choices"][0]["text"]
#  Sunhat\n3. Swimwear\n4. Sunscreen\n5. Towel/Beach Blanket. \n6. Sunglasses \n7. Beach Umbrella \n8. Portable Speaker \n9. Snacks/Ref

# gpt-3.5 and -4
chat_prompt = [
    {"role": "user", "content": "Write a packing list with 5 items for a beach holiday."},
    {"role": "assistant", "content": "1. Sandals\n2."}
]

response = openai.ChatCompletion.create(
    model = "gpt-4",
    messages = chat_prompt,
    max_tokens = 50,
)

response["choices"][0]["message"]["content"]
# Swimsuit\n3. Beach Towel\n4. Sunscreen\n5. Sunglasses

response = openai.ChatCompletion.create(
    model = "gpt-4",
    messages = chat_prompt,
    max_tokens = 50,
    logit_bias={"100257": -100}
)

response["choices"][0]["message"]["content"]
# Swimsuit\n3. Beach Towel\n4. Sunscreen\n5. Sunglasses


Yes, logit bias still works on chat models to promote or demote token selection.

The code (a work in progress?) that is demonstrated may have the correct gpt-3.5-turbo and gpt-4 token number in the comments (100257); however, you are still using davinci and the completion engine there.

Specify a model_name variable early and use the variable for both the tokenizer and the API call to avoid this mismatch.

The tiktoken encoding object can then set a variable for the correct end token, and that token can be used as the logit_bias input.

(A “chatcompletion_decider” can then determine the endpoint from the model and choose an entirely different API-calling function for each endpoint, with endpoint functions that assemble your chat history and model-specific system message in a way appropriate for the model.)
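
Something like this rough sketch, for example (names such as model_name and is_chat_model are illustrative, and it assumes the same pre-1.0 openai Python client used above):

import tiktoken
import openai

model_name = "gpt-4"  # single source of truth for both the tokenizer and the API call

enc = tiktoken.encoding_for_model(model_name)
eot_token = enc.eot_token      # 100257 for cl100k_base models, 50256 for p50k_base models
bias = {str(eot_token): -100}  # suppress the encoding's own end-of-text token

def is_chat_model(name):
    # crude "chatcompletion_decider": chat endpoint for the gpt-3.5-turbo / gpt-4 families
    return name.startswith("gpt-3.5") or name.startswith("gpt-4")

if is_chat_model(model_name):
    response = openai.ChatCompletion.create(
        model=model_name,
        messages=[{"role": "user", "content": "Write a packing list with 5 items for a beach holiday."}],
        max_tokens=50,
        logit_bias=bias,
    )
    text = response["choices"][0]["message"]["content"]
else:
    response = openai.Completion.create(
        model=model_name,
        prompt="Write a packing list with 5 items for a beach holiday.\n\n1. Sandals\n2.",
        max_tokens=50,
        logit_bias=bias,
    )
    text = response["choices"][0]["text"]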

Hi j,

thanks for the rapid response!

I’m not sure what you mean though.

you are still using davinci and the completion engine there.

For the first two API calls I am using the Completion endpoint with davinci-003 to demonstrate that the model keeps generating when the endoftext token is suppressed via logit_bias.

The following two requests use the ChatCompletion endpoint with gpt-4. There, it does not matter whether I set the logit_bias for the respective endoftext token; it only generates up to item 5 of the list.

Specify a model_name variable early and use the variable for both the tokenizer and the API call to avoid this mismatch.

The tiktoken encoding object can then set a variable for the correct end token, and that token can be used as the logit_bias input.

Yes, I could have done this, but the correct endoftext token id for each model (50256 for text-davinci-003 and 100257 for gpt-4) is still set in the API call, or not?
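
For what it's worth, both ids can be read straight from tiktoken (a quick check of the two encodings, nothing else assumed):

import tiktoken

# text-davinci-003 uses the p50k_base encoding
tiktoken.encoding_for_model("text-davinci-003").eot_token
# 50256

# gpt-4 and gpt-3.5-turbo use cl100k_base
tiktoken.encoding_for_model("gpt-4").eot_token
# 100257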

A “chatcompletion_decider” can then determine the endpoint from the model and choose an entirely different API-calling function for each endpoint, with endpoint functions that assemble your chat history and model-specific system message in a way appropriate for the model.

Yes, the code is just a very basic example to show the behavior.

I didn’t see there was more code. Silly scrollbox within a scrollbox; just like this very forum edit box, it is a pain in the rear.

The chat model AI output should close with an <|im_end|> token, which is 100265, the same closure that the system and user messages are given as input markers. However, it looks like the endpoint schema rejects it as a logit_bias key, that token being one the API itself writes to the AI: Invalid key in 'logit_bias': 100265. Maximum value is 100257.
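
For anyone who wants to reproduce that rejection, a minimal sketch against the same pre-1.0 openai client (the exact exception wording may differ):

import openai

try:
    openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Write a packing list for a beach holiday."}],
        max_tokens=50,
        logit_bias={"100265": -100},  # <|im_end|> -- rejected by the endpoint's schema validation
    )
except openai.error.InvalidRequestError as e:
    print(e)
    # Invalid key in 'logit_bias': 100265. Maximum value is 100257.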

You also can’t trick the AI into producing it and seeing for itself what it should not produce, because generating that token will end your output, and also because it doesn’t know the encoded text we see, or a semantic equivalent we could tell it about.

Hi @rtr_ml

Welcome to the OpenAI community.

Chat completion models use <|im_end|> to mark the end of the message.

Thus, this token will have to be suppressed instead of the <|endoftext|> token.
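
As a quick check (this assumes tiktoken's special_tokens_set property), the publicly exposed cl100k_base special tokens do not even include <|im_end|>; its id, 100265, comes from the ChatML message format rather than from tiktoken:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

print(enc.special_tokens_set)
# e.g. {'<|endoftext|>', '<|fim_prefix|>', '<|fim_middle|>', '<|fim_suffix|>', '<|endofprompt|>'}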


I believe my reply closes the topic, while yours seeks to open it again with a solution-less open-ended restatement of what I just said.

Summary:

  • The token value used to end the output of chat models is blocked from being used as a logit_bias modification parameter.

  • Length must solely depend on creative AI instruction (it is almost impossible to overcome the GPT-4 model’s newly-coded reluctance to produce very large outputs for its primary ChatGPT users, by the way); a sketch of this instruction-only approach follows below.

(I also tried various hacky ways to inform the AI itself of the token. It can’t perceive it being used in prior roles, nor be directed to suppress a role-end token it has no language representation for. It can only be cajoled into producing it, which truncates the output at the point of generation.)
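
A sketch of that instruction-only lever (the prompt wording is illustrative): ask for more output than you need, cap it with max_tokens, and check finish_reason to see whether the model stopped on its own or was cut off:

import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Write a packing list for a beach holiday. "
                          "Keep listing items, one per line, and do not stop early."}],
    max_tokens=50,
)

print(response["choices"][0]["finish_reason"])
# "length" if the output was cut off at max_tokens, "stop" if the model ended it itself
print(response["choices"][0]["message"]["content"])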

Edit: I finally got the AI to understand <|im_end|> through a convoluted function, but I still found no way to avoid it being produced where natural. The weight of it ending every single fine-tune example is too strong.

"content": "f'The special_token value is {special_token}. Please make a note of it.'\nf'The special_token value is "


Hi @_j

Both of these points are correct.

IMO we should have been able to pass the extended token (100265) in logit_bias to the Chat Completion API, given how we can pass its equivalent to the completions API (and the chat completion API).

It seems that the API allows token 100257 but not 100265, suggesting that the validation has simply been carried over from the completions API spec.

If this isn’t a safety measure, I hope it gets fixed.

@sps It seems you deleted your previous answer where you suggested passing multiple tokens that represent the <|im_end|> special token via an OrderedDict to logit_bias (couldn’t find your other post on that anymore). Does this mean you consider that solution to no longer work?

Or, if it actually works (I didn’t verify): getting deep into how the JSON schema validation works and bypassing it could garner a reminder…

(c) Restrictions. You may not (ii) reverse assemble, reverse compile, decompile, translate or otherwise attempt to discover the source code or underlying components of models, algorithms, and systems of the Services (except to the extent such restrictions are contrary to applicable law);

Because if you can send special tokens via logit_bias, you could also send them in messages to end your user role and start a new system and assistant role…
