How would I remove complex words like xargs, which has token IDs [87, 22046], from my generation? Either by using logit_bias (I'm a bit confused about how the dictionary should be structured) or by using an instruction in the prompt (I've been using this, but the word still creeps into the output once in a while, and I'd really, really like to never generate it)?
You can use the form below to specify a token you don't want, and you can add more entries to the dictionary. The 87 looks like it might be a pretty common token, though, so stick to biasing the rarer one. Also give it a temperature above 0 so it has some wiggle room to create replies without that token.
```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    # -100 effectively bans token 22046 from appearing in the output
    logit_bias={"22046": -100},
    # a little temperature gives the model room to reword around the ban
    temperature=0.5,
    messages=[{"role": "user", "content": "Do all the things"}],
)
```
Lemme check this out and see how it does. I'm a bit hesitant to raise the temperature because of how deterministic I want the output to be, but some wiggle room can be afforded. Thanks!
Good point by @_j there: it looks like you used the OpenAI tokenizer site to generate your token IDs, and that site does not use the latest tokenizer model. OpenAI's tiktoken library can give you the correct IDs for the cl100k_base encoding that gpt-3.5-turbo uses.
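For example, a minimal check with tiktoken. Note the leading space: mid-sentence a word usually appears as " xargs", which tokenizes differently from bare "xargs", so check both forms before building your logit_bias dictionary:

```python
import tiktoken

# Get the exact encoding gpt-3.5-turbo uses (cl100k_base)
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# A word tokenizes differently with and without a leading space
print(enc.encode("xargs"))   # word at the start of a message
print(enc.encode(" xargs"))  # word mid-sentence, preceded by a space
```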
What would likely happen is that it produces the "x" as expected where it is expected, but the next token is prohibited, so the model falls back to the next-highest-weighted token.
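As a toy illustration of that mechanism (made-up logits, not OpenAI's actual sampler): logit_bias adds the given value to a token's logit before the softmax, so -100 pushes that token's probability to effectively zero and the next-highest candidate wins.

```python
import math

# Made-up logits for three candidate tokens at one decoding step
logits = {"args": 5.0, "argv": 3.5, "-0": 3.0}

# logit_bias adds the given value to a token's logit before softmax;
# -100 pushes it so far down its probability is effectively zero
logits["args"] += -100

total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}
print(probs)  # "args" ~ 0; "argv" is now the most likely continuation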
Once you're done having fun with logit_bias as a parameter, you can move on to more forceful prompting.
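For instance, a sketch combining the logit_bias above with a system-message instruction. The exact wording of the instruction is just one possibility, not a guaranteed fix:

```python
import openai

# Belt and braces: instruct the model *and* bias the token
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    logit_bias={"22046": -100},
    temperature=0.5,
    messages=[
        {"role": "system",
         "content": "Never use the word 'xargs' in your answers. "
                    "If a shell pipeline would need it, describe an alternative instead."},
        {"role": "user", "content": "Do all the things"},
    ],
)
print(response["choices"][0]["message"]["content"])
```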