Can't find this option in OpenAI playground -

Anyone able to tell me where to find this, as the documentation seems ‘off’ … or I’m blind :slight_smile:

https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them#h_97085dcebb

I want to generate the probabilities of the next word in a sentence
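Something like this, via the API, is what I mean. A minimal sketch, assuming the legacy completions endpoint, which exposes a `logprobs` parameter (the model name and prompt are just examples):

```python
import math

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # a legacy completions model
    prompt="The capital of France is",
    max_tokens=1,
    logprobs=5,  # also return the top 5 candidate tokens with their log-probs
)

# top_logprobs[0] maps each candidate next token to its log probability
for token, logprob in response.choices[0].logprobs.top_logprobs[0].items():
    print(f"{token!r}: {math.exp(logprob):.4f}")
```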

To use the logit_bias parameter to encourage or discourage tokens in an API call, you first have to find the correct token number, using the tokenizer that matches the model.

This has fewer useful applications than you might expect, because many alternate tokens look like the same word: tokens with and without a leading space, capitalized variants, and even tokens for words preceded by a hyphen or underscore. Demoting tokens that represent parts of words (like psych-ology) can broadly impact unrelated generation.
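A minimal sketch of that lookup-and-bias workflow, assuming the tiktoken package and the chat completions endpoint (the word and its variants are just examples):

```python
import tiktoken
from openai import OpenAI

model = "gpt-3.5-turbo"
enc = tiktoken.encoding_for_model(model)

# Look up IDs for common variants of the word to discourage; each variant
# (leading space, capitalization, ...) is a distinct token or token sequence.
bias: dict[str, int] = {}
for variant in [" shot", " Shot", "shot", "Shot"]:
    for token_id in enc.encode(variant):
        bias[str(token_id)] = -100  # -100 effectively bans a token

client = OpenAI()
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Describe a basketball game."}],
    logit_bias=bias,
)
print(response.choices[0].message.content)
```

Note the caveat above: if a variant encodes to more than one token, banning each piece can also suppress unrelated words that share those pieces.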

Thank you so much … NPS 10/10!!!

One last question if you don’t mind: when I run this sentence through the tokenizer and get output like the attached … those are indices into the word dictionary, correct? I’ve always assumed they represent sequential order … like in a 50,000-word dictionary, “a” would have a lower number (index into the dictionary) than “zebra”.

But either I’m making a mistake interpreting the screenshot, or I just assumed it because all the examples in the introductory material were built that way. Which is it?

Thanks again!!

While some tokens are just single letters or ASCII characters, placed early in the token dictionary by hand, many others are the result of byte-pair encoding discovering ways to compress the training corpus.

!#A@# ! # A_A-a
[0, 2, 32, 41573, 5145, 1303, 317, 62, 32, 12, 64]
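You can reproduce that breakdown with tiktoken; a sketch assuming the list above came from the GPT-2-era r50k_base encoding (an assumption on my part about which tokenizer produced it):

```python
import tiktoken

# r50k_base is the GPT-2-era encoding; an assumption about which
# tokenizer produced the list above.
enc = tiktoken.get_encoding("r50k_base")

ids = enc.encode("!#A@# ! # A_A-a")
print(ids)

# Decode each ID on its own to see the token boundaries.
for token_id in ids:
    print(token_id, repr(enc.decode([token_id])))
```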

However, the tokenizer link I gave will also let you check the tokenizers of the currently recommended models (not just particular old ones like the OpenAI page you show), and see the individual tokens marked by color.

cl100k_base:
!#A@# ! # A_A-a
[0, 2, 32, 31, 2, 758, 674, 362, 1596, 7561]

showing us that “-a” and “_A” are different tokens than “ a” and “ A”, just as “-shot” is tokenized differently than “ shot”.
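You can check that distinction yourself; a sketch using tiktoken’s cl100k_base encoding:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# The same word tokenizes differently depending on what precedes it.
for text in [" shot", "-shot", "shot", " Shot"]:
    ids = enc.encode(text)
    pieces = [repr(enc.decode([i])) for i in ids]
    print(f"{text!r} -> {ids} ({', '.join(pieces)})")
```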