Can't find this option in OpenAI playground -

Anyone able to tell me where to find this as the documentation seems ‘off’ … or I’m blind :slight_smile:

https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them#h_97085dcebb

I want to generate the probabilities of the next word in a sentence

To see or receive the probabilities, you must use a completions AI model. Chat models do not support this, nor are logprobs available through the chat completions API endpoint when calling it from code.
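
For example, here is a minimal sketch of requesting those probabilities by API, assuming the legacy (pre-1.0) `openai` Python package and the completions endpoint; the prompt and key are placeholders:

```python
import math
import openai

openai.api_key = "sk-..."  # your API key here

# Ask a completion model for one token, plus its most likely alternatives
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="The capital of France is",
    max_tokens=1,
    logprobs=5,  # return the 5 most likely tokens at each position
)

# top_logprobs is a list (one entry per generated token) of {token: logprob} dicts
top = response["choices"][0]["logprobs"]["top_logprobs"][0]
for token, logprob in sorted(top.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token!r}: {math.exp(logprob):.3f}")  # log-probability -> probability
```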

You can use the right sidebar's "Mode" selector to change from chat to complete. The completion model that answers most like ChatGPT (at an even higher price, still) is text-davinci-003.

Using the completion endpoint, more options are exposed, such as the "full spectrum" probability display, which colors each output token by its probability and shows token and sequence probabilities on hover and selection.

A completion model takes a different style of prompt, and some have no instruction training at all and will simply write whatever text might plausibly appear after the input you give in the bare text box.

Here is a shortcut to a playground preset ready to answer questions.

To use the logit_bias parameter to encourage or discourage tokens (via the API call), one must find the correct token number, using the correct tokenizer for the model.
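
As a rough sketch of that flow (the biased word " dragon" is just an illustration), you could look up the token ids with the tiktoken library and pass them in logit_bias:

```python
import tiktoken
import openai

openai.api_key = "sk-..."  # your API key here

# Use the tokenizer that matches the model you will call
enc = tiktoken.encoding_for_model("text-davinci-003")

# Note the leading space: " dragon" and "dragon" are different tokens
token_ids = enc.encode(" dragon")

# logit_bias values run from -100 (effectively ban) to +100 (strongly encourage)
bias = {str(tid): -100 for tid in token_ids}

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write one sentence about a mythical creature:",
    max_tokens=30,
    logit_bias=bias,
)
print(response["choices"][0]["text"])
```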

This doesn't have many useful applications, as there are many alternate tokens that seem to be the same word, such as tokens with or without a leading space, capitalized words, and even tokens for words that start with a hyphen or underscore. Demoting tokens that represent parts of words (like psych-ology) can broadly impact other generation.
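
A quick way to see those alternates, sketched with tiktoken using cl100k_base as an example encoding (the word is arbitrary):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# The "same" word appears as several distinct token sequences
for variant in ["shot", " shot", " Shot", "-shot", "_shot"]:
    print(repr(variant), enc.encode(variant))
```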

Thank you so much … NPS 10/10!!!

One last question if you don't mind: when I run this sentence through the tokenizer and get output like the attached … those are the indices into the word dictionary, correct? In theory I've always seen them as representing perhaps a sequential order … like in a 50,000-word dictionary, "a" would be a lower number (index into the dictionary) than "zebra".

But either I'm making a mistake interpreting the screenshot, OR I just assumed that because all the examples in the introductory material were built that way. Which is it?

Thanks again!!

While some tokens are just letters or ASCII characters placed near the start of the token dictionary by hand, many others are the result of discovering ways to compress the data of the training corpus.

!#A@# ! # A_A-a
[0, 2, 32, 41573, 5145, 1303, 317, 62, 32, 12, 64]
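
If it helps, a quick check (a sketch with tiktoken, using cl100k_base as an example encoding) shows the ids do not follow dictionary order:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Single characters get small ids because byte-level tokens sit near the start
# of the vocabulary; longer merged tokens get ids roughly in merge-discovery
# order, not alphabetical order.
print(enc.encode("a"), enc.encode("zebra"), enc.encode(" zebra"))
```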

However, the tokenizer link I gave will also let you check the tokenizer of the recommended models (and not just particular old ones like the OpenAI one you show), and see the individual tokens marked by color.

cl100k_base:
!#A@# ! # A_A-a
[0, 2, 32, 31, 2, 758, 674, 362, 1596, 7561]

showing us that “-shot” is a different token than " shot".
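
If you want to run the same comparison locally rather than on the web page, here is a small sketch with tiktoken (pairing the listings above with these exact encoding names is my assumption):

```python
import tiktoken

sample = "!#A@# ! # A_A-a"

# Compare how two different encodings split the same text
for name in ["r50k_base", "cl100k_base"]:
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(sample)
    pieces = [enc.decode([i]) for i in ids]
    print(name, ids, pieces)
```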