To use the logit_bias parameter to encourage or discourage tokens (via the API call), one must find the correct token number, using the correct tokenizer for the model.
This doesn’t have many useful applications, because there are many alternate tokens that appear to be the same word: tokens with or without a leading space, capitalized variants, and even tokens for words preceded by a hyphen or underscore. Demoting tokens that represent parts of words (like the “psych” in psych-ology) can broadly affect other generation.
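As a minimal sketch of what that looks like in practice, assuming the Chat Completions endpoint, the tiktoken library, and an illustrative model name and word (none of these specifics come from the post above):

```python
# Sketch: discouraging one word via logit_bias.
# Assumes the openai and tiktoken packages are installed; the model name,
# the word "zebra", and the list of variants are illustrative only.
import tiktoken
from openai import OpenAI

model = "gpt-4o-mini"  # hypothetical choice; use whatever model you actually call
enc = tiktoken.encoding_for_model(model)

# The same word maps to several distinct tokens (leading space, capitalization),
# so each variant you care about needs its own bias entry.
variants = ["zebra", " zebra", "Zebra", " Zebra"]
bias = {}
for text in variants:
    for token_id in enc.encode(text):
        bias[str(token_id)] = -100  # -100 effectively bans a token, +100 effectively forces it

client = OpenAI()
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Name a striped African animal."}],
    logit_bias=bias,
)
print(response.choices[0].message.content)
```

Note that if a variant encodes to more than one token, the loop biases each piece, which is exactly the “parts of words” hazard mentioned above.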
One last question, if you don’t mind: when I run this sentence through the tokenizer and get output like the attached … those numbers are indices into the word dictionary, correct? In theory I’ve always seen them as representing something like sequential order … as if, in a 50,000-word dictionary, “a” would have a lower number (index into the dictionary) than “zebra”.
But either I’m making a mistake in interpreting the screenshot, or I’ve just assumed that because all the examples in the introductory material were built that way. Which is it?
While some tokens are just letters or ASCII characters that appear early because they were placed in the token dictionary by hand, many others are the result of discovering ways to compress the data of the training corpus, so their numbering reflects that process rather than alphabetical order.
However, the tokenizer link I gave will also let you check the tokenizer of recommended models (and not just particular old ones like the OpenAI one you show), and see the individual tokens marked by color.
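You can also check this programmatically. Here is a small sketch using the tiktoken library (the encoding name and example words are mine, not from your screenshot); it shows that the IDs follow merge/placement order, not the alphabetical dictionary order you were expecting:

```python
# Sketch: inspect token IDs with tiktoken (assumed installed).
# IDs are artifacts of how the vocabulary was built, not alphabetical indices.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding name chosen as an example

for word in ["a", "zebra", "psychology", " psychology"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r:15} -> ids={ids} pieces={pieces}")
```

Running something like this makes it easy to see which words are a single token, which split into pieces, and how little the numeric value of an ID tells you about the word itself.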