Hi @curt.kennedy, thanks so much for offering your help here. Unfortunately, this is not working for me.
Here is my code:
import openai
import tiktoken

MODEL = "davinci"
enc = tiktoken.encoding_for_model(MODEL)

# Build a logit_bias map: the first token ID of each name in the prompt gets the maximum bias.
logit_bias_map = {}
prompt = " john tom dan john bob dan john tom will john tom"
for x in prompt.split():
    logit_bias_map[str(enc.encode(x)[0])] = 100

completions = openai.Completion.create(
    engine=MODEL,
    prompt=prompt,
    max_tokens=1,
    n=1,
    stop=None,
    temperature=0,
    logprobs=5,
    echo=True,
    logit_bias=logit_bias_map,
)
print(completions.choices)
And here is the output:
[<OpenAIObject at JSON: {
"finish_reason": "length",
"index": 0,
"logprobs": {
"text_offset": [
0,
4,
8,
12,
17,
21,
25,
30,
34,
39,
44,
48
],
"token_logprobs": [
null,
-11.489183,
-10.18019,
-5.5443854,
-6.5728407,
-3.138067,
-2.0533974,
-1.9761846,
-6.93383,
-1.993683,
-1.1941555,
-1.0868405
],
"tokens": [
"john",
" tom",
" dan",
" john",
" bob",
" dan",
" john",
" tom",
" will",
" john",
" tom",
"will"
],
"top_logprobs": [
null,
{
",": -3.6765778,
"-": -2.8618271,
".": -3.5472288,
"bytes:\\xe2\\x80": -3.9375181,
"s": -3.2035592
},
{
"as": -3.009693,
"ase": -2.301611,
"asi": -2.3654172,
"lin": -1.7576712,
"my": -2.7895849
},
{
"\n": -4.324432,
" and": -3.30854,
" j": -2.8703802,
"forth": -3.821676,
"ley": -3.9747922
},
{
"\n": -3.7143948,
" dan": -3.0179114,
" tom": -2.18535,
"ny": -3.0394833,
"son": -2.2874274
},
{
"\n": -3.0000987,
" bob": -3.4359412,
" dan": -3.138067,
" john": -2.964168,
")": -3.2884574
},
{
"\n": -3.6697154,
" bob": -1.0042298,
" dan": -3.6627877,
" john": -2.0533974,
" tom": -3.092765
},
{
"\n": -3.4838939,
" bob": -1.1732672,
" dan": -3.6677318,
" john": -3.3311195,
" tom": -1.9761846
},
{
"\n": -3.5157578,
" bob": -1.8858293,
" dan": -0.95853364,
" john": -3.0290709,
" tom": -2.9475436
},
{
" be": -2.5264819,
" dan": -3.6445656,
" john": -1.993683,
"iam": -1.3272516,
"y": -4.225769
},
{
" bob": -1.0446652,
" dan": -2.8825402,
" john": -3.5926692,
" tom": -1.1941555,
" will": -2.8907952
},
{
"b": -2.1100414,
"dan": -1.2277858,
"john": -1.8985239,
"tom": -2.3150966,
"will": -1.0868405
}
]
},
"text": "john tom dan john bob dan john tom will john tomwill"
}]
And if we print that logit_bias_map, we get one token per name in the prompt, i.e. 5 tokens for the 5 names:
{'30686': 100, '39532': 100, '25604': 100, '65': 100, '10594': 100}
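For reference, those IDs can be decoded back to strings with the same enc from the code above, for example:

for token_id in logit_bias_map:
    # Decode each biased token ID back to text to see which name (or name fragment) it maps to.
    print(token_id, repr(enc.decode([int(token_id)])))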
What doesn’t seem right to me is that top_logprobs contains tokens other than the 5 I biased to 100 (and their logprobs aren’t particularly low either). What do you think? Something seems off, and I’m wondering if there’s a solution path you can point me to.
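In case it helps, here is a rough sketch of the sanity check I have in mind, using the same enc, logit_bias_map, and completions from the code above: re-encode each token string reported in top_logprobs and see whether any of the resulting IDs are ones I biased.

biased_ids = {int(k) for k in logit_bias_map}
top_logprobs = completions.choices[0].logprobs.top_logprobs
for i, entry in enumerate(top_logprobs):
    if entry is None:
        continue  # the first position has no logprobs when echo=True
    # Tokens whose first re-encoded ID matches one of the biased IDs
    overlap = [tok for tok in entry if enc.encode(tok)[0] in biased_ids]
    print(i, overlap)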
Thanks for your help!