Hello, I’m seeing a discrepancy between the list of tokens and the text in the response: the tokens continue past the stop sequence, while the text does not.
This is the request:
{model='openai/davinci', prompt='Answer the following question about geography.\n\nQuestion: What is the longest river?\nAnswer: Nile ##\n\nQuestion: What is the tallest mountain?\nAnswer:', temperature=0, num_completions=1, top_k_per_token=5, max_tokens=100, stop_sequences=['##'], echo_prompt=False, top_p=1, presence_penalty=0, frequency_penalty=0}
This is the response I get back:
JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {
        "text_offset": [
          149,
          155,
          163,
          164,
          164,
          164,
          164,
          164
        ],
        "token_logprobs": [
          -0.8430533,
          -0.05527358,
          -0.31007066,
          -0.07606475,
          -0.028943757,
          -0.2905347,
          -0.0031962388,
          -0.3996574
        ],
        "tokens": [
          " Mount",
          " Everest",
          " ##",
          "\n",
          "\n",
          "Question",
          ":",
          " What"
        ],
        "top_logprobs": [
          {
            " Chim": -3.9527752,
            " Everest": -1.1450402,
            " Kil": -2.322431,
            " Mount": -0.8430533,
            " Mt": -3.039987
          },
          {
            " El": -6.1829395,
            " Everest": -0.05527358,
            " Fuji": -5.5334053,
            " Kil": -4.095801,
            " Olympus": -5.4415402
          },
          {
            "\n": -2.0230033,
            "\n\n": -3.850575,
            " ": -4.441365,
            " ##": -0.31007066,
            " (": -3.6939626
          },
          {
            "\n": -0.07606475,
            "\n\n": -2.8826785,
            " ": -5.974212,
            " (": -5.575713,
            ".": -6.6401477
          },
          {
            "\n": -0.028943757,
            "<|endoftext|>": -4.7994246,
            "In": -7.8236175,
            "Question": -5.2021422,
            "The": -6.255388
          },
          {
            "Answer": -4.8792033,
            "In": -4.966905,
            "Question": -0.2905347,
            "The": -3.3670382,
            "This": -4.8384757
          },
          {
            " 1": -8.625377,
            " :": -6.9509153,
            " What": -8.843604,
            ".": -8.176841,
            ":": -0.0031962388
          },
          {
            " How": -3.2076557,
            " What": -0.3996574,
            " Where": -2.5891602,
            " Which": -2.8821936,
            " Who": -2.605403
          }
        ]
      },
      "text": " Mount Everest "
    }
  ],
  "created": 1641589200,
  "id": "cmpl-4Nqd6NOJkTI5hUBYxLgMTdeTMWDZT",
  "model": "davinci:2020-05-03",
  "object": "text_completion",
  "request_time": 1.4495737552642822
}
The text in the response is correct (“Mount Everest”), but the tokens list is not: it contains the stop sequence ## and the tokens generated after it:
"tokens": [
  " Mount",
  " Everest",
  " ##",
  "\n",
  "\n",
  "Question",
  ":",
  " What"
],
I would expect to just get back [" Mount", " Everest"].
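
For reference, here is roughly the truncation I have in mind, written as a client-side sketch. The helper name `truncate_tokens_at_stop` is hypothetical and not part of any API. It assumes that concatenating the returned tokens reproduces the generated text, and it drops any token that overlaps the stop sequence (here " ##", which is why the trailing space in the text has no token of its own).

```python
from typing import List, Sequence


def truncate_tokens_at_stop(
    tokens: Sequence[str], stop_sequences: Sequence[str]
) -> List[str]:
    """Keep only the tokens that end before the first stop sequence.

    Hypothetical helper: assumes "".join(tokens) is the raw generated text.
    """
    full_text = "".join(tokens)

    # Character position of the earliest stop sequence, if any occurs.
    hits = [full_text.find(stop) for stop in stop_sequences if stop]
    hits = [pos for pos in hits if pos != -1]
    if not hits:
        return list(tokens)
    cut = min(hits)

    kept: List[str] = []
    end = 0  # character offset just past the current token
    for token in tokens:
        end += len(token)
        if end > cut:
            # This token reaches into (or past) the stop sequence.
            break
        kept.append(token)
    return kept


tokens = [" Mount", " Everest", " ##", "\n", "\n", "Question", ":", " What"]
print(truncate_tokens_at_stop(tokens, ["##"]))
```

The same cut index could presumably be applied to token_logprobs, top_logprobs, and text_offset, since those lists are parallel to tokens in the response above.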