Completion does not use highest probability token

Consider this snippet, generated with the prompt "Write a tagline for an ice cream shop. ". The 5th token, " as", is not among the top 5 candidates for that position, and its log probability is -7.67, much lower than tokens like " off" and " down".

Is this expected? Why is this happening?

      "text": "\n\n\"Cool as Ice Cream!\"",
      "index": 0,
      "finish_reason": "stop",
      "logprobs": {
        "tokens": [
          "\n",
          "\n",
          "\"",
          "Cool",
          " as",
          " Ice",
          " Cream",
          "!\""
        ],
        "token_logprobs": [
          -2.0575926e-05,
          -5.3477528e-05,
          -0.13431258,
          -0.98935884,
          -7.6705894,
          -0.497154,
          -0.8468377,
          -1.4581996
        ],
        "top_logprobs": [
          {
            "\t": -18.95807,
            "\n": -2.0575926e-05,
            "\n\n": -10.814936,
            " ": -14.61425,
            " \"": -17.648237
          },
          {
            "\n": -5.3477528e-05,
            "\"": -9.983872,
            "Cool": -14.375,
            "Fresh": -13.308465,
            "Ice": -13.072524
          },
          {
            "\"": -0.13431258,
            "Come": -5.458464,
            "Cool": -2.7358596,
            "S": -4.550612,
            "T": -4.6802363
          },
          {
            "Ch": -3.0980117,
            "Cool": -0.98935884,
            "S": -2.9843006,
            "T": -2.144246,
            "The": -2.7761133
          },
          {
            " Down": -3.1353922,
            " Treat": -2.6835787,
            " down": -0.97212595,
            " off": -1.007576,
            " treats": -2.723425
          },
          {
            " Ice": -0.497154,
            " a": -4.7133393,
            " can": -5.8361063,
            " ice": -0.9806613,
            " the": -6.979358
          },
          {
            " -": -1.6810225,
            " Cream": -0.8468377,
            "!": -3.7539268,
            ",": -3.6907084,
            ":": -1.2609136
          },
          {
            " -": -0.9550087,
            " at": -2.569239,
            "!": -2.9378119,
            "!\"": -1.4581996,
            ":": -1.7133
          }

Welcome to the forum.

What settings are you using?

Not sure if this is what you’re asking for, but here is the call:

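import openai

openai.api_type = 'azure'
openai.api_version = '2023-05-15'
deployment_name = 'prefix-text-davinci-003'
start_phrase = 'Write a tagline for an ice cream shop. '  # prompt from the question
# temperature and top_p are not set, so the service defaults apply
response = openai.Completion.create(engine=deployment_name, prompt=start_phrase, max_tokens=10, logprobs=5)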

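The reason this happens is simply that the values you see there are log probabilities of each token being selected, and the output token is sampled from that distribution rather than always being the most likely one. A log probability of -7.67 works out to roughly a 0.05% chance. If something has a 0.05% chance of selection, it can still be picked, and if there are 1000 tokens at 0.05% each, then 50% of the time the choice will come from that low-probability tail.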
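To put numbers on this, here is a minimal sketch that converts the log probabilities from the fifth position of the response in the question into plain probabilities. The values are copied from that response; the helper name `to_prob` is only for illustration, and the sketch assumes the log probabilities are natural logs.

```python
import math

# Top-5 log probabilities at the fifth position, copied from the response above.
top5 = {
    " Down": -3.1353922,
    " Treat": -2.6835787,
    " down": -0.97212595,
    " off": -1.007576,
    " treats": -2.723425,
}
chosen_logprob = -7.6705894  # the token that was actually sampled, " as"


def to_prob(logprob: float) -> float:
    """Convert a natural-log probability into a plain probability."""
    return math.exp(logprob)


top5_mass = sum(to_prob(lp) for lp in top5.values())
chosen_prob = to_prob(chosen_logprob)
rest_mass = 1.0 - top5_mass  # everything outside the five tokens shown

print(f"' as' probability: {chosen_prob:.4%}")
print(f"top-5 combined:    {top5_mass:.2%}")
print(f"everything else:   {rest_mass:.2%}")
```

The five listed tokens do not add up to 100%. The remainder is spread over the rest of the vocabulary, and " as" is one small slice of it, so a token from outside the top 5 will show up from time to time with default sampling.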

If you want only the top choices to be possible candidates, use the top_p parameter, and set it to something like 0.001 if you want nothing but the single most likely token.
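A rough sketch of what that cutoff does, assuming the usual description of nucleus (top-p) sampling: keep the most likely tokens until their cumulative probability reaches top_p, then renormalize and sample. This is an illustration, not the service's actual implementation. The function name `top_p_filter` is made up for the example, and the candidate list is only the six tokens visible in the question, not the full vocabulary.

```python
import math
import random

# Illustrative candidates: the five listed tokens plus the sampled " as".
# The real distribution covers the whole vocabulary; this is a truncated toy.
candidates = {
    " down": -0.97212595,
    " off": -1.007576,
    " Treat": -2.6835787,
    " treats": -2.723425,
    " Down": -3.1353922,
    " as": -7.6705894,
}


def top_p_filter(logprobs: dict, top_p: float) -> dict:
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the survivors to sum to 1."""
    ranked = sorted(logprobs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, lp in ranked:
        p = math.exp(lp)
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete; drop everything less likely
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


rng = random.Random(0)  # seeded only so the demo is repeatable

narrow = top_p_filter(candidates, top_p=0.001)
wide = top_p_filter(candidates, top_p=1.0)

print("top_p=0.001 keeps:", list(narrow))
print("top_p=1.0 keeps:  ", list(wide))
print("sample from wide: ", rng.choices(list(wide), weights=list(wide.values()), k=1)[0])
```

With a very small top_p the first token alone already exceeds the threshold, so it is the only survivor and the output becomes effectively deterministic. With top_p at 1.0 nothing is cut, and a tail token such as " as" stays in play.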

Temperature and / or top_p…