No way to get simple next token prediction?

I may be daft, but is there really no programmatic way to get the next most likely token following a given input?

For example, if I input “Once upon a ”, I want it to output “time”, not “I’m sorry, could you finish your prompt?”

I understand ChatGPT could be “instructed” with natural language to do this, but the goal is to get under the whole chatbot layer to get the real result.

Welcome to the community!

Were you maybe looking for the completions/instruct models? Those are the ones that do what you’re asking :slight_smile:
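
For example, a minimal sketch with the v1 Python SDK (gpt-3.5-turbo-instruct is one completions model that accepts a bare prompt; the parameters here are just one reasonable choice):

import openai

client = openai.Client()

# The legacy completions endpoint continues the text directly,
# with no chat wrapper around it
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Once upon a ",
    max_tokens=1,    # just the single next token
    temperature=0,   # effectively greedy: pick the most likely token
)
print(response.choices[0].text)  # likely "time"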

On the chat models, OpenAI probably doesn’t expose this kind of direct access because of that whole “safety layer”, which there’s no way to remove. With those, you’ll only be able to use a prompt that describes what you’re trying to do.

Here’s a fun little utility to do just that.

import openai
import math

# One client can be reused for every request
client = openai.Client()

prompt = "My favorite word is \""
print("START: sending this text to find out how it is completed by AI:\n" +
      "-"*20 + f"\n{prompt}\n" + "-"*20)
while prompt != "EXIT":
    try:
        # Request a completion, keeping the raw HTTP response from the API
        response = client.completions.with_raw_response.create(
            model="gpt-3.5-turbo-instruct",
            prompt=prompt,
            max_tokens=20,
            top_p=1e-9,   # near-zero nucleus: effectively greedy decoding
            logprobs=15,  # also return top candidate tokens per position
        )
        # Parse the first choice and retrieve the log probabilities
        # of the candidates for the first generated token
        choice0 = response.parse().model_dump()['choices'][0]
        logprob = choice0['logprobs']['top_logprobs'][0]
        print("\n=== top first token probabilities ===\n")
        # Iterate over each candidate token and its log probability
        for token, log_prob in logprob.items():
            # Convert log probability to a normal probability, as a percentage
            probability = math.exp(log_prob) * 100
            # Escape non-printable characters as byte literals
            escaped_token = ''.join(
                f"\\x{ord(c):02x}" if not c.isprintable() else c for c in token
            )
            # Print the token and its probability in formatted output
            print(f"'{escaped_token}': {probability:.2f}%")
        print(f"\n=== What the AI wanted to write ===\n\n{choice0['text']}\n")
        # An empty reply also ends the loop
        prompt = input("Another prompt? (or just EXIT to finish)\n>>>").strip() or "EXIT"

    except Exception as e:
        print('ERROR:', e)
        break
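
A note on the parameters: top_p=1e-9 shrinks the sampling nucleus to just the single most likely token, so the completion you see is effectively the greedy (argmax) path, while logprobs=15 asks the API to also return the top candidate tokens with their log probabilities, which the script converts to percentages with math.exp().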

Example response:

START: sending this text to find out how it is completed by AI:
--------------------
My favorite word is "
--------------------

=== top first token probabilities ===

'ser': 98.40%
'Ser': 0.94%
'S': 0.08%
' ser': 0.06%
's': 0.04%
'eff': 0.04%
'ec': 0.02%
'har': 0.02%
'equ': 0.02%
' Ser': 0.02%
'res': 0.02%
'peace': 0.02%
'bl': 0.02%
'love': 0.02%
'gr': 0.02%

=== What the AI wanted to write ===

serendipity." It means a fortunate or unexpected discovery or occurrence. I love the way it

Another prompt? (or just EXIT to finish)
>>>

The -instruct AI has some training on input questions: a complete sentence that can be answered will be treated as a question, and the model will generate newlines before writing an answer.

If instead you use davinci-002 or babbage-002, you get raw base-model language completion, with no instruction training steering it.
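
For example (a sketch reusing the client from above), just swap in the base model:

response = client.completions.create(
    model="davinci-002",   # base model: plain continuation, no instruct tuning
    prompt="Once upon a ",
    max_tokens=5,
)
print(response.choices[0].text)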

If you want to see what a chatbot would say, a chatbot-style prompt for the -instruct model could start like this:

prompt = """
Here is a conversation between ChatGPT, a GPT-3.5 AI, and a user:

user: How would you rate Star Wars from 1-10, in a JSON please.
ChatGPT: {"topic":"Star Wars", "rating": "
""".strip()