Role : assistant in client.chat.completions.create

Excerpt from the section:
https://platform.openai.com/docs/guides/text-generation/chat-completions-api


The user messages provide requests or comments for the assistant to respond to. Assistant messages store previous assistant responses, but can also be written by you to give examples of desired behavior.


What does it mean that assistant messages store previous responses?
In the example above, we have written the content of the assistant role ourselves to help answer the 2nd user question.

  1. Can we access the previous responses with the assistant role?

 {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},

In this example, is there a specific order of roles that should be specified? I understand the system message should be at the top. But can “user” be given after “assistant”?


Welcome to the community!

If you read on:

Including conversation history is important when user instructions refer to prior messages. In the example above, the user’s final question of “Where was it played?” only makes sense in the context of the prior messages about the World Series of 2020. Because the models have no memory of past requests, all relevant information must be supplied as part of the conversation history in each request.

not with chat completions, but you can with the Assistants API: https://platform.openai.com/docs/assistants/overview?context=with-streaming - though this is probably not what you’re asking.

tl;dr: no, you can’t “access” previous messages - you have to include previous messages in your messages array so that the final user query makes sense.

yes, you can mix and match these in the messages array in any order you wish. You don’t even need a system message, if you don’t want to include one.
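For example, a minimal sketch of that with the Python SDK might look like this (the model name is just a placeholder - use whichever chat model you have access to):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# the entire conversation so far - including the assistant message we wrote
# ourselves - is sent with every request
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=messages,
)
print(response.choices[0].message.content)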


This all comes down to unfortunate naming.

For your understanding, under the hood, the LLM essentially generates every new word from scratch.

This is oversimplified, but imagine that every time you say a word, we delete you, rebuild you, upload your entire life to your new brain, wait for you to generate the next word, delete you again, and start the whole process over.

The model has no real conception of whether it’s the assistant or the user. If OpenAI didn’t prevent it through their programming, it would be possible to upload a conversation with a user message cut off somewhere in the middle, and the model would be perfectly happy to assume the user role.

With every new API request, you need to re-upload the entire conversation history in order for the model to generate the next message.

This is also incredibly powerful, because it gives you the ability to edit the model’s past and history, so to speak.
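As a rough sketch of carrying a conversation forward (continuing the illustrative snippet above; the follow-up question here is made up):

# append the reply the model just gave - or an edited/hand-written version of it;
# the model only ever "remembers" whatever you put in this list
messages.append({"role": "assistant", "content": response.choices[0].message.content})

# add the next user turn, then re-send the entire history in a new request
messages.append({"role": "user", "content": "And who was the series MVP?"})

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=messages,
)
print(response.choices[0].message.content)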


If you want to understand this better, I would urge you to play around with the base completions endpoint https://platform.openai.com/playground/complete

and simulate the conversation. you can do it like this:

This is a conversation between an AI assistant and a user:

system msg to assistant:
"You are a helpful assistant."

user msg:
"Who won the world series in 2020?"

assistant msg:
"The Los Angeles Dodgers won the World Series in 2020."

user msg:
"Where was it

and notice how the model will happily complete the user msg.
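If you want to try the same thing from code instead of the playground, a rough sketch against the legacy completions endpoint could look like this (gpt-3.5-turbo-instruct is an assumption - substitute whatever completion-capable model is available to you):

from openai import OpenAI

client = OpenAI()

# the "conversation" is just one flat block of text, ending mid-sentence
prompt = """This is a conversation between an AI assistant and a user:

system msg to assistant:
"You are a helpful assistant."

user msg:
"Who won the world series in 2020?"

assistant msg:
"The Los Angeles Dodgers won the World Series in 2020."

user msg:
"Where was it"""

completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # assumed completion-capable model
    prompt=prompt,
    max_tokens=60,
)
# the model just keeps predicting text, so it will often finish the
# user's own sentence before answering it
print(completion.choices[0].text)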


the chat completions API just enforces the JSON message schema. But within that, you have free rein to do what you want. The output, however, will always be in the assistant role, and is essentially guaranteed to never spill into the user role.

hope this helps!


Hello Diet,
Thanks for your response. It helps.

I tried the playground/completions and I could see that the “Assistant” completed the question with “held” as you mentioned.
The system says:
Completion models are now considered legacy.

So, I tried this on
https://platform.openai.com/playground?mode=chat

I did NOT see the word “held” shown here. Honestly, I did see it once: the 1st time I asked the question from your example (without “held”), the assistant actually showed “held” when I clicked submit - I missed taking a screenshot. But I’m not able to reproduce it further.

Q1: On a separate note, it seems like the Playground is just a way to execute what I would otherwise have written Python code for, by calling something like:
client.chat.completions.create(…)
Is my understanding correct?

Also, for the output shown corresponding to “Assistant” - is that actually what the API client.chat.completions.create(…) returns?

Welcome!

I think you misunderstood. @Diet is saying that in the now legacy completions endpoint, the model will happily assume the role of user - this is not the case for chat completions or the assistants API.

As for your question, yes, you are correct - the playground is like a mockup of what you would otherwise create yourself, and yes, the “Assistant” output is what you would typically receive from the API response - though there are caveats to this.
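For example, in a rough sketch (model name is a placeholder), the text you see in the Playground’s “Assistant” panel corresponds to choices[0].message.content, while the full response object also carries metadata:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)

print(response.choices[0].message.content)  # the "Assistant" text you see in the playground
print(response.choices[0].finish_reason)    # e.g. "stop"
print(response.usage.total_tokens)          # token accounting for the request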


I’m suggesting that under the hood, even the chat models are just completion models. Understanding that will likely help you better understand the capabilities and limitations of the chat models.

kinda like c++ vs javascript.

Yeah, if you look top right, you can even click view code.


Thanks. This is helpful.

Is it expected that for a simple question asked 2 times, I get a similar but not the same response?

1st time:

2nd time response:

To find the Fibonacci series up to a value of 10, we can write a Python code snippet to generate the series. Here is the code:

def fibonacci(n):
    fib_series = [0, 1]
    while fib_series[-1] + fib_series[-2] <= n:
        fib_series.append(fib_series[-1] + fib_series[-2])
    return fib_series

fibonacci_series = fibonacci(10)
print(fibonacci_series)

When you run this code, it will output the Fibonacci series up to the value of 10.

===================

The two responses are similar but not 100% identical.
Is this because models do not have any memory of whether the same question was asked previously or not? And that they will compute everything from scratch?

yep.

there is a certain level of randomness injected into the output - you can control that with the temperature and top_p parameters.

If you go to the completions playground (https://platform.openai.com/playground/complete), you can toggle probabilities.

this shows you how likely each token was to be predicted. note that all possible token probabilities (roughly a hundred thousand options) add up to 1.

a higher temperature “flattens” the probability distribution, so that more improbable tokens have a higher chance to be selected. set it to zero (which will be changed on the back end to a very small number) to prevent low probs from rising to the top.

top_p defines how big the bucket is that tokens are selected from. 1 (100%) allows all tokens to be hypothetically selected. 0.1 restricts sampling to the smallest set of top tokens whose combined probability covers 10%. if you set it to 0, only the single most likely token will be selected.

if you set temperature or top_p to 0, you’re more likely to get a deterministic result. It’s not guaranteed (probably because of rounding errors or other stuff, maybe cosmic rays, maybe bugs, who knows), but it’s typically very close.
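For reference, a hedged sketch of where those knobs go in a request (the model name and prompt are placeholders):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": "Write the Fibonacci series up to 10 in Python."}],
    temperature=0,  # near-greedy sampling; output becomes much more repeatable
    top_p=1,        # default bucket size; lower this instead of (not alongside) temperature
)
print(response.choices[0].message.content)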
