Multi-turn conversation best practice

Hi everyone,
I’ve developed a chatbot to answer questions about our technical manuals and datasheets.
I use embeddings to select context via cosine similarity and then build the prompt with the relevant sections of our manuals.
The result is pretty good.
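For reference, that retrieval step can be sketched like this; a minimal pure-Python version, where the function names and the `(text, embedding)` layout are just illustrative (in practice you would call the embeddings API and likely use a vector store):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_sections(query_embedding, sections, k=3):
    """Return the k manual sections most similar to the query.
    `sections` is a list of (text, embedding) pairs."""
    ranked = sorted(sections,
                    key=lambda s: cosine_similarity(query_embedding, s[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```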

Now we want to move forward and implement a multi-turn conversation. If I’ve understood correctly, the trick is to also append the previous response to the context of the following question. The question is: am I supposed to identify the previous response with some tag in the context, or do I simply append it as-is together with the other sections of the manual?

Thank you in advance

Just append the chat history between the system message and the last user inquiry.

To put it simply:

const messages = [
{ role: "system", content: "system prompt..." },
{ role: "assistant", content: "..."},
{ role: "user", content: "..."},
{ role: "assistant", content: "..."},
{ role: "user", content: "user inquiry..."}
];

Thanks, understood. So in this case I can’t pass a large context; I have to pick only the most relevant sections to avoid overflowing the token limit.
At the moment I was giving it all the context I could :slight_smile:
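One way to pick only what fits is to greedily pack the relevance-ranked sections into a token budget. A rough sketch, using a crude characters-per-token heuristic (a real tokenizer such as tiktoken would be more accurate):

```python
def rough_token_count(text):
    # Very rough heuristic: ~4 characters per token for English text.
    # Use a real tokenizer (e.g. tiktoken) in production.
    return max(1, len(text) // 4)

def pack_context(sections, budget_tokens):
    """Greedily keep the highest-ranked sections that fit the budget.
    `sections` must already be sorted by relevance, best first."""
    chosen, used = [], 0
    for section in sections:
        cost = rough_token_count(section)
        if used + cost > budget_tokens:
            break
        chosen.append(section)
        used += cost
    return chosen
```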

So let me know if I’ve understood right:

1st turn:

const messages = [
{ role: "system", content: "system prompt..." },
{ role: "assistant", content: ""}, // empty
{ role: "user", content: "Question with context"}
];

2nd turn:

const messages = [
{ role: "system", content: "system prompt..." },
{ role: "assistant", content: "1° turn response"},
{ role: "user", content: "Question with context"},
{ role: "assistant", content: ""},//empty
{ role: "user", content: "2nd question with context"}
];


Is it right?

(can you please remove the “solved” marker for this topic)

Looks good to me, essentially you just append the reply and new question to the end.

import os

import openai
from dotenv import load_dotenv
from flask import Flask, render_template, request

load_dotenv()  # load env vars from .env file
openai.api_key = os.getenv("OPENAI_API_KEY")

app = Flask(__name__)

# Global variable to hold the conversation
conversation = []

@app.route("/")
def index():
    global conversation
    conversation = []  # Clear the conversation when user starts a new conversation
    return render_template("index.html")

@app.route("/get_response", methods=["GET", "POST"])
def get_response():
    global conversation
    message = request.args.get("message")
    conversation.append({"role": "user", "content": message})
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=conversation,
    )
    response = completion["choices"][0]["message"]["content"]
    conversation.append({"role": "assistant", "content": response})
    return response

if __name__ == "__main__":
    app.run(debug=True)

Hi @Foxabilo,
I’m reopening this thread because I have some doubts about how to manage the system role.
In our application we build the prompt to answer against our own context: in every query we also append the context (retrieved by embeddings and cosine similarity) in which GPT must search for a possible answer.
As suggested in the OpenAI FAQ, we include the context in the user role. But we’ve found other examples on the web (i.e. this one) in which the context is always inserted in the system role.
So the question is: in a multi-turn conversation, every turn produces a new context; where are we supposed to include that context? Do we have to always append it in the system role, or (as we’ve done until now) do we append the new context in the user role message that asks a question against that context?

Typically the system role defines your AI’s persona and tasks for the session; you can put important information in there as well. Usually you simply include the context in its own “user” role prompt surrounded by a marker, let’s say ##### at the start and finish, i.e. ##### this is my context ##### and then you add “Using the context in ##### markers please … (the thing you want it to do)”.
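A minimal sketch of building such a marker-wrapped user message; the function name and the exact wording after the markers are just one possible phrasing:

```python
MARKER = "#####"

def build_user_message(context, question):
    """Wrap the retrieved context in ##### markers and attach the
    question that should be answered against it."""
    return (
        f"{MARKER}\n{context}\n{MARKER}\n"
        f"Using the context in {MARKER} markers, please answer: {question}"
    )
```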

Yes, it’s exactly what we did, but in the Microsoft FAQ (which in my understanding has a relevant role in OpenAI :slight_smile: ) they use the system role to manage the context.

{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say 'I don't know'.

- Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series.
- Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.
- At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases, incorporating Microsoft’s principles for responsible AI use."},
{"role": "user", "content": "What is Azure OpenAI Service?"}

Ahh, I see, are you using the Azure OpenAI API?

No, we are using the OpenAI API, but I suppose the way to format the request to the OpenAI engine should be the same: hence the doubt. :thinking:

Indeed, I think they will be using the same system prompt format that OpenAI uses, so… All I can tell you is my experience, with complex requirements like this I like to do a fair bit of R&D just playing with prompt schemes and moving sections to the system prompt and looking for effect. Typically I’ll build a few python scripts to let me try out various styles and spend a day or two just running tests to help get a feel for the “shape” of the potential solution, get a hold of the edges as it were.


That just looks to me like instructions described as context.
It does not look like they put the past context in system; it’s just fixed.

“Context” can simply mean a description of the context in which the bot should behave.
Or “context” can mean the past history.

From that example, this just looks like context about how the bot should behave or what it should keep in mind, always fixed.

Even if the context is fixed, as you can see from the example, it seems that it’s not a description of how the bot must behave but the actual context in which it must search for an answer.
They even clarify it in the first system sentence:

Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you’re not sure of an answer, you can say ‘I don’t know’.

Yes, you can include the info you want the bot to use in the system.
That is basically also part of how the bot should behave.

But after a few responses, I don’t think they put the history into system. It’s just the same… is my point.

Plus, if you put the history into system, it might give the actual end user the ability to mess with the bot, because they could essentially edit the system message.

mainly responding to this:

However, personally I prefer to put fixed info about where the bot is, or what info it should use, in the following user message, because if you make the system message complicated, it might listen less to the actual instructions (talking mainly about 3.5). [Plus I have a pretty huge “context” about its environment and the info it should use.]
It’s best to keep the system message simple, in my experience.

And then just save the history as it goes: save user prompts as user, save assistant responses as assistant in the history. And limit the number of messages saved.
(If you were doing summarization, then you would probably just include it in one user message.)

I imagine there might be some differences between saving all the history as one user message
(probably starting each prompt/response inside the user message with username: to make it clear who said what)
vs saving each user turn as user and each assistant turn as assistant, building up many messages.
I was using one-user-message history with davinci, then switched to 3.5, and then switched to saving each individual message with the proper roles, and it seemed to work much better for 3.5.

I guess there might be some differences that make putting fixed texts that the bot should use in the system message more significant than putting them in user; it might make the model use them or follow them more.
You could put the more important info into system and then some huge, less important context into the next user message, to make sure that system is not too complex.

But of course, all kinds of ways can work. There are many possible scenarios where different methods might work differently.

Example of ongoing conversation:

Just save it as it was used:
You prompt GPT with user, save it as user in the history.
You get a response from the assistant, save it as assistant in the history.
Keep adding messages until you reach your limit, then start removing the oldest ones (except your system message, first user message, or any fixed ones).
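The trimming step described above can be sketched like this; a minimal version where `max_messages` and `fixed` are hypothetical knobs you would tune:

```python
def trim_history(messages, max_messages, fixed=1):
    """Drop the oldest non-fixed messages once the history grows past
    max_messages. The first `fixed` messages (e.g. the system prompt)
    are always kept."""
    if len(messages) <= max_messages:
        return messages
    overflow = len(messages) - max_messages
    return messages[:fixed] + messages[fixed + overflow:]
```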

Also, one thing I found out:
If you start the conversation and the bot always starts responding in different ways/styles,
you can pre-craft the first assistant message and make it fixed and hidden from the end user. You can just let GPT generate it until you like it, or edit it, then save it and put it in the code as assistant. You write it in the way you want the bot to respond.

(a tip: use GPT4 to pre-craft your first assistant message for GPT3.5)

system - fixed, instructions
user - fixed, any text to keep in mind/use (or you put it in system, your choice)
assistant - fixed, pre-crafted assistant message to define the style
and then the first actual user prompt

and continue adding messages in the correct roles as the conversation goes

Then that can help further direct the way/style it should respond.
It can be a welcome message, some acknowledgement of the instructions, or just an introduction, anything really.
(A welcome message might cause GPT to never welcome the user though, as it sees it has already done that.)
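The fixed prefix described above could be assembled like this; all the strings are placeholders for your own content:

```python
def seed_conversation(system_prompt, fixed_context, crafted_reply):
    """Build the fixed prefix: system instructions, a fixed user
    message with background info, and a pre-crafted assistant
    message that sets the response style."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": fixed_context},
        {"role": "assistant", "content": crafted_reply},
    ]

conversation = seed_conversation(
    "You are a support bot for our manuals.",
    "Background info the bot should keep in mind...",
    "Understood! I'll answer concisely, citing the manual section.",
)
# Real user turns are appended after this fixed prefix:
conversation.append({"role": "user", "content": "first actual question"})
```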

The longer the context (more tokens used/more messages), the bigger the chance that GPT will start forgetting the system instructions. It will start defaulting to the standard “OpenAI AI language model” responses if you told it to behave differently in system. (GPT-4 is better at this.)


The most common practice is to append messages to the end, but with this approach you are limited to the context window. I have created a pip package on GitHub (at /RediatBrook/tezeta) that lets you go beyond that. Essentially, it creates a vector DB in your local filesystem and stores the messages there. Once you go over the max token limit, it fits the most relevant messages to your query within the token limit. It’s still early in development, but it’s working really well for me.
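The general idea (independent of any particular package) can be sketched like this; the tuple layout, function name, and token counts are made up for illustration, and a real implementation would embed messages via the embeddings API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def fit_relevant_messages(query_emb, history, token_limit):
    """`history` is a list of (message_dict, embedding, token_count)
    tuples. Rank past messages by similarity to the current query,
    keep as many of the best as fit within the token limit, then
    restore chronological order."""
    ranked = sorted(range(len(history)),
                    key=lambda i: cosine(query_emb, history[i][1]),
                    reverse=True)
    kept, used = [], 0
    for i in ranked:
        msg, _, cost = history[i]
        if used + cost <= token_limit:
            kept.append(i)
            used += cost
    return [history[i][0] for i in sorted(kept)]
```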