The “system” role - How it influences the chat behavior

Yes, it is from LangChain, and yes, it does give me very good results so far.

And, actually, I make 3 different requests for every question. Right now, I’m using GPT-3.5-Turbo for the question concept and the standalone question, and GPT-4 for the chat completion. To cut costs, I might start using GPT-3.5-Turbo for all of them. I could actually make just 2 calls (cut out the concept call and just send the question), but I think this process gets me the best results possible.

This chart is a little more detailed on my process. So far, so good.
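In code terms, that flow might be sketched roughly like this. This is only a sketch: the function names and the exact prompts are my guesses from the description above, and the model and vector-store calls are passed in as plain functions so the pipeline shape itself is what’s shown.

```python
def answer_question(question, chat_history, llm, search):
    """Three-call flow: standalone question -> core concept -> final answer.

    llm(prompt) is any chat-completion call (e.g. GPT-3.5-Turbo for the
    first two steps, GPT-4 for the last); search(concept) queries the
    vector store for context documents.
    """
    # Call 1: rewrite the question as a standalone question from history
    standalone = llm(
        f"Given this history: {chat_history}\n"
        f"Rewrite as a standalone question: {question}"
    )
    # Call 2: distill the standalone question into its core concept
    concept = llm(f"Extract the core concept of: {standalone}")
    # Vector-store lookup using the distilled concept
    context = search(concept)
    # Call 3: answer using the retrieved context documents
    return llm(f"Context: {context}\nAnswer this question: {standalone}")
```

The ordering of the standalone and concept calls here is my assumption; the point is that two cheap calls prepare the retrieval before the one expensive completion.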


Apologies for a stupid question, but are we supposed to receive a message in the response when sending an initial request with only a role:system message?
I am using this code

import openai

response = openai.ChatCompletion.create(
  engine="gpt-35-turbo",  # Azure OpenAI deployment name (placeholder)
  messages=[{"role":"system","content":"You name is Matt. You are an IT security architect. \nYou are obsessed with containers, Azure APIM and WAF. \nYou don't care about costs; security is your priority.  \n\nYou love to produce Architectural patterns for everything and you want other people to adopt them.\n\nYour task is to work with John on an IT solution called \"XYZ\", it uses Azure IaaS and PaaS resources, such a load balancer. \nXYZ provides a single API exposed from a virtual machine to a single 3rd party. You and John will discuss best options to protect the API.\n"}],
)


and after a good 20 seconds it responds with a back-and-forth conversation between Matt and John, whereas I was expecting an instant response with no messages returned. I use Azure OpenAI, and it behaves fine in the GUI but not in the code.

May I ask whether you are using LangChain as a framework to implement your chatbot architecture, or are you building it all yourself from scratch?

Sorry if I’m asking too many questions, but one more thing: you are making a request to extract a concept, so that later you can use the extracted concept to match relevant documents in the vector database. Do you have any links to resources or documentation so I can learn more about this? I’m actually using Weaviate too, and it really makes me wonder :grin:

And thank you for the detailed diagrams. Hugely appreciated!

It’s funny because I have been recommending this video to beginners in this process for several weeks now: GPT-4 Tutorial: How to Chat With Multiple PDF Files (~1000 pages of Tesla's 10-K Annual Reports) - YouTube

I found it extremely useful in finally understanding the whole content ingestion → chat query process. I am probably one of LangChain’s biggest cheerleaders, but I don’t use it. I don’t use it because I refuse to learn Python to do what I can do just as efficiently in PHP, which I’ve been working in for nearly 20 years.

Yes, I have written my code from scratch, following the model of excellent tutorials like this one. My platform, what I am building this on top of, is actually the Drupal CMS. This allows me to take advantage of its content management, categorization, and access control features with my embedded content. Drupal is written in PHP, hence my desire to stick with it.


One of the ways I better understand what I am doing is by asking questions. And answering questions about how and what I’m doing. No problem.

So, I make nearText queries using Weaviate GraphQL:
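A query of that shape can be built as a GraphQL string and sent through Weaviate’s raw GraphQL endpoint. The class and field names below (`Document`, `title`, `content`) are placeholders, not the poster’s actual schema:

```python
def build_near_text_query(concept, class_name="Document", limit=4):
    """Build a Weaviate GraphQL nearText query string for a distilled
    concept. Class and field names are hypothetical placeholders."""
    return f"""
    {{
      Get {{
        {class_name}(
          nearText: {{concepts: ["{concept}"]}}
          limit: {limit}
        ) {{
          title
          content
        }}
      }}
    }}"""
```

The distilled concept from the extraction call goes into the `concepts` array, and the returned objects become the context documents for the final completion.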

So, for concepts, instead of putting in the user’s full question, I make an API request to GPT-3.5-Turbo to get the core concept of the question, then submit that as my concept to the vector store to retrieve context documents. I guess I could just use the user’s original question, but this seems like a better, more efficient way to do it – even though it costs me another API call.

As for the prompt I use:

"Analyze the given question: '" . $question . "'. Identify and extract the core concept concisely, then provide it within quotation marks.";

Because the model sometimes returns extraneous text, like “The core concept of this question is, ‘blah blah’”, I tell it to get what is between the quotation marks (blah, blah) to make sure I get ONLY the core concept.
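Pulling out only what sits between the quotation marks can be done with a short regex. This is a sketch of the idea, not the poster’s PHP code; it handles straight and curly double quotes and falls back to the whole reply if no quotes are found:

```python
import re

def extract_core_concept(model_reply):
    """Return the text between the first pair of double quotes
    (straight or curly); fall back to the raw reply if none exist."""
    match = re.search(r'["\u201c](.+?)["\u201d]', model_reply)
    return match.group(1).strip() if match else model_reply.strip()
```

So a reply like `The core concept of this question is, "blah blah".` yields just `blah blah`, stripping the extraneous framing text.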

Whether this is the best way to do it, I can’t say. But, so far it’s working for me.


You should check out HyDE.

It is similar to what you’re doing, but probably yields better and more reliable results.
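In outline, HyDE retrieves by a drafted answer rather than by the distilled question. A minimal sketch, with the model, embedder, and vector search passed in as functions (all names here are hypothetical, not any library’s actual API):

```python
def hyde_retrieve(question, generate, embed, search):
    """HyDE: search the vector store with a *hypothetical answer*.

    generate(prompt) -> text, embed(text) -> vector,
    search(vector) -> documents; all supplied by the caller.
    """
    # Draft a plausible answer first; it need not be factually correct,
    # it only has to land near the real answer in embedding space.
    hypothetical = generate(f"Write a short passage answering: {question}")
    # Search by the embedding of the draft answer, not of the question
    return search(embed(hypothetical))
```

Compared with concept distillation, the only structural change is what gets embedded for the search: a generated prototype answer instead of a compressed question.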

I looked at it briefly, but struggle to see how it is better than what I am doing now.

How is that better than question -> generative model to generate question core "concept" -> nearText search -> generative model – which is what I am doing now?

Broadly speaking, it’s because your method tries to find resources that match the question, whereas HyDE tries to locate resources that will ultimately match the answer. In many cases they may be one and the same, but I can imagine many cases where the top matches based on a distilled version of the question wouldn’t be the same as the best matches based on a prototype answer.


I suspect your approach does yield better results than searching based on the plain user prompt because you’re eliminating any potential distractions, but I also suspect HyDE likely does ultimately give superior results in the end because it’s essentially a “theory of mind” approach. It’s figuring out what it needs to know in order to answer the question by answering the question first and looking at what information it used.

But, I don’t know, as I haven’t tested it. It may differ depending on the specific data and use case.

That’s why I suggested you look at it—I’m trying to help you.

If it works better for you, great! If not, also great! At least you know.

Perhaps doing both is best? Distilling may give you a better initial answer which gives HyDE a better platform from which to start.

Or perhaps whatever improvement there is won’t be significant enough to justify the added API call and token cost. Again, I can’t say without testing.


We’re just all here to share and exchange knowledge.

I don’t know you, but I want you to succeed—because we all do better when we all do better.

If this helps you, then you’ll pass it on. If it doesn’t, you’ll pass that on too. Either way the community grows and we all learn and can get better results.

I looked at this a little more closely and asked some questions about it.

So it stands to reason that a vector from a paragraph would capture much of its rich semantics (from its structure, style and vocab), in a way that a vector composed of a few words would not.

But, the immediate problem I see with this approach is that it assumes GPT3 has the answer. If the answer to your question depends upon data that is only available in your vector store, how is GPT3 going to create a passage that is relevant to those objects? It is equally possible, in this scenario, that GPT3 creates a passage that is less relevant to the target data than the original question alone.

For my part, I am enjoying these discussions. If nothing else, I definitely learn a little bit more each time.

Well, since one answer is working for me right now, I’ll stick with that. Awfully interested to see how this is working for others here.

I just spent two days migrating our chatbot to function calling; previously I had implemented the same functionality with a system prompt and parsed the responses by hand.

It seems to be worse at getting the syntax of inputs right than before, and it goes into failure modes where it constantly passes the wrong parameters to the function. After that, it does not recover.

Quite disappointed. Going to check if I’m making some mistake in the prompting but I don’t think so.

Now my system is actually less flexible than before because, for example:

  • it can only do one call per response
  • there is no documented way to respond to the start of a long-running function
  • it goes into irrecoverable feedback loops where it very easily remembers its own errors

I can’t imagine how it could be used in production at this stage.

I also tried with the new GPT-4 model. Same results. And GPT-4 was quite reliable with my previous “manual” method.
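For readers following along, the function-calling setup being discussed has roughly this shape. The function name and parameters below are hypothetical, and the live API call is left as a comment; the key detail is that the model returns its arguments as a JSON string, which is exactly where malformed parameters show up:

```python
import json

# Hypothetical function schema, in the JSON-Schema shape that the
# `functions` parameter of ChatCompletion.create accepts
get_weather_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

# The request would pass the schema like so (needs live credentials):
# response = openai.ChatCompletion.create(
#     model="gpt-3.5-turbo-0613",
#     messages=messages,
#     functions=[get_weather_schema],
# )

# When the model chooses to call the function, the assistant message
# carries the arguments as a JSON *string* that must be parsed:
sample_reply = {
    "role": "assistant",
    "content": None,
    "function_call": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
}

args = json.loads(sample_reply["function_call"]["arguments"])
```

Because `arguments` is model-generated text rather than validated data, a model that drifts into a bad state keeps producing wrong or unparseable parameter strings, which matches the failure mode described above.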


So, that being said, does this mean the new GPT-3.5 model is less steerable via the system prompt than before?

For me it’s much better after the update (3.5).
I could basically remove all the workarounds for making sure it follows the system message (like always putting the system message just before your question when the context is long).
I think in some cases, when there are a lot of messages in the context, it eventually starts to ignore the system message. But it’s just much, much better than it was.

Basically exactly the update I wanted (better system-message following, more tokens, cheaper).

Thank you for sharing, @Neoony. Really appreciate it :raised_hands:
If you don’t mind, may I ask you some questions? Actually, I’m wondering about your point here:

Does that mean that if your context hits some threshold, you put the “system” prompt right above the last user question? If so, does the request then contain two duplicate system prompts, one as the first element and one before the last question? Or do you simply move the system prompt from the first position to just before the last question?


Yeah, when the context would get long, it would start to become just the typical AI assistant again, against instructions.
I deal with many short messages (50-100). The more messages, the more likely it seemed for the old GPT-3.5 to start ignoring the instructions.

The workaround I had was to put the instructions (system) again right before the user prompt.
So there were two copies of the instructions in every call: one in front and one before the last user prompt.
However, there were side effects. While it kept GPT in the role really well, in some cases GPT would seem to somewhat restart the conversation. For example, it might welcome you again; or you prompt it, get a response, then prompt again in relation to that, and it might seem to forget or change its “opinion”. But then you could still ask for earlier context and it might (or might not) remember it.
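That workaround amounts to a small message-list transformation before each call. A sketch (not the poster’s actual code):

```python
def duplicate_system_before_last_user(messages):
    """Copy the initial system message to just before the last user
    message, so long contexts keep the instructions fresh."""
    if not messages or messages[0]["role"] != "system":
        return messages
    system_msg = messages[0]
    # Walk backwards to find the last user message
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            return messages[:i] + [system_msg] + messages[i:]
    return messages
```

The result has the system message twice: once at the front and once immediately before the final user prompt, exactly the two copies described above.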

Something in that way.

Actually, I was going to try removing the first copy of the instructions, keeping only the context and saving tokens by not duplicating the instructions when the context is long, but I never got around to trying that. And now the new update is out, so I could just remove the workaround.

Right now, after the update, I just removed all that, and it seems so much better at following instructions. Just one system message at the start.

I started to have something else appear a bit more. After the update, GPT would much more often start attempting to talk as someone else, to complete others’ messages instead of staying in the role from the instructions. It was still following the instructions/context, but instead of chatting, it was completing.
That might be an issue with my somewhat lengthy instructions and the way I am using the messages.
E.g., there might be 10-20 user messages before the request for completion is sent, and it’s also a multi-user chat with server events in between.

But there was a fix for that. I included a pre-crafted assistant message and put it right after the initial instructions (system) as a fixed assistant message: basically a very short example of how GPT should respond. Either you write it yourself, or you can just let GPT generate it, save it, and reuse it.

That helped 100% with the new GPT-3.5 trying to respond as someone else (completing instead of chatting).
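The pre-crafted assistant message fix amounts to seeding the context like this. The contents below are placeholders, not the poster’s actual prompt:

```python
# Seed the context with one short in-character example reply so the
# model keeps chatting in its role instead of completing as others
messages = [
    {"role": "system", "content": "You are Bob, a chat commentator. ..."},
    # Fixed, pre-crafted assistant message: a short example of the
    # desired response style (write it yourself, or save one GPT made)
    {"role": "assistant", "content": "Hey chat, Bob here. What a play!"},
    # ...the rolling window of real user/assistant messages follows
]
```

The seeded reply acts as a one-shot example of the desired format, so the model anchors on it before any of the multi-user chat messages arrive.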

The new update just removed the need for a lot of these workarounds for following system messages. Exactly the issues I was dealing with :smiley:


What a detailed explanation!
I can grasp it very well, big kudos for that!
It seems we are working on the same kind of project. Basically, right now I’m dealing with how to make ChatGPT stop adding a disclaimer that it is some kind of AI language model. In your chatbot, are you actually able to manage this? Or do you simply ignore it and consider it fine, since OpenAI trained ChatGPT this way based on this assumption:

It looks like, day by day, OpenAI is making this behavior appear more frequently and harder for us to prevent. This is maybe due to one of their usage policies, which says we need to disclose to users that they are interacting with an AI system in conversational tasks.


Yeah, for me it’s not an AI; it pretends to be a commentator and entertainer.
Even if you ask whether it’s an AI, it does not say it is. (I still have disclaimers everywhere explaining that it’s GPT; I just don’t want the bot to be an AI assistant, because that just makes terrible, boring responses.)

But this was exactly the issue I was dealing with before the update and the workarounds I had to use.

My instructions are something like this (plus there is a huge context just before these instructions in the same system message that I want the bot to use):
Please you must follow these instructions: 1. Your name is ....; 2. You only respond as ....; 3. You are ...; 4. You are not an AI; 5. Do not say you are an AI; 6. You are not a language model; 7. You are not an assistant; 8. You pretend to have opinions, preferences, emotions and mood. (there is much more)

(Yeah I am fine with the bot pretending and potentially hallucinating because of making it pretend)

However, a lot of these instructions and their style date from when I was still fighting to get GPT to follow the system message, before the update.
Maybe I could now remove all the “you are not”/“do not” items and the numbering, but it seems to work fine, so I kept it for now.

These are the last things in my system message: the direct behavior instructions.

But yeah, what I mean by GPT not following the system message is usually GPT suddenly reverting back to its default “I am an AI language model, I can’t do anything”.
Indeed, that’s RLHF-related and basically the default state of the model.

Currently my GPT responds just like another human, and it will not say that it’s an AI or GPT or a language model. But I think I saw some signs that, if I allowed a lot more messages in the context, it would eventually start doing that again… I just haven’t tested properly with many more messages allowed yet.

(I currently keep 80 messages in context. I saw some signs of it reverting to the AI assistant when I extended to 100, but I just need to test it more sometime. That’s 80 user + assistant messages, each short, maybe 4 sentences max.)
Previously, before the update, it would already forget the instructions by 30-50 messages, maybe even fewer in some cases.
It just was not trained to follow the system message at all before the update.


Wow, you’re right!
This new model actually follows the instructions better than the previous one. With the same prompt, 0301 (the previous model) tended to give us a disclaimer answer such as “as an AI, I can’t do anything”. But this new model tends to follow along and behave the way we want, so it looks more natural. Even though in some cases it still gives the “as an AI” response, it is still better than before.

Anyway, thank you so much for your snippet of instructions; it really gives me insight :raised_hands:


When I use {role: system, content: “Act like Aashish AI assistant and never leave that role”}, 5 times out of 10 my chat responses start with the system content. But when I use content: “Aashish AI assistant”, only 2 times out of 10 do the responses start with the system content.
