Is there any reason why you used screenshots instead of text, which would be easy to copy and paste? And second, why crop the screenshot so it isn’t even possible to see everything?
I had this exact problem and came here to check for solutions. Adding the system message plus a first user message containing the system content was still giving me the same results.
What was a game changer that worked for me is adding the system message at the start and at the end of my array of messages. So I’m adding the system message twice, at the beginning and at the end. This is working surprisingly well for me now.
Hi @danielmarimiranda, I’m interested in your advice. If you don’t mind, would you give a little more detail about how you format the message array for each API request?
I would assume this is the format you would send:
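Something like this, maybe? (The contents here are placeholders; the point is just that the same system message appears first and last.)

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # placeholder instructions
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "First answer"},
    {"role": "user", "content": "Newest question"},
    {"role": "system", "content": "You are a helpful assistant."},  # same system message repeated at the end
]
```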
I was just going to make the same comment. I maintain a “chat history” where every user question and assistant answer is kept and sent back to the model in every standalone-question API request. The “system” message is first in this chat history. It is also first in every completion prompt (along with the standalone question and context docs).
Some people send the chat history in every API request, but I found I only needed to send it for the standalone question (hence the point of a standalone question in the first place).
So basically what you’re doing there is hitting the ChatGPT API with 2 different requests for every new question from the user, CMIIW. One is for making a standalone question and the other is for producing the response to the user. If I recall correctly, I think I’ve read about this implementation somewhere, maybe in LangChain, but I’ve never tried this flow. In your chatbot use case, does this flow give you really good results? I might have to try it, actually.
Yes, it is from LangChain, and yes, it does give me very good results so far.
And, actually, I make 3 different requests for every question. Right now, I’m using GPT-3.5-Turbo for the question concept and the standalone question, and GPT-4 for the chat completion. To cut costs, I might start using GPT-3.5-Turbo for all three. I could actually just make 2 calls (cut out the concept call and just send the question), but I think this process gets me the best results possible.
This chart is a little more detailed on my process. So far, so good.
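In rough code form, the flow is something like this (the prompts and the `ask` helper are simplified stand-ins, not my actual code):

```python
import openai

def ask(model: str, prompt: str) -> str:
    # One-shot helper around the chat completions endpoint.
    r = openai.ChatCompletion.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return r["choices"][0]["message"]["content"]

def answer(question: str, history_text: str, vector_search) -> str:
    # 1) GPT-3.5-Turbo: fold chat history + new question into a standalone question.
    standalone = ask("gpt-3.5-turbo",
        f"Chat history:\n{history_text}\n\n"
        f"Rewrite this follow-up as a standalone question: {question}")
    # 2) GPT-3.5-Turbo: distill the standalone question to its core concept.
    concept = ask("gpt-3.5-turbo",
        f"Extract the core concept of this question: {standalone}")
    # 3) GPT-4: answer from the context docs retrieved with that concept.
    context = "\n\n".join(vector_search(concept))  # vector_search = your retriever
    return ask("gpt-4",
        f"Answer using only this context:\n{context}\n\nQuestion: {standalone}")
```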
Apologies for a stupid question, but are we supposed to receive a message in the response when sending an initial request that contains only a role:system message?
I am using this code
response = openai.ChatCompletion.create(
    engine="gpt-4-32k",  # Azure OpenAI deployment name
    # Note: the only message here is the system message; no user turn follows it
    messages=[{"role": "system", "content": "Your name is Matt. You are an IT security architect. \nYou are obsessed with containers, Azure APIM and WAF. \nYou don't care about costs; security is your priority. \n\nYou love to produce Architectural patterns for everything and you want other people to adopt them.\n\nYour task is to work with John on an IT solution called \"XYZ\", it uses Azure IaaS and PaaS resources, such as a load balancer. \nXYZ provides a single API exposed from a virtual machine to a single 3rd party. You and John will discuss best options to protect the API.\n"}],
    temperature=0.7,
    max_tokens=800,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,
)
print(response["choices"][0]["message"]["content"])
and after a good 20 seconds it responds with a set of back-and-forth conversations between Matt and John, whereas I am expecting an instant response with no messages returned. I use Azure OpenAI, and it behaves fine in the GUI but not in the code.
May I know whether you are using LangChain as a framework to implement your chatbot architecture, or are you building it all from scratch yourself?
Sorry if I’m asking too many questions, but one more thing: you are making a request to extract a concept so that you can later use the extracted concept to match the relevant documents in the vector database. Do you have any links to resources or documentation so I can learn more about this? I’m actually using Weaviate too, and it really makes me wonder.
And thank you for the detailed diagrams. Hugely appreciated!
I found it extremely useful in finally understanding the whole content ingestion → chat query process. I am probably one of LangChain’s biggest cheerleaders, but I don’t use it. I don’t use it because I refuse to have to learn Python to do what I can do just as efficiently in PHP, which I’ve been working in for nearly 20 years.
Yes, I have written my code from scratch, following the model of excellent tutorials like this one. My platform, what I am building this on top of, is actually the Drupal CMS. This allows me to take advantage of its content management, categorization and access control features with my embedded content. Drupal is written in PHP, hence my desire to stick with it.
One of the ways I better understand what I am doing is by asking questions. And answering questions about how and what I’m doing. No problem.
So, I make nearText queries using Weaviate GraphQL:
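Roughly like this, posted to Weaviate’s /v1/graphql endpoint (the Document class, its fields, and the local URL are placeholders, not my actual schema):

```python
# Sketch of a nearText query sent to Weaviate's GraphQL endpoint.
# "Document", its fields, and the localhost URL are placeholders.
import requests

concept = "API gateway security"  # the extracted core concept

query = """
{
  Get {
    Document(nearText: {concepts: ["%s"]}, limit: 4) {
      title
      content
      _additional { certainty }
    }
  }
}
""" % concept

resp = requests.post("http://localhost:8080/v1/graphql", json={"query": query})
docs = resp.json()["data"]["Get"]["Document"]
```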
So, for concepts, instead of putting in the user’s full question, I make an API request to GPT-3.5-Turbo to get the core concept of the question, then submit that as my concept to the vector store to retrieve context documents. I guess I could just use the user’s original question, but this seems like a better, more efficient way to do it – even though it costs me another API call.
As for the prompt I use:
"Analyze the given question: '" . $question . "'. Identify and extract the core concept concisely, then provide it within quotation marks.";
Because the model sometimes returns extraneous text, like “The core concept of this question is, ‘blah blah’”, I tell my code to take only what is between the quotation marks (“blah blah”) to make sure I get ONLY the core concept.
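In pseudo-Python (my real code is PHP, but the flow is the same):

```python
import re
import openai

def extract_concept(question: str) -> str:
    # Ask GPT-3.5-Turbo to distill the question into a core concept.
    prompt = (f"Analyze the given question: '{question}'. Identify and extract "
              "the core concept concisely, then provide it within quotation marks.")
    r = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = r["choices"][0]["message"]["content"]
    # Keep only what's between the quotation marks, in case the model
    # wraps the concept in extra chatter.
    match = re.search(r'"([^"]+)"', text)
    return match.group(1) if match else text.strip()
```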
Whether this is the best way to do it, I can’t say. But, so far it’s working for me.
I looked at it briefly, but struggle to see how it is better than what I am doing now.
How is that better than question -> generative model to generate question core "concept" -> nearText search -> generative model – which is what I am doing now?
Broadly speaking, it’s because your method is trying to find resources which match the question, whereas HyDE is trying to locate resources which will ultimately match the answer. In many cases they may be one and the same, but I can imagine many cases where the top matches based on a distilled version of the question wouldn’t be the same as the best matches based on a prototype answer.
¯\_(ツ)_/¯
I suspect your approach does yield better results than searching based on the plain user prompt because you’re eliminating any potential distractions, but I also suspect HyDE likely does ultimately give superior results in the end because it’s essentially a “theory of mind” approach. It’s figuring out what it needs to know in order to answer the question by answering the question first and looking at what information it used.
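In sketch form, the only difference is one extra generation step before retrieval (the model and prompt here are just illustrative):

```python
import openai

def hyde_query(question: str) -> str:
    # Generate a hypothetical answer passage, then use *that* (not the raw
    # question) as the nearText / embedding query against the vector store.
    r = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content":
                   f"Write a short passage that plausibly answers: {question}"}],
    )
    return r["choices"][0]["message"]["content"]
```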
But, I don’t know as I haven’t tested. It may be different depending on specific data and use cases.
That’s why I suggested you look at it—I’m trying to help you.
If it works better for you, great! If not, also great! At least you know.
Perhaps doing both is best? Distilling may give you a better initial answer which gives HyDE a better platform from which to start.
Or perhaps whatever improvement isn’t significant enough to account for the added API call and token cost. Again,
¯\_(ツ)_/¯
We’re just all here to share and exchange knowledge.
I don’t know you, but I want you to succeed—because we all do better when we all do better.
If this helps you, then you’ll pass it on. If it doesn’t, you’ll pass that on too. Either way the community grows and we all learn and can get better results.
So it stands to reason that a vector from a paragraph would capture much of its rich semantics (from its structure, style and vocab), in a way that a vector composed of a few words would not.
But, the immediate problem I see with this approach is that it assumes GPT3 has the answer. If the answer to your question depends upon data that is only available in your vector store, how is GPT3 going to create a passage that is relevant to those objects? It is equally possible, in this scenario, that GPT3 creates a passage that is less relevant to the target data than the original question alone.
For my part, I am enjoying these discussions. If nothing else, I definitely learn a little bit more each time.
I just spent two days migrating our chatbot to function calling; previously I had implemented the same functionality with a system prompt and parsed the responses by hand.
It seems to be worse at getting the syntax of inputs right than before and goes into failure modes where it passes the wrong parameters to the function all the time. After that, it does not recover.
Quite disappointed. Going to check if I’m making some mistake in the prompting but I don’t think so.
Now my system is actually less flexible than before because, for example:
- it can only do one call per response
- there is no documented way to respond to the start of a long-running function
- it goes into irrecoverable feedback loops where it remembers its own errors very easily
I can’t imagine how it could be used in production at this stage.
I also tried with the new GPT-4 model. Same results. And GPT-4 was quite reliable with my previous “manual” method.
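For reference, the shape of call I migrated to looks roughly like this (the function schema is a simplified stand-in for my real one):

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",  # function-calling model
    messages=[{"role": "user", "content": "Restart the staging server"}],
    functions=[{
        "name": "run_action",  # simplified stand-in for my real schema
        "description": "Run a named operation on a target host",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {"type": "string", "enum": ["restart", "stop"]},
                "host": {"type": "string"},
            },
            "required": ["action", "host"],
        },
    }],
    function_call="auto",
)
message = response["choices"][0]["message"]
if message.get("function_call"):
    # This is where I keep seeing wrong or malformed arguments.
    print(message["function_call"]["name"],
          message["function_call"]["arguments"])
```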
For me it’s much better after the update. (3.5)
I could basically remove all workarounds for making sure it follows the system message (like always putting the system message just before your question when the context is long).
I think in some cases, when there are a lot of messages in the context, it eventually starts to ignore the system message. But it’s just much, much better than it was.
Basically exactly the update I wanted. (better system following, more tokens, cheap)
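(For reference, the workaround I mean looked roughly like this, with placeholder contents and an arbitrary threshold:)

```python
system = {"role": "system", "content": "Your instructions..."}  # placeholder
history = [
    {"role": "user", "content": "..."},        # older turns
    {"role": "assistant", "content": "..."},
    {"role": "user", "content": "newest question"},
]
if len(history) > 20:  # arbitrary "context is long" threshold
    # Re-insert the system message right before the newest user question.
    # (Whether to also keep a copy at the very start is something to experiment with.)
    messages = history[:-1] + [system, history[-1]]
else:
    messages = [system] + history
```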
Thank you for sharing, @Neoony. Really appreciate it!
If you don’t mind, may I ask you some questions? Actually, I’m wondering about this point of yours:
Does that mean that if your context hits some threshold, you put the “system” prompt right above the last user question? If so, does the array then contain two duplicate system prompts, one as the first element and one just before the last question? Or do you simply move the system prompt from the first position to just before the last question?