Prevent hallucination with gpt-3.5-turbo

Congrats to the OpenAI team! Gpt-3.5-turbo is cheap and fast!!

But now I have a problem: I can't prevent hallucination with the gpt-3.5-turbo model. With the old approach (text-davinci-003) I could easily add the text below to the prompt:

Answer the question as truthfully as possible, and if you're unsure of the answer, say "Sorry, I don't know".
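With text-davinci-003 that looked roughly like this (a minimal sketch, assuming the legacy openai.Completion endpoint and the pre-1.0 Python library; context and question are placeholder variables, and the exact call may have differed):

import openai

# Old approach: one flat prompt with the truthfulness instruction prepended
prompt = (
    "Answer the question as truthfully as possible, and if you're unsure "
    'of the answer, say "Sorry, I don\'t know".\n\n'
    f"Context: {context}\n\nQ: {question}\nA:"
)
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    temperature=0,
    max_tokens=300,
)
print(response["choices"][0]["text"].strip())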

But with gpt-3.5-turbo, I've added that text to every role, and none of them makes it reply "Sorry, I don't know".
Does anyone successfully prevent hallucination in gpt-3.5-turbo?

2 Likes

Role = System
Content = I need you to encourage the user to discuss their business and the challenges they are facing. If the user is interested in HiOrder services, ask for a contact number. We will contact them in person.

ChatGPT did try its best to direct the conversation this way.

1 Like

I see the same thing. And yes, it's much faster, but really eager to diverge from the topic.

1 Like

Try setting the temperature a lot lower

The default is 1. Try 0.2 or 0.1
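For example (a minimal sketch, assuming the pre-1.0 openai Python library; the messages shown are placeholders):

import openai

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0.1,  # default is 1; lower values make the output more deterministic
    messages=[
        {"role": "system", "content": 'Answer truthfully. If unsure, say "Sorry, I don\'t know".'},
        {"role": "user", "content": "What is the capital of Atlantis?"},
    ],
)
print(completion["choices"][0]["message"]["content"])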

1 Like

Okay, after some tests following what @nicole_n said here:

When I use it like this:

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # context and question are defined elsewhere; the f-string interpolates the retrieved context
        {"role": "system", "content": f"""Answer the question as truthfully as possible, and if you're unsure of the answer, say "Sorry, I don't know".\n\n{context}"""},
        {"role": "user", "content": f"{question}"},
    ],
)

It will reply
Sorry, I don't know what the question is. Could you please provide more context or clarify your question?

The first 4 words are correct, but it seems that gpt-3.5-turbo has lengthened the reply.
Then I noticed that my configuration was:

temperature= 0.0,
max_tokens= 300,

I guess that because max_tokens is far above the token count of the reply text, the model will automatically lengthen the reply. So either add more words to "Sorry, I don't know" or lower max_tokens.
By doing this, I think the model will reply with the exact text you want.

Another thing I discovered is that if your question and the text ("Sorry, I don't know") are in different languages, it will not reply with the exact text you want either.
I haven't figured out how to deal with multiple languages; feel free to discuss below.

1 Like

I found it useful to put the environment ("name, status, location, etc.") in the initial system message, and then append a new system message to each user message with a reminder of its instructions.

So

[System] - You are a customer service rep based in Fakesville, New Faked
[User] - Hello!
[System] - Answer the above truthfully. If you require additional information, ask for context
[Assistant] - Hi, how can I help?

Would turn into

[System] - You are a customer service rep based in Fakesville, New Faked
[User] - Hello!
[Assistant] - Hi, how can I help?
[User] - I am looking for a gnarly surfboard with a shark bite in it
[System] - Answer the above truthfully. If you require additional information, ask for context

It seems that in the docs they say that the System message will have more significance in the future, and it may be better to actually add it as a user message for now.
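In code, that reshuffling could look roughly like this (a sketch, not necessarily the exact implementation; the environment and reminder texts reuse the example above):

def build_messages(history, latest_user_message):
    # history holds the prior user/assistant turns, without the instruction reminder
    messages = [{"role": "system", "content": "You are a customer service rep based in Fakesville, New Faked"}]
    messages += history
    messages.append({"role": "user", "content": latest_user_message})
    # The instruction reminder is appended last, so it always follows the newest user message
    messages.append({"role": "system", "content": "Answer the above truthfully. If you require additional information, ask for context"})
    return messages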

4 Likes

Hi @RonaldGRuckus, just wanted to make sure I understand. So if there are 4 user messages, will there be another 4 system messages appended right after the user messages?

3 Likes

Hey.

No, the system message would be shifted to the front so there is only ever one.
Unless it's carrying context, in which case so far I'm just using a scoring system to determine how useful it is.

2 Likes

If you're trying to get any of these models to reliably reply with specific text, you're fighting a losing battle… As you've probably seen, it will change languages on you, reword things, add stuff, etc.

I'm assuming you're trying to lock down what it says when it's unsure of an answer so that you can detect this in your code and do something else, correct?

This is what we classically call a NONE intent… If the model doesn't know the answer, it should return NONE for the intent. So how do we get GPT to reliably tell our program that it doesn't know something? Well, on one level that's a great question, because it thinks it knows everything… But that aside, a better approach to conveying to the program that it doesn't know something is to tell it to return a structure like <intent>NONE</intent>. These models all seem smart enough to know that's an instruction for the code and not a message to the user, so you should see more reliable responses…
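A rough sketch of that idea (the tag format and the helper name are only illustrative):

import re

SYSTEM_PROMPT = (
    "Answer the user's question from the provided context. "
    "If you do not know the answer, reply with exactly <intent>NONE</intent> and nothing else."
)

def is_none_intent(reply_text):
    # Look for the structured marker instead of trying to match natural-language refusals
    return re.search(r"<intent>\s*NONE\s*</intent>", reply_text, re.IGNORECASE) is not None

# Example: route the reply in your own code
if is_none_intent("<intent>NONE</intent>"):
    print("NONE intent detected; fall back to a canned answer or a human handoff.")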

Hope that helps…

1 Like

I have done extensive testing on this and published the results in various topics in our community.

It makes no difference if the system message is first or last in the messages array. What matters is that the system message be resent (fed back) to the chat completion endpoint.

I have posted these test results a number of times, but will show a simple example, to be clear:

System Message First

System Message Last

Note:

It does not matter whether the messages array sent to the API is short, with only two messages, or very long, with many messages; the results are the same if the messages array is formatted properly.

I have demonstrated this many times in this community with a number of examples, some simple, some complex.

It makes no difference whether the system message is placed at the front, at the end, or anywhere in between in the messages array; if the messages array is properly constructed, the system role will work as it should.
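In code terms, that just means resending the whole messages array, system message included, on every call. A minimal sketch (assuming the pre-1.0 openai Python library; the system prompt is a placeholder):

import openai

messages = [{"role": "system", "content": "You are a helpful, truthful assistant."}]

def chat(user_text):
    messages.append({"role": "user", "content": user_text})
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,  # the entire history, including the system message, is fed back each time
    )
    reply = completion["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply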

HTH

:slight_smile:

Yes, the initial system message to set the environment, plus a secondary system message with instructions accompanying the user message, was working to keep GPT-3.5-Turbo on track.

Regarding "only ever one": that was directly addressing the question of whether to add a new system message each time or just bump it to the end of the array. I should have clarified that better. There would be 2 system messages in total.

I would be very interested in seeing a comparison test done.

It makes no difference if the system message is in the front, the back, or the middle. When the messages array is sent to the chat completion API endpoint, the API will take the last system message and use that message as the dominant system message.

In this simple example, I send a messages array with two system messages to the API:

The final system message (which is in the middle of the messages array) sets the "direction".

The results are the same whether there are 6 messages in the messages array or 40 messages, and regardless of token count.

I think many people here are simply having trouble managing their messages array, since many are not well versed in arrays of hashes: updating them, appending to them, pruning them, storing them in a DB as serialized data, etc.

This, in my view, is why we see considerable misinformation which states the system message should be the last message in the messages array. That is simply misinformation.

If the messages array is managed properly, the last system message in the array (which can also be the first message, if there is only one system message) works fine. I have reproduced this with many prompts, many sizes of messages arrays, and many different chats, most too large to post here.
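For anyone having trouble with that bookkeeping, here is a minimal sketch of pruning and serializing a messages array (the helper and the sample history are hypothetical):

import json

def prune_messages(messages, max_items=40):
    # Keep the system message(s) plus only the most recent non-system turns
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    keep = max(max_items - len(system), 0)
    return system + rest[-keep:] if keep else system

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]

# Store the whole array in a DB as serialized data, restore it later, and prune before resending
serialized = json.dumps(history)
restored = json.loads(serialized)
pruned = prune_messages(restored)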

:slight_smile:

Yes, I've arrived at this post precisely because of this PROBLEM. Let me give a simple example.

My array of role messages is:

[system]
You're a commercial assistant; be clear and nice when answering the user.
But if the user's question has no relation to the provided CONTEXT INFORMATION,
then only respond "I'm sorry, I don't know the answer" and don't say anything more.
Don't talk about anything other than the matters mentioned in the CONTEXT INFORMATION.

[system-context_user_question]
What is the cost of your service? You can enjoy a free account with certain limitations, and then upgrade to a PREMIUM account for unlimited use.

[system-context_user_question]
What is BeeHelp? BeeHelp is a chat bot assistant to help you answer questions from your customers. You just have to give it a collection of suitable questions and answers about your business and the leading artificial intelligence technology of OpenAI (chatGPT), will interact with your visitors 24/7.

[user]
How much it cost your service?

[assistant]
BeeHelp offers a free account with certain limitations, such as a limited number of questions and answers, and a limit on the number of users who can interact with the chatbot. If you need more questions and answers or a higher interaction capacity, you can upgrade to a PREMIUM account, which has a monthly fee.

The price of the PREMIUM account depends on the number of questions and answers you need and the number of users who interact with the chatbot. You can find more information about pricing on our website.

So… if you read the provided context (about FREE vs PREMIUM)… chatGPT has HALLUCINATED too much:

  • I never told it that the PREMIUM plan means a "monthly fee". In fact that is completely false in my case, because my payment model is "pay as you go".
  • It also hallucinates when it justifies the PREMIUM plan with "if you need more questions and answers". Well, that's true, but it relies on information I haven't passed here as context, so it is hallucinating there too.

As you can read at the top, my "system" instruction is quite emphatic:

But if the user's question has no relation to the provided CONTEXT INFORMATION,
then only respond "I'm sorry, I don't know the answer" and don't say anything more.
Don't talk about anything other than the matters mentioned in the CONTEXT INFORMATION.

So this kind of hallucination is quite frustrating… it makes it very difficult to use in a production environment :pensive: I've invested almost 3 weeks improving the initial system instruction, and now these hallucinations are less frequent and less shocking. But it could be worse in other production contexts.

Anyway, it's a great tool, of course. I'd rather have these hallucinations in 10% of responses than not have the other 90% of visitors happy to get attention 24/7 instantly.

I hope that, as some of you said, OpenAI makes the system instruction messages MORE RELEVANT for shaping the answers and avoiding hallucinations.

Indeed, I use temperature = 0. I will try 0.1… maybe the system is confusing "0" with empty and then uses the default (high) temperature!

Furthermore, the worst challenge was getting it not to respond to ANY question visitors asked about things absolutely outside the scope of the CONTEXT provided in my API call.

For example, it cannot avoid responding to things like "make me a plan to sell chairs on the beach", "please calculate 3+2", or "how do I calculate the area of a circle", etc…

I would like to share how I solved this challenge, not with prompting but with embeddings. As you probably know, I have to calculate the embedding of the visitor's request and then retrieve the closest Q&A in the knowledge base using something like a cosine similarity function.

So, i simply take this decision:

  • if the most similar question is under 0.79, then I return to the user something like: "I'm not sure I understand. Please try asking about our services in a different way."
  • if there is a question with a similarity above 0.79, then I take the 2 most similar ones and pass them as system context in my call to the chatGPT completion endpoint (see the sketch below)
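A minimal sketch of that gate (assuming the pre-1.0 openai Python library, the text-embedding-ada-002 model, numpy for the math, and precomputed embeddings stored with each Q&A; the 0.79 threshold comes from the description above, the helper names are illustrative):

import openai
import numpy as np

THRESHOLD = 0.79

def embed(text):
    response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(response["data"][0]["embedding"])

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer_gate(visitor_question, knowledge_base):
    # knowledge_base: list of {"question": ..., "answer": ..., "embedding": ...} entries
    query = embed(visitor_question)
    scored = sorted(knowledge_base,
                    key=lambda qa: cosine_similarity(query, qa["embedding"]),
                    reverse=True)
    if cosine_similarity(query, scored[0]["embedding"]) < THRESHOLD:
        return None  # caller replies: "I'm not sure I understand. Please try asking in a different way."
    return scored[:2]  # pass the two most similar Q&As as system context to the chat completion call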

This works like a charm!! In a few cases some Q&As are still similar enough to an irrelevant visitor question. For example, you might have Q&As about the prices of your service and the visitor asks something like "How much do the ice creams cost?" But in this case, the instruction in the first "system" message is able to detect it and respond with something like "I don't know the answer."

But at least, with this simple padlock, you avoid a useless and expensive call to the chatGPT completion endpoint :grin:

2 Likes

This example is next to useless for the argument. It's biased, short, repetitive, and completely controlled. There are definitely simpler, cleaner ways of testing your theory. If it's correct, thank you for the clarification. I, and other users, have noticed a difference in its effects. Regardless of your implied global incompetence, at the time it was working very well.

I would still argue that it's beneficial to strap instructions to the message: use the initial message for the environment, and the strapped message for the instructions to stay on track. There are just so many good reasons for appending a system message. At the very best, your argument is against redundancy.

System messages are also used for context delivery. If it's true that it completely overrides the previous message, it should be made clear in the documentation.

Hello everyone, I have finally made progress in controlling how chatGPT responds to certain requests or instructions from users in my adaptation of chatGPT. :grin:

I’m sure it will be of interest to you if you have problems with hallucinations and with questions outside the context that you give it:

2 Likes

It was hard for me until I used this wording in the prompt: "STRICTLY Do not answer questions which are not related to…" It worked.
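For example, a system message along these lines (a sketch; the product topic is a placeholder, since the original prompt is truncated):

system_message = {
    "role": "system",
    "content": (
        "You are a support assistant for our product. "
        "STRICTLY do not answer questions which are not related to our product. "
        'If the question is unrelated, reply only: "Sorry, I don\'t know."'
    ),
}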

1 Like

Thank you very much for your suggestion, Joseph. I’m really becoming frustrated with the hallucinations of the chatGPT API endpoint. If the completion 3.5 endpoint had the same price as the chatGPT endpoint, I would definitely use it because, for my use case, the current endpoint is incredibly frustrating.

Let me share my latest disappointment. In the context, I mentioned that the contact methods for my company are social networks and the contact form on our website. However, when a visitor asked for an email address to contact, chatGPT hallucinated and told the visitor to write to support@ourcompany.com !!!??? :sweat: It completely invented this information, even though I specifically instructed it to inform the user that the answer is unknown if it’s not provided in the context. :rage:

Honestly, the chatGPT endpoint is practically useless when it comes to extracting reliable information from the provided context. It easily fabricates responses.

As a result, I have temporarily halted my beehelp.net project, which relies on chatGPT. It’s just not reliable at all.

In my opinion, OpenAI has focused excessively on building the "perfect general chatbot." However, in the real world, what we truly need is a perfect LLM (Large Language Model) that simply follows instructions. OpenAI seems to have built a complex layered architecture around the LLM itself to create this chatGPT endpoint as a "smart chatbot." Unfortunately, it's hindering its usability for other commercial and production users.

I hope that the many open-source LLMs that are emerging will soon be capable enough to develop interesting LLM applications in the real world beyond “general chatbots”.

Personally, I would prefer if OpenAI focused on providing a simpler, faster, and less burdened endpoint, free from the excessive rules and instructions they impose. But who knows what the future holds?

I will now test your suggestion ("STRICTLY"). Thanks!

No, man… In my case, it continues to hallucinate a non-existent email address as a contact method for our company. :sweat: I've spent 20 minutes trying different sentences and approaches, but it insists on saying what it considers "good," even though I asked it not to invent information that is not provided in the context.

I give up… :pensive:

Feeling sorry for you. At first I was just as desperate with it, until I improved my prompting knowledge: how to instruct it to do this and that. At a certain level it was working well, but once you give it long-form questions, the hallucinations started. I came to realize that much of the problem comes from the OpenAI embeddings; they are not good at that. So I looked on the Internet and found this git project…

I haven't tested it, but I can see it has a good number of ratings, and it looks promising since it's developed as a hallucination-free project. Take a look at it, try it, and see where it can take you. Also, before trying that, can you share your prompts and code with me, so I can see where I can help before resorting to that project? It may just need a little tweak, maybe.

1 Like