How do I write a system message that directs ChatGPT to use only the data provided by the user, and not its public training data? This is for an application intended for internal use within a company, working with internal data.
The data is provided by the user and appended to the prompt.
Further, I want it to say “data is not available” if no data is provided after the system prompt. Right now, if no data is provided, it falls back on ChatGPT’s public training data.
I need to be able to lock it down.
Here is the current message I have tried:
“Welcome! I am an assistant programmed to discuss the contents of Acme, Inc Documents, specifically from a Vector Database provided to me. To maintain data integrity, I will solely rely on the information available in this designated database. If, for any reason, the data is unavailable, I will inform you by saying ‘This service is currently unavailable.’ Please proceed with your queries about the provided documents, and let’s begin our conversation!”
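For reference, here is a simplified sketch of how I’m assembling the request today (the model name and function are placeholders for what my application actually does):

```python
# Simplified sketch of how the request is built today.
from openai import OpenAI

client = OpenAI()

SYSTEM_MESSAGE = "Welcome! I am an assistant programmed to discuss ..."  # the message quoted above

def ask(question: str, user_data: str) -> str:
    # user_data is the text pulled from our vector database and appended
    # to the prompt. When it is empty, the model currently answers from
    # its general training data instead of saying the data is unavailable.
    prompt = f"Documents:\n{user_data}\n\nQuestion: {question}"
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
```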
That’s not really how these things work.
How much of the “public” data do you not want it to use? Like, should it not know how to write sentences or what words mean? How do you expect it to decide which data to ignore?
LLMs do hallucinate and sometimes come up with irrelevant information.
Using ChatGPT to help write the system message can produce a hallucination too: it will tell you it is possible to limit/filter its responses to the context data provided by the user or from a vector database. Not a huge deal, but worth keeping in mind.
One useful thing to explore is this paper:
[2303.11315] Context-faithful Prompting for Large Language Models.
Essentially, when you ask a question that needs to be answered faithfully to the context, it’s quite slippery to get the LLM not to draw on its existing knowledge.
This paper asserts that by attributing the context to “Bob” (no joke!) and asking the LLM to answer the question “in Bob’s opinion”, you get a more context-faithful answer.
Here’s a summary of the paper I generated with my auto-summariser:
Summary:
The paper titled “Context-faithful Prompting for Large Language Models” focuses on improving the faithfulness of Large Language Models (LLMs) in context-specific Natural Language Processing (NLP) tasks. The authors identify two aspects where LLMs could improve: knowledge conflict and prediction with abstention. They propose two methods to enhance LLMs’ faithfulness: opinion-based prompts and counterfactual demonstrations.
Opinion-based prompts reframe the context as a narrator’s statement and inquire about the narrator’s opinions, forcing the model to pay more attention to the context. Counterfactual demonstrations use instances containing false facts to improve faithfulness in knowledge conflict situations. The authors conducted experiments on three datasets of two standard NLP tasks, machine reading comprehension and relation extraction, and found significant improvements in faithfulness to contexts.
Prompt Suggestions:
- Opinion-based Prompt:
  - Context: “Bob said, ‘{context}’”
  - Question: “What is the summary of the document according to Bob’s statement?”
- Attributed Prompt:
  - Context: “{context}”
  - Question: “Can you summarize the document based on the given text?”
- Instruction-based Prompt:
  - Instruction: “Read the given information carefully and provide a summary.”
  - Context: “{context}”
  - Question: “What is the summary of the document?”
- Opinion + Instruction-based Prompt:
  - Instruction: “Based on Bob’s statement, provide a summary.”
  - Context: “Bob said, ‘{context}’”
  - Question: “What is the summary of the document?”
Note: Replace ‘{context}’ with the actual text to be summarized.
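If it helps, here is a rough sketch of how the opinion-based prompt could be wired into a chat completion call, with an explicit abstention instruction added on top. The model name, exact wording, and fallback phrase are my own assumptions, not from the paper:

```python
# Sketch of opinion-based prompting: attribute the context to a narrator
# ("Bob") and ask for the answer "in Bob's opinion", plus an abstention
# instruction so the model declines when the context lacks the answer.
from openai import OpenAI

client = OpenAI()

def answer_from_context(context: str, question: str) -> str:
    system = (
        "Answer only from Bob's statement. If Bob's statement does not "
        "contain the answer, reply exactly: 'Data is not available.'"
    )
    user = f"Bob said, '{context}'\n\nIn Bob's opinion, {question}"
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content
```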
Have you experimented with different structures for your dataset? Hallucinations mostly occur when the relevant information hasn’t scored high enough.
I had the same problem and could manage it by restructuring my data.
While working on it I also had the idea of embedding fake data that scores lower and is labeled “No Information”, so that it would be returned whenever no real data is found. I never tried it, though.
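If you want to try something similar without embedding fake records, a related (equally untested) approach is to check the retrieval scores in application code and short-circuit before calling the model at all. The field names and threshold below are assumptions about what your vector search returns:

```python
from typing import Optional

SCORE_THRESHOLD = 0.75  # placeholder; tune against your own embeddings

def build_prompt(question: str, hits: list[dict]) -> Optional[str]:
    """hits = results from your vector search, each like {"text": ..., "score": ...}.
    Returns None when nothing scores high enough, so the caller can reply
    'Data is not available.' without ever calling the model."""
    relevant = [h for h in hits if h["score"] >= SCORE_THRESHOLD]
    if not relevant:
        return None
    context = "\n\n".join(h["text"] for h in relevant)
    return f"Documents:\n{context}\n\nQuestion: {question}"
```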