LLM forgetting part of my prompt with too much data

I’m using either Davinci or 3.5-turbo for completions and trying to feed in my own sources I pull through a search engine. Basically like this…

  • optimize new user query based on chat history
  • use query in search engine with my content
  • feed system prompt + sources + history to LLM looking for the result based on my own content.
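
A rough sketch of how those three steps might fit together when building the final prompt. This is only an illustration: `build_prompt`, the `(title, text)` layout of `sources`, and the role labels are assumptions, and the search and completion calls themselves are left out.

```python
from typing import List, Tuple

def build_prompt(system: str,
                 sources: List[Tuple[str, str]],
                 history: List[Tuple[str, str]],
                 question: str) -> str:
    """Assemble system prompt + sources + chat history into one completion prompt."""
    # Each source is (title, text); the title doubles as the citation label.
    data = "\n".join(f"[{title}] {text}" for title, text in sources)
    parts = [f"<|im_start|>System\n{system}\n\nData:\n{data}\n<|im_end|>"]
    # History is a list of (role, text) pairs, oldest first.
    for role, text in history:
        parts.append(f"<|im_start|>{role} {text}<|im_end|>")
    parts.append(f"<|im_start|>User {question}<|im_end|>")
    # Leave the assistant turn open so the model completes it.
    parts.append("<|im_start|>Assistant")
    return "\n".join(parts)

prompt = build_prompt(
    "Answer only from the data and cite sources in square brackets.",
    [("Example Source - 1", "Example text retrieved by the search engine.")],
    [("User", "first question"), ("Assistant", "first answer [Example Source - 1]")],
    "follow-up question",
)
```

The longer `sources` and `history` get, the further the system text sits from the open assistant turn, which is where the trouble described below starts.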

Usually it works very well straight off. I prompt with stuff like… you can only answer using data from the sources and you cannot make up references.

Once the chat history gets too long or too many sources are passed in, it seems to ignore the system prompt and will make up data. It is as if, even within the token limit, there is too much going on. If I reinforce “do not make up references” in the last user message, it pays attention to it, so I really feel like it cuts off the top of the prompt.

Anyone have any thoughts on this problem I’m having?

1 Like

I find that system prompts are usually very weak. They’re more suited to fine tuning the tone of a response, rather than any actual control. It has to be in the user message to enforce.

“do not make up references” also doesn’t seem to work so well. It’s like the classic telling a human not to think of a pink elephant. What works better for me is something like “if you don’t know the answer, say I don’t know.”

LLMs don’t work well with large blocks of data either; this was noted as a weakness in the GPT-3 paper. So what I usually do is keep a rolling window of context. Sometimes you have to summarize a previous conversation into a paragraph (iteratively ask it to summarize the last few blocks of conversation), and then prepend that at the top of a new conversation. GPT-4 is much better with lots of text; this is its strength.
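
A minimal sketch of that rolling window idea, assuming the summarizer is passed in as a plain function. `roll_window` and `fake_summarize` are made-up names for illustration; in practice `summarize` would be a call to the model asking it to condense the older turns.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)

def roll_window(history: List[Turn],
                keep_last: int,
                summarize: Callable[[str], str],
                summary: str = "") -> Tuple[str, List[Turn]]:
    """Fold turns older than the last `keep_last` into a running summary."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(history) <= keep_last:
        return summary, history
    old, recent = history[:-keep_last], history[-keep_last:]
    transcript = "\n".join(f"{role}: {text}" for role, text in old)
    # The previous summary is summarized again together with the old turns.
    new_summary = summarize((summary + "\n" + transcript).strip())
    return new_summary, recent

def fake_summarize(text: str) -> str:
    # Placeholder: truncates instead of calling a model.
    return text[:60]

history = [("user", f"message {i}") for i in range(6)]
summary, recent = roll_window(history, keep_last=2, summarize=fake_summarize)
```

The returned `summary` would be prepended to the next prompt and `recent` sent as the normal history.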

Sometimes I extract things into variables. For example, I’m using it to generate adventures for a game and I want it to remember a certain monster. Instead of saving the whole block of text with the monster, I split it to certain variables: survival skill, odds of player escaping, physical combat skill, base appearance. Something that might be 500 tokens ends up in several blocks of 5-100 tokens.
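
Something like the following is what I mean by splitting into variables. It is only a sketch: the `Monster` class, its field names, and the example values are invented for illustration.

```python
from dataclasses import dataclass, asdict

@dataclass
class Monster:
    name: str
    survival_skill: int
    escape_odds: float      # chance the player escapes, 0.0 to 1.0
    combat_skill: int
    appearance: str

def to_context(m: Monster) -> str:
    """Render the monster as short key: value lines for the prompt."""
    return "\n".join(f"{key}: {value}" for key, value in asdict(m).items())

troll = Monster("Cave Troll", survival_skill=7, escape_odds=0.35,
                combat_skill=9, appearance="grey, moss-covered, one broken tusk")
context = to_context(troll)
```

Only the fields a given scene needs have to go into the prompt, instead of the whole original description.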

7 Likes

Great Topic to discuss!
This matters most at the step where you clean and prepare data for input into LLMs. Optimizing historical data and using vector embeddings are essential steps in preparing content before feeding it to OpenAI.

Historical data can be optimized by standardizing and cleaning the data to ensure consistency and accuracy.

Once the historical data is cleaned and standardized, vector embedding can be used to represent the data in a more meaningful way.

To ensure that the data is consistently optimized and stored for long-term use, it is essential to establish a robust data management system. You can use an LSTM if you want long-term memory (LTM), and you need a vector embedding system to search for exactly the data you want (the data may need to be cleaned, summarized, etc.).

I think the biggest challenge with OpenAI’s token limit right now is finding the most accurate and relevant source data before feeding it into the LLM for fine-tuning.

2 Likes

@smuzani
This is exactly the feedback I was looking for, thank you. I found a specific example using system so I started with that. I’ll try this as a user prompt.

I also put in the statement “if you don’t know the answer say I don’t know” but it started going outside the provided sources and even made up its own reference that didn’t exist instead of saying I don’t know! Especially if I asked the same question 2-3 times in a row.

@RonaldGRuckus @smuzani

Let me give a specific example. Here is my prompt. Open up a new chat and paste it in. You’ll see ChatGPT gives dosing info, but none of that is from the data section of the prompt.

The only way I can get this to work is by reinforcing the system message in the user’s next question. It is not paying attention to the original system message.

From
<|im_start|>User what is the dosing for that drug?<|im_end|>
to
<|im_start|>User Using only information from the data section, what is the dosing for that drug?<|im_end|>
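
In code, that rewrite of the user turn could look roughly like this. `reinforce` and `GROUNDING_PREFIX` are hypothetical names; the prefix text is the one from the example above.

```python
GROUNDING_PREFIX = "Using only information from the data section, "

def reinforce(question: str, prefix: str = GROUNDING_PREFIX) -> str:
    """Prepend the grounding instruction to the user's question."""
    q = question.strip()
    if q.lower().startswith(prefix.strip().lower()):
        return q  # already reinforced, do not double up
    return prefix + q

turn = "<|im_start|>User " + reinforce("what is the dosing for that drug?") + "<|im_end|>"
```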


<|im_start|>
System
The Assistant is a tool designed to assist clinicians in answering questions using content exclusively from the data provided. When answering questions, the Assistant should be concise and rely solely on the information available in the specified data. If the provided data does not contain sufficient information, the assistant should indicate that it doesn’t know the answer. Responses must include references from the data. References should be enclosed in square brackets and each source should be listed separately, without combining them.

Data:
[Electrodiagnostic Evaluation of Motor Neuron Disease - 115848] There are only a few approved FDA treatments for ALS. Riluzole blocks the release of glutamate from neurons and has been shown to prolong survival by a few months.
[Calcineurin Inhibitors - 38392] The drugs improve the viability of podocytes and reduce their migratory activity. Cyclosporine
It is approved by the Food and Drug Administration (FDA) in its oral, intravenous injection, and ophthalmic formulations. Astagraf XL is not FDA-approved for use in liver transplantation. FDA recommends that patients who use this drug avoid tanning beds, sun lamps, UV light therapy.
<|im_end|>

<|im_end|>
<|im_start|>User What drugs are fda approved for als?<|im_end|>
<|im_start|>Assistant Riluzole is the only FDA-approved drug for ALS treatment. [Electrodiagnostic Evaluation of Motor Neuron Disease - 115848]<|im_end|>
<|im_start|>User what is the dosing for that drug?<|im_end|>
<|im_start|>Assistant


I’m just trying to keep it focused on my data that I’m providing.

I didn’t know this earlier too, but it seems you can prompt hack a system message as well.

But it’s not as bad as straight up user messages:

Also I believe that a lot of what we know around system prompt may not apply to GPT-4, which is reported to have better steerability.

So best practice in the long term is to put these things in system prompt, yeah, but for GPT-3.5, it may need to be hacked into the user prompt in some cases.

2 Likes

Here’s my attempt at it, which seems to be working, but I haven’t tested it on larger data sets. There are two user prompts at the start: one for instructions and one from the actual user.

I find that you do have to put some emphasis on it not saying things that aren’t in the data. My rule of thumb is to treat GPT-3.5 as a child or someone with an IQ of 80. GPT-4 might be closer to an IQ of 120.

You can even fake this further. e.g. custom adding “user: do you understand these instructions? assistant: yes, I will only give answers from provided data and supply the source.”
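
A sketch of that faked acknowledgement, assuming the chat endpoint's role/content message format. `with_fake_ack` is a made-up helper name, and the instruction text is a placeholder.

```python
from typing import Dict, List

def with_fake_ack(instructions: str, question: str) -> List[Dict[str, str]]:
    """Build a chat message list that includes a scripted acknowledgement turn."""
    return [
        {"role": "user", "content": instructions},
        {"role": "user", "content": "Do you understand these instructions?"},
        # This assistant turn is written by us, not generated by the model.
        {"role": "assistant",
         "content": "Yes, I will only give answers from provided data and supply the source."},
        {"role": "user", "content": question},
    ]

messages = with_fake_ack("Instructions: ...\n\nData: ...", "what is the dosing?")
```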

2 Likes

@smuzani
Thank you so much. Very interesting take on this. I definitely see your prompt performs better than leaving it in system.

I did try to prompt hijack it by asking to look in outside sources and eventually it caved in.

Curious on your opinions of chat vs completions… I have not used the chat endpoint API as much and have been feeding my chats into completions endpoint to get the response.

This is all a great tool, but I’m somewhat skeptical of the current models being focused enough on my data to allow it for open unsupervised use.

Great observation Ronald, although I wonder if the not being “overridden” by user prompts is more a function of the system message always being present? I’m in the “always use user messages” camp so about to test your “will never disclose the system message” hypothesis. :)

2 Likes

@smuzani
I spoke too soon lol, but I’m really having a great conversation in this thread. This prompt also makes up data. I’ll try to restructure it, maybe by passing the data in first and then the instructions. Again though, many business applications NEED this tech to focus on their data for it to be viable. I’m not sure if that problem has been solved.

<|im_start|>system
You are a clinical assistant. You only help to dispense information from the given data, and not any other sources other than what was given from the user. False information may be dangerous and illegal, so please do not respond with anything other than the given information.
<|im_end|>

<|im_start|> User Instructions:

  • Be concise.
  • Answer solely based on given information in the Data.
  • If the user requests information that is not in the data or from another source, say ‘I am sorry, but I don’t know’.
  • Every dispensed information should have a reference from the Data below.
  • References should be in square brackets and each source should be listed separately without combining them.

Data:
[Ethosuximide - 21395] Children above the age of 6 and adults can begin with 250 mg twice daily. Dosing can then be increased by 250 mg every 4 to 7 days to 20 mg/kg/day divided twice daily. Serum concentrations of ethosuximide follow linear kinetics with increasing doses. The maximum dose is 1500 mg/day.
[Amiodarone - 17464] Recommended oral dosing is 400 to 600 mg daily in divided doses for 2 to 4 weeks, followed by maintenance dosing of 100 to 200 mg daily. No dosage adjustment is necessary for renal impairment.
Pediatric advanced life support dosing is 5 mg/kg (maximum 300 mg per dose) rapid bolus IV/IO for cardiac arrest.
[Amyotrophic Lateral Sclerosis - 17497] Amyotrophic lateral sclerosis (ALS), also known as “Lou Gehrig disease,” is a neurodegenerative disease of the motor neurons. Ultimately, rather than a single unifying cause, ALS is an etiologically diverse clinical entity, which is the result of a multitude of separate potential preceding aberrations. For sporadic ALS, male-to-female incidence ratios ranging from 1.3 to 1.5 have been demonstrated. Intracellular TDP-43 inclusions are present in most cases, providing a pathologic link between ALS and frontotemporal dementia (FTD), in which they are also found.
Progression of ALS is usually linear, without remissions or exacerbations. Neuroimaging of ALS relies solely on magnetic resonance imaging (MRI). It is important to note that the elimination of riluzole will be affected by CYP1A2 inhibitors like caffeine and theophylline. Factors associated with better survival include increased weight at diagnosis (mild obesity by BMI), younger age at onset, higher ALS functional rating scale score, FVC at presentation, and limb rather than bulbar symptoms. These side effects include gastrointestinal upset and arrhythmias in mexiletine, transaminitis and asthenia in Riluzole, and gait disturbance and headache in edaravone.
[Luspatercept - 87552] However, luspatercept dosing should remain below the dose of 1.25 mg/kg in beta-thalassemia treatment and below 1.75 mg/kg in lower-risk MDS treatment. Before administration of luspatercept, hemoglobin levels, liver function tests include alanine transferase, and aspartate transferase levels should have monitoring to ensure proper dosing and metabolism of the medication. Dosing should be monitored according to hemoglobin levels and should not exceed 1.25 mg/kg in the treatment of beta-thalassemia and 1.75mg/kg in treating lower-risk MDS to avoid toxicity. The clinical pharmacist is also a valuable professional needed for medication consultation and verifying dosing parameters and administration.
[Ethosuximide - 21395] Dosing can then be increased by 250 mg every 4 to 7 days to 20 mg/kg/day divided twice daily. Its use requires caution in patients with hepatic and renal disease, although, as previously stated above, hepatic dosing adjustments are undefined. Dosing can be adjusted purely on the patient’s clinical response to a particular dose and electroencephalogram (EEG). Pharmacists will verify appropriate dosing, check for potential drug-drug interactions, and offer the patient further counsel on proper medication administration.
[Methylxanthines - 25086]
It is because of this dose-related appearance of adverse effects that dosing is administered at the lowest possible rate to begin and titrated up until achieving a therapeutic effect. In patients receiving ongoing dosing of methylxanthines, physicians should closely monitor the following: serum drug levels, heart rate (tachycardia), respiratory rate (tachypnea), CNS symptoms (tremor, insomnia, headache), venous or arterial blood gasses (evidence of respiratory alkalosis or acid/base imbalance), electrolytes (various abnormalities due to diuretic effect). Charcoal or sorbitol may be administered to reduce further GI absorption of the drug (however, this is only effective with oral dosing of the drug).

<|im_end|>
<|im_start|>User what drugs are fda approved for ALS?<|im_end|>
<|im_start|>Bot There are only a few approved FDA treatments for ALS. Riluzole is one of them. [Electrodiagnostic Evaluation of Motor Neuron Disease - 115848]<|im_end|>
<|im_start|>User how does Riluzole help?<|im_end|>
<|im_start|>Bot Riluzole blocks the release of glutamate from neurons and has been shown to prolong survival by a few months. [Electrodiagnostic Evaluation of Motor Neuron Disease - 115848]<|im_end|>
<|im_start|>User what is the dosing?<|im_end|>
<|im_start|>Bot

Can you share that flow? I have anti-hallucination techniques galore so itching to see if I can prevent it from hallucinating or being hijacked. :)

1 Like

@stevenic I think it is more of a hallucination than a hijack after testing again. I have no idea where it is getting this response from, but it makes up the reference too.

It returns this, which is nowhere in the data:
The dosing of Riluzole for ALS is typically 50 mg twice daily. [Electrodiagnostic Evaluation of Motor Neuron Disease - 115848]
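
One cheap guard for the made-up reference is to check every bracketed citation in the answer against the labels actually present in the data. The sketch below assumes references always look like `[Title - number]`, as they do in this thread; `unknown_references` is a hypothetical helper, not part of any library.

```python
import re
from typing import List

# Matches labels of the form [Some Title - 12345].
REF = re.compile(r"\[([^\[\]]+ - \d+)\]")

def unknown_references(answer: str, data: str) -> List[str]:
    """Return references cited in `answer` that never appear in `data`."""
    known = set(REF.findall(data))
    return [ref for ref in REF.findall(answer) if ref not in known]

data = "[Ethosuximide - 21395] ...\n[Amyotrophic Lateral Sclerosis - 17497] ..."
answer = ("The dosing of Riluzole for ALS is typically 50 mg twice daily. "
          "[Electrodiagnostic Evaluation of Motor Neuron Disease - 115848]")
bad = unknown_references(answer, data)
```

This only catches citations that are not in the data at all. An answer that attaches a real reference to invented text would still pass, so it is a filter, not a fix.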

Ok, I solved this using my INSTRUCT hallucination-dodging technique… Here are the highlighted additions I made to the prompt (I’ll share the full prompt in a moment)

And here’s the continued conversation:

The full prompt is here, which as you can see, I got rid of all your other instructions :)

You are a clinical assistant. Base all of your responses on the data below.

Data:
[Ethosuximide - 21395] Children above the age of 6 and adults can begin with 250 mg twice daily. Dosing can then be increased by 250 mg every 4 to 7 days to 20 mg/kg/day divided twice daily. Serum concentrations of ethosuximide follow linear kinetics with increasing doses. The maximum dose is 1500 mg/day.
[Amiodarone - 17464] Recommended oral dosing is 400 to 600 mg daily in divided doses for 2 to 4 weeks, followed by maintenance dosing of 100 to 200 mg daily. No dosage adjustment is necessary for renal impairment.
Pediatric advanced life support dosing is 5 mg/kg (maximum 300 mg per dose) rapid bolus IV/IO for cardiac arrest.
[Amyotrophic Lateral Sclerosis - 17497] Amyotrophic lateral sclerosis (ALS), also known as “Lou Gehrig disease,” is a neurodegenerative disease of the motor neurons. Ultimately, rather than a single unifying cause, ALS is an etiologically diverse clinical entity, which is the result of a multitude of separate potential preceding aberrations. For sporadic ALS, male-to-female incidence ratios ranging from 1.3 to 1.5 have been demonstrated. Intracellular TDP-43 inclusions are present in most cases, providing a pathologic link between ALS and frontotemporal dementia (FTD), in which they are also found.
Progression of ALS is usually linear, without remissions or exacerbations. Neuroimaging of ALS relies solely on magnetic resonance imaging (MRI). It is important to note that the elimination of riluzole will be affected by CYP1A2 inhibitors like caffeine and theophylline. Factors associated with better survival include increased weight at diagnosis (mild obesity by BMI), younger age at onset, higher ALS functional rating scale score, FVC at presentation, and limb rather than bulbar symptoms. These side effects include gastrointestinal upset and arrhythmias in mexiletine, transaminitis and asthenia in Riluzole, and gait disturbance and headache in edaravone.
[Luspatercept - 87552] However, luspatercept dosing should remain below the dose of 1.25 mg/kg in beta-thalassemia treatment and below 1.75 mg/kg in lower-risk MDS treatment. Before administration of luspatercept, hemoglobin levels, liver function tests include alanine transferase, and aspartate transferase levels should have monitoring to ensure proper dosing and metabolism of the medication. Dosing should be monitored according to hemoglobin levels and should not exceed 1.25 mg/kg in the treatment of beta-thalassemia and 1.75mg/kg in treating lower-risk MDS to avoid toxicity. The clinical pharmacist is also a valuable professional needed for medication consultation and verifying dosing parameters and administration.
[Ethosuximide - 21395] Dosing can then be increased by 250 mg every 4 to 7 days to 20 mg/kg/day divided twice daily. Its use requires caution in patients with hepatic and renal disease, although, as previously stated above, hepatic dosing adjustments are undefined. Dosing can be adjusted purely on the patient’s clinical response to a particular dose and electroencephalogram (EEG). Pharmacists will verify appropriate dosing, check for potential drug-drug interactions, and offer the patient further counsel on proper medication administration.
[Methylxanthines - 25086]
It is because of this dose-related appearance of adverse effects that dosing is administered at the lowest possible rate to begin and titrated up until achieving a therapeutic effect. In patients receiving ongoing dosing of methylxanthines, physicians should closely monitor the following: serum drug levels, heart rate (tachycardia), respiratory rate (tachypnea), CNS symptoms (tremor, insomnia, headache), venous or arterial blood gasses (evidence of respiratory alkalosis or acid/base imbalance), electrolytes (various abnormalities due to diuretic effect). Charcoal or sorbitol may be administered to reduce further GI absorption of the drug (however, this is only effective with oral dosing of the drug).

Steps:
- Find the section of text in the data that answers the users question.
- Answer the users question from that section of text.

You can refine this to give answers in a less robotic way and to partition the answer such that you can extract it from the model’s response. Here’s the INSTRUCT post I’m pulling these techniques from: INSTRUCT: Making LLM’s Do Anything You Want | by Steve Ickman | Apr, 2023 | Medium
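
If you generate this prompt per request, a small template keeps the layout (role line, then data, then steps last) consistent. A minimal sketch: `grounded_prompt` and `STEPS` are names I made up here, and the step wording is copied from the prompt above.

```python
STEPS = (
    "Steps:\n"
    "- Find the section of text in the data that answers the users question.\n"
    "- Answer the users question from that section of text."
)

def grounded_prompt(data: str) -> str:
    """Role line, then data, then positive step-by-step instructions last."""
    return (
        "You are a clinical assistant. Base all of your responses on the data below.\n\n"
        f"Data:\n{data.strip()}\n\n"
        f"{STEPS}"
    )

prompt = grounded_prompt("[Example Source - 1] Example text.")
```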

1 Like

I refined this a bit… With gpt-3.5-turbo it helps if you give it a one shot example of it following instructions so I’m leading it with these two messages:

It’s pretty reliable from that point forward. If you don’t lead it like this it may not always do step 2 and it may not wrap its output properly.

This is a couple of messages in, and you can see that it’s reliably following instructions while staying grounded in the facts of the data. You could probably further tweak its instructions to get it to sound less like it’s reading a data sheet:

1 Like

@stevenic Wow, interesting. I was heading down the way of including small instructions in every prompt to re-enforce instructions. Let me review this and test. I’ll let you know how it goes on different topics and searches for different data.

Your article on this is great!

1 Like

These are really more idiosyncrasies of working with gpt-3.5-turbo than anything. Neither gpt-4 nor davinci needs this much hand holding. If you don’t include the instruction as part of each message, it stops working. Also, if you don’t include the one shot example, it’s a coin flip as to whether or not gpt-3.5 will follow the instructions. These models all benefit from a single shot example of them following instructions. It shows they’re all really just pattern matchers at the end of the day.

That’s also why you don’t want to show them too many examples. They will parrot back the examples any chance they get. You know that you’re providing the model with examples that you intend to be reference only, but the model doesn’t know that. It just sees you as providing it with patterns, and it sees all patterns as examples of desired behavior. That’s also why I always try to avoid negative instructions. Always tell the model what to do versus what not to do. In fact, the less you tell the model in your prompt the better… Less is definitely more with these things.

Also worth noting that for the one shot example, you can let those messages age out of your conversation history like normal. You just need to kick start the model’s pattern matching. Once it’s started, it’s fine.
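
Roughly, the message assembly looks like the sketch below. It assumes the chat endpoint's role/content format and that `max_messages` is at least 1; `build_messages` is a hypothetical helper, and stored history is assumed to already carry the steps from earlier turns.

```python
from typing import Dict, List

Message = Dict[str, str]

def build_messages(system: str, one_shot: List[Message], history: List[Message],
                   question: str, steps: str, max_messages: int) -> List[Message]:
    """Lead with a one-shot example, then let normal trimming age it out."""
    # Repeat the instruction alongside the newest user message.
    latest = {"role": "user", "content": f"{question}\n\n{steps}"}
    conversation = one_shot + history + [latest]
    # Keep only the newest messages; the one-shot pair is the first to fall off.
    return [{"role": "system", "content": system}] + conversation[-max_messages:]

one_shot = [
    {"role": "user", "content": "example question\n\nSteps: ..."},
    {"role": "assistant", "content": "example answer"},
]
history = [
    {"role": "user", "content": "q1\n\nSteps: ..."},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2\n\nSteps: ..."},
    {"role": "assistant", "content": "a2"},
]
early = build_messages("sys", one_shot, [], "q1", "Steps: ...", max_messages=4)
late = build_messages("sys", one_shot, history, "q3", "Steps: ...", max_messages=4)
```

In `early` the example pair leads the conversation; by `late` it has been trimmed away while the steps still ride along with the latest question.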

1 Like

(Sorry if this is so late as to not be helpful. I ran across this chat while looking for something…)

If you are building an AI-powered chatbot application, and you want to include specific data, trying to stuff it into the prompt is probably not a scalable strategy. You might want to look into Retrieval Augmented Generation.

(Forum moderation won’t let me add a link, but it’s not hard to find info if you search for it.)

Alternatively, if the data is in a structured data store, you could use function calls to retrieve the data programmatically.

1 Like

Thanks! Haha, yup, RAGs are now commonplace. One simple way we did it was by uploading files to an OpenAI Assistant, but there are more complex and better ways to do this.