Location of information in training

Hello, I have noticed with all the models that when you add training, especially longer training, GPT-3 focuses more on the most recent training, i.e. the part closest to the end. When in doubt, it will usually repeat something from the most recent chunk of training. This can be rather annoying when trying to get GPT-3 to understand more complex subjects, because it will almost ignore the first pieces of training it is given and instead focus on the last one or two sections. It does not seem to happen with shorter training, usually under 150-200 tokens.
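
If anyone wants to try reproducing this, here is a minimal sketch of how I would test it, assuming the legacy openai Python client and the davinci completions engine; the fact, filler text, and question are made up for illustration:

```python
import openai  # legacy pre-1.0 client; set openai.api_key before running

FACT = "The package must always be shipped in a blue box."
FILLER = "Here is some unrelated background text about other topics. " * 30
QUESTION = "\n\nQ: What color box should the package be shipped in?\nA:"

# Same fact, placed either at the start or at the end of the prompt.
prompt_fact_first = FACT + "\n\n" + FILLER + QUESTION
prompt_fact_last = FILLER + "\n\n" + FACT + QUESTION

for name, prompt in [("fact first", prompt_fact_first), ("fact last", prompt_fact_last)]:
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=20,
        temperature=0,
        stop=["\n"],
    )
    print(name, "->", response["choices"][0]["text"].strip())
```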

I noticed the same a while ago. In my experience, improving the structure and framing around the examples helps a lot. For instance:

Instructions here. What follows are examples:

Example 1: blah blah blah

Example 2: Blah blah blah

Example 3: Blah blah blah

Example 4: <leave blank for completion>

When you give it a repeating pattern like this, with clearly delineated examples that are labeled as such, it tends to generalize the point much better. This is an overly simplified example, but I hope it helps.
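
In code, that kind of prompt is easy to assemble programmatically. This is just a rough sketch; the instruction text, the example sentences, and the build_prompt helper are all made up for illustration:

```python
def build_prompt(instructions, examples):
    """Assemble a few-shot prompt: instructions first, then numbered,
    clearly delineated examples, with the final slot left blank for
    GPT-3 to complete."""
    lines = [instructions, "What follows are examples:", ""]
    for i, example in enumerate(examples, start=1):
        lines.append(f"Example {i}: {example}")
        lines.append("")
    # The last, empty example is where the model writes its completion.
    lines.append(f"Example {len(examples) + 1}:")
    return "\n".join(lines)

prompt = build_prompt(
    "Rewrite each sentence in a formal tone.",
    [
        "hey can u send the report -> Could you please send the report?",
        "gotta cancel the meeting -> I need to cancel the meeting.",
        "thx for the help -> Thank you for your assistance.",
    ],
)
print(prompt)
```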

Could someone from OpenAI @staff comment on this? Is it a bug or a feature? :slight_smile:

They don’t need to. It’s neither. It is what it is. You take a pile of linear equations and stir the pot until it starts sounding smarter than an equivalent mass of monkeys and typewriters.

Well, I think it does matter if you get different results depending on the position of things.

That wasn’t the point of this thread. I am wondering in what way the location of parts of the training affects the output, and how we could use this to our advantage.

For example, if you had:
Question 1: stuff 1
Answer 1: more stuff 1

Question 2: stuff 2
Answer 2: more stuff 2

GPT-3 will focus more on Question 2 than on Question 1 (this is more noticeable with more training). So if you told it something at the beginning of the training, it is likely to “forget” that it was told that. I hope this helps you understand.
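
One way to turn that into an advantage is to order the prompt so that whatever matters most sits closest to the completion point. A rough sketch of the idea, assuming you have some way to score relevance (the score_relevance function here is a made-up placeholder):

```python
def score_relevance(chunk, question):
    """Placeholder relevance score: count shared words between the chunk
    and the question. Swap in a real scorer (embeddings, keyword search,
    etc.) if you have one."""
    return len(set(chunk.lower().split()) & set(question.lower().split()))

def order_for_recency(chunks, question):
    """Sort context chunks so the most relevant ones end up last,
    i.e. nearest to the question, where GPT-3 seems to pay the most
    attention."""
    return sorted(chunks, key=lambda c: score_relevance(c, question))

chunks = [
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Returns are accepted within 30 days of delivery.",
    "Shipping policy: all orders ship within two business days.",
]
question = "Q: Within how many days are returns accepted?\nA:"
prompt = "\n".join(order_for_recency(chunks, question)) + "\n\n" + question
print(prompt)
```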
