Location of information in training

Hello, I have noticed with all the models that when you add training, especially longer training, GPT-3 focuses more on the most recent training, i.e. the part closest to the end. When in doubt, it will usually repeat something from the most recent chunk of training. This can be rather annoying when trying to get GPT-3 to understand more complex subjects, because it will almost ignore the first pieces of training it is given and instead focus on the last one or two sections. It does not seem to happen with shorter training, usually under 150-200 tokens.
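
If anyone wants to try reproducing this, here is a minimal sketch of how I would test it, assuming the legacy openai Python client and the davinci completions engine; the fact, filler text, and question are made up for illustration:

```python
import openai  # legacy pre-1.0 client; set openai.api_key before running

FACT = "The package must always be shipped in a blue box."
FILLER = "Here is some unrelated background text about other topics. " * 30
QUESTION = "\n\nQ: What color box should the package be shipped in?\nA:"

# Same fact, placed either at the start or at the end of the prompt.
prompt_fact_first = FACT + "\n\n" + FILLER + QUESTION
prompt_fact_last = FILLER + "\n\n" + FACT + QUESTION

for name, prompt in [("fact first", prompt_fact_first), ("fact last", prompt_fact_last)]:
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=20,
        temperature=0,
        stop=["\n"],
    )
    print(name, "->", response["choices"][0]["text"].strip())
```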

I noticed the same a while ago. In my experience, improving the structure and framing around the examples helps a lot. For instance:

Instructions here. What follows are examples:

Example 1: blah blah blah

Example 2: Blah blah blah

Example 3: Blah blah blah

Example 4: <leave blank for completion>

When you give it a repeating pattern like this, with clearly delineated examples that are labeled as such, it tends to generalize the point much better. This is an overly simplified example, but I hope it helps.
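
In code, that kind of prompt is easy to assemble programmatically. This is just a rough sketch; the instruction text, the example sentences, and the build_prompt helper are all made up for illustration:

```python
def build_prompt(instructions, examples):
    """Assemble a few-shot prompt: instructions first, then numbered,
    clearly delineated examples, with the final slot left blank for
    GPT-3 to complete."""
    lines = [instructions, "What follows are examples:", ""]
    for i, example in enumerate(examples, start=1):
        lines.append(f"Example {i}: {example}")
        lines.append("")
    # The last, empty example is where the model writes its completion.
    lines.append(f"Example {len(examples) + 1}:")
    return "\n".join(lines)

prompt = build_prompt(
    "Rewrite each sentence in a formal tone.",
    [
        "hey can u send the report -> Could you please send the report?",
        "gotta cancel the meeting -> I need to cancel the meeting.",
        "thx for the help -> Thank you for your assistance.",
    ],
)
print(prompt)
```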

Could someone from OpenAI @staff comment on this? Is it a bug or a feature? :slight_smile:

They don’t need to. It’s neither. It is what it is. You take a pile of linear equations and stir the pot until it starts sounding smarter than an equivalent mass of monkeys and typewriters.

Well, I think it does matter if you get different results depending on the position of things.

That wasn’t the point of this thread. I am wondering in what way the location of parts of the training affects the output, and how we could use this to our advantage.

For example, if you had:
Question 1: stuff 1
Answer 1: more stuff 1

Question 2: stuff 2
Answer 2: more stuff 2

GPT-3 will focus more on Question 2 than on Question 1 (this is more noticeable with more training). So if you told it something at the beginning of the training, it is likely to “forget” that it was told that. I hope this helps you understand.
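
One way to turn that into an advantage is to order the prompt so that whatever matters most sits closest to the completion point. A rough sketch of the idea, assuming you have some way to score relevance (the score_relevance function here is a made-up placeholder):

```python
def score_relevance(chunk, question):
    """Placeholder relevance score: count shared words between the chunk
    and the question. Swap in a real scorer (embeddings, keyword search,
    etc.) if you have one."""
    return len(set(chunk.lower().split()) & set(question.lower().split()))

def order_for_recency(chunks, question):
    """Sort context chunks so the most relevant ones end up last,
    i.e. nearest to the question, where GPT-3 seems to pay the most
    attention."""
    return sorted(chunks, key=lambda c: score_relevance(c, question))

chunks = [
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Returns are accepted within 30 days of delivery.",
    "Shipping policy: all orders ship within two business days.",
]
question = "Q: Within how many days are returns accepted?\nA:"
prompt = "\n".join(order_for_recency(chunks, question)) + "\n\n" + question
print(prompt)
```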
