Issues with Overwriting Context in Sequential Model Fine-Tuning

Hello everyone,

I’m facing an issue with fine-tuning models sequentially, and I could really use some help or insights. Here’s the process I’ve followed:

  1. First Snapshot: I started with an initial model and fine-tuned it using a specific system message and training data.
  2. Second Snapshot: Using the first snapshot as the base model, I fine-tuned it again with additional data, keeping the same context.
  3. Third Snapshot: I continued this process, taking the second snapshot as the base model and fine-tuning it further to create a third snapshot.
  4. Fourth Snapshot: This time, I took the third snapshot as the base model but changed the system message and training context, using a new set of data. (A rough sketch of how I chain these jobs follows this list.)
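
To make the setup concrete, here's roughly how I'm chaining the jobs with the OpenAI Python SDK (the file names and model IDs below are placeholders for my actual values):

```python
from openai import OpenAI

client = OpenAI()

# Upload the training data for this round (placeholder file name).
training_file = client.files.create(
    file=open("snapshot1_data.jsonl", "rb"),
    purpose="fine-tune",
)

# First snapshot: fine-tune the base model.
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file=training_file.id,
    hyperparameters={"n_epochs": 3, "batch_size": 1},
)

# For each later snapshot I repeat the same call, passing the previous
# snapshot's fine-tuned model ID (something like
# "ft:gpt-3.5-turbo-0125:my-org::snapshot1", a placeholder) as `model`,
# along with the next training file.
```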

Now, here's the problem: when I query the latest (fourth) snapshot using the contexts and respective system messages from the first three rounds of training, its new system message and context seem to have overwritten the earlier ones.

I was expecting the model to retain the distinct contexts and system messages from each snapshot, but it appears the latest training is influencing responses across all contexts, not just the one it was trained on.
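
For reference, this is roughly how I'm fetching responses (the model ID is a placeholder for my fourth snapshot, and the system message is the one from the first snapshot's training data):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0125:my-org::snapshot4",  # placeholder ID
    messages=[
        # System message used when training the *first* snapshot.
        {"role": "system", "content": "You are assistant A for context A."},
        {"role": "user", "content": "A question context A used to handle well."},
    ],
)
print(response.choices[0].message.content)
```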

Has anyone else encountered this issue, or does anyone have suggestions on how to maintain separate contexts for each snapshot?

Model type: gpt-3.5-turbo
Epochs: 3
Batch size: 1

Any advice or pointers would be greatly appreciated!

Thank you!

Welcome to the Forum!

I'm not 100% sure, but I believe this happens because the fourth snapshot was trained entirely on new data. Fine-tuning updates the whole model, so training only on the new context can degrade behavior learned in earlier rounds (often called catastrophic forgetting). You might get the fourth snapshot to retain both the existing and the new system message / context if you include examples for each case in its training set, along the lines of the sketch below. That said, this only makes sense if the core task is (nearly) the same in both cases.
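
For instance, a minimal sketch of mixing the two sets, assuming your training data is in chat-format JSONL files (the file names here are placeholders):

```python
import json
import random

# Combine examples from the earlier contexts with the new ones so the
# next fine-tune sees both system messages during training.
mixed = []
for path in ["old_context_data.jsonl", "new_context_data.jsonl"]:
    with open(path) as f:
        mixed.extend(json.loads(line) for line in f if line.strip())

random.shuffle(mixed)  # avoid ordering effects between the two sets

with open("mixed_training_data.jsonl", "w") as f:
    for example in mixed:
        f.write(json.dumps(example) + "\n")
```

You'd then run a fine-tuning job on the mixed file (starting from the third snapshot, or even the base model) so both contexts stay represented.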

What specifically are you fine-tuning for?