Fine-tuning for the first time

Hi everybody 🙂
I’m fairly new to all of this, so I was wondering if you could help me understand my results so far.

I’m trying to fine-tune a model so that it consistently involves the user in the process instead of just delivering “results”.

My training file has about 100 examples; my validation file has about 40.

To give you an understanding, here’s what a given example looks like:

"{“messages”: [{“role”: “system”, “content”: “You are an assistant that supports perceived user autonomy by offering meaningful options.”}, {“role”: “user”, “content”: “I’m not really sure how I should start my essay?”}, {“role”: “assistant”, “content”: “That’s okay. Should we brainstorm together first or do you have some arguments in mind you would like to discus? Let me know and we will continue from there”}]}
"

Now, after a few rather unsuccessful runs, the “best” one I’ve had so far is shown in the screenshots below.


I have a few questions that I’m really hoping someone can help me understand and solve:

  1. My training loss is at 1.05. That’s not very good yet, is it? And yet, when I use the model in the Playground, it already behaves the way I intend?

  2. I don’t quite grasp why my validation loss is flatlining “below” my training loss. Is this wrong? What causes this?

Shouldn’t it be “higher” than the green graph of the training loss? Also, how is it that my validation loss flatlines around 0.4, but my full validation loss is at 1.36?

I would be very grateful for any kind of help.

D

I am not a programmer but an educator. In the last {}, where you open the algorithm up to the user, the input has no directionality.
You want to keep this open, but you need to set up a structure for success by asking simple questions that let the chat target what needs to be done, and in doing so drive a learning experience.

Find out the user’s experience with the topic (newbie? expert?), how they want their information presented (suggest a format; it can always be changed on request), what the area of inquiry is (who, what, where, when, why), and the purpose of the action.
Request periodic reviews that confirm to the user what is being done, and have the chat actively update memory during this process.
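Folded into the data format from the first post, an intake turn along those lines might look something like this (purely illustrative, not tested):

{"messages": [{"role": "system", "content": "You are an assistant that supports user autonomy. Before helping, find out the user's experience level, preferred format, area of inquiry, and purpose, and confirm periodically what is being done."}, {"role": "user", "content": "Can you help me research a topic?"}, {"role": "assistant", "content": "Gladly! Are you new to this topic or already familiar with it? And how would you like the information presented - as bullet points, a summary, or something else?"}]}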

This might give the learning curve a better chance. I take it that is what concerns you about the graph.

While I appreciate your feedback, I’m not entirely sure I understand how it relates to my question.

Is there anybody with fine-tuning experience who has seen this pattern of graphs before? I’m trying to understand what I’m doing wrong.

Dear forum, has anybody had a similar problem?
My dataset is rather small: about 100 training examples and about 40 validation examples.

I’m using the following parameters:

Epochs: 3
Batch size: 4
LR multiplier: 0.8
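
For context, here is roughly how those parameters map onto the fine-tuning API call (a minimal sketch, assuming the current openai Python SDK; the file IDs and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# "file-..." IDs are placeholders for IDs returned by client.files.create(...)
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",   # placeholder base model
    training_file="file-TRAINING_ID",
    validation_file="file-VALIDATION_ID",
    hyperparameters={
        "n_epochs": 3,
        "batch_size": 4,
        "learning_rate_multiplier": 0.8,
    },
)
print(job.id, job.status)
```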

Should I decrease the LRM further?