Finetuning Noob : Guidelines and Best Practices?

So, I have been using the Chat Completions endpoint (3.5-Turbo) for a while, and I have built a nice product around it. Now I have reached a point where I need to start fine-tuning to further improve my system's performance (yes, I did try prompt engineering and single/multi-shot examples; they were not sufficient).

To get hands-on, I tried running a fine-tuning (3.5-Turbo) job with the OpenAI API. I will admit I have no idea how this works. I just read and followed the guidelines in the official OpenAI documentation on how much data to use and how to prepare it, and made the API calls.
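For anyone else starting out: the training data for the chat fine-tuning endpoint is a JSONL file where each line is one complete example conversation, as described in the OpenAI docs. A minimal sketch (the file name and example contents below are made up for illustration):

```python
import json

# Each line of the JSONL file is one complete example conversation.
# The system/user/assistant contents here are invented placeholders.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support assistant."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Quick sanity check: every line must parse and contain a "messages" list.
with open("train.jsonl") as f:
    for line in f:
        record = json.loads(line)
        assert isinstance(record["messages"], list)
```

You would then upload this file and start a fine-tuning job via the API; validating the JSONL locally like this first catches most formatting errors before you pay for a job.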

Now that I have fine-tuned it, I do see that, empirically, the model is significantly better. But then I looked into my fine-tuning job out of curiosity and found this training loss graph (image attached).

Can someone explain to me in layman's terms what this means? Should the training loss decrease monotonically? What are the implications for my model? Will performance improve further if I somehow "clean up" my data and use "better"/more of it?

Also, I want to learn more about fine-tuning LLMs, with more emphasis on practical guidelines and best practices, especially for building out my product. Can someone suggest useful resources?

Thanks folks!


Traditionally, a loss function shows how well the model is answering questions, i.e., is it getting the answers correct? The closer the loss gets to 0, the more correct the model is becoming.
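To make that concrete: language models are usually trained with a cross-entropy loss, where the loss on a token is the negative log of the probability the model assigned to the correct token. A toy calculation (not OpenAI's actual training code, just the standard formula):

```python
import math

def token_loss(prob_of_correct_token: float) -> float:
    """Cross-entropy loss for one token: -log(p). Lower is better."""
    return -math.log(prob_of_correct_token)

# The more probability the model puts on the right token, the lower the loss.
print(token_loss(0.5))   # model unsure: loss ~0.69
print(token_loss(0.99))  # model confident and correct: loss ~0.01
print(token_loss(1.0))   # perfectly confident and correct: loss 0.0
```

So a loss of exactly 0 would mean the model assigns probability 1.0 to every training token, which is why it raises the overfitting concern discussed below.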

Typically, the loss will get lower as the model sees more training data. A loss of 0, however, is not usually a good thing, at least not traditionally, as it can be (and usually is) a sign that the model is "overfitting": the model is learning to simply repeat the correct answers to any given question, like a parrot repeating words. This means the model has lost its ability to generalise and will perform poorly when tested on new, unknown data.

However! The loss graphs from the new GPT-3.5 fine-tuning jobs do not seem to follow this pattern as before, so I would not like to make any firm comments on it until I have more information to go on.