How to obtain optimal hyperparameters when fine-tuning GPT

Because of a project I’m currently working on at my company, I’ve been asking a lot of questions and receiving a lot of help from this community recently. Thank you.

What I’m working on right now is classifying the topics of a certain report. Because the report has a lot of content, it is difficult for humans to read and classify it on their own, so we would like to enlist the help of GPT.

I am currently trying to fine-tune the gpt-3.5-turbo-1106 model. I’m still generating data, so there is very little to train on yet, but we plan to run tests even with this small amount of data.

I understand that fine-tuning a model well requires adjusting the hyperparameters appropriately. When I leave all of them set to auto, the training loss reaches 0 but the validation loss stays large, which I take to mean the model is overfitting. So there seems to be a trade-off, and the hyperparameters need to be adjusted to balance it.
But is there really no way to find good values other than continuing to experiment?
I am a beginner web developer, not an AI expert. However, I happened to be put in charge of fine-tuning GPT models, so I am suddenly having to study and develop with GPT. I would really appreciate your help. I am sorry for all the questions, and thank you.
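
To make the question concrete, this is roughly the trial-and-error loop I have in mind. It is only a minimal sketch: the candidate values are guesses I picked for illustration, and the helper names `candidate_grid` and `pick_best` are my own, not part of any library. I am assuming that each dict would be passed as the `hyperparameters` option when creating a fine-tuning job, and that I would copy the final training and validation loss of each finished job into the results list by hand.

```python
from itertools import product

# Candidate values are guesses for illustration, not recommended settings.
N_EPOCHS = [1, 2, 3]
LR_MULTIPLIERS = [0.5, 1.0, 2.0]


def candidate_grid(n_epochs=N_EPOCHS, lr_multipliers=LR_MULTIPLIERS):
    """Return one hyperparameter dict per combination to try."""
    return [
        {"n_epochs": e, "learning_rate_multiplier": lr}
        for e, lr in product(n_epochs, lr_multipliers)
    ]


def pick_best(results):
    """Return the run with the lowest validation loss.

    `results` is a list of (hyperparameters, train_loss, valid_loss)
    tuples, filled in after each fine-tuning job has finished.
    Training loss is kept only for reference: a run with training loss
    near 0 but a high validation loss is the overfitting case.
    """
    return min(results, key=lambda run: run[2])


if __name__ == "__main__":
    # Launching the jobs themselves is intentionally left out of this sketch.
    for hyperparameters in candidate_grid():
        print(hyperparameters)
```

With three values for each setting this already means nine fine-tuning jobs, which is why I am hoping there is something smarter than trying every combination.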

In conclusion, what I want to ask is: is continuing to experiment the only way to find good hyperparameters?
Or is there another way? Any help would be greatly appreciated.
Please keep in mind that I am a beginner web developer, not an AI developer.
