Hallucination after fine tuning

Hello, I’m having a bit of an issue with a project I’m currently working on. I have an existing website that is essentially a database of successful cover letters for a specific type of job application, and I am building a ChatGPT assistant that writes better cover letters based on that data.

I started with prompt engineering: a few sets of prompts that would give some examples plus some information about the company, and then write a cover letter based on a new CV. However, I couldn’t get it up to a good enough standard, so I decided to fine-tune. I fed it all the data I have, which is around 1,700 cover letters for about 100 different companies.

My issue with the new checkpoint is that it hallucinates a bit too much. I feed it a new CV and my prompt, but the resulting cover letter includes a lot of random information from the training data. For example, it will throw in a work experience that came from a cover letter in the training set and does not exist in the new CV. Any suggestions on where I’m going wrong?
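For reference, my training file is in the standard chat-format JSONL that the fine-tuning API expects, roughly one example per cover letter. The sketch below shows how an example is assembled; the prompt wording and letter text are simplified placeholders, not my real data:

```python
import json

# Rough sketch of one training example (placeholder content).
# Each line of the JSONL file is one {"messages": [...]} object.
example = {
    "messages": [
        {
            "role": "system",
            "content": "You write cover letters for job applications based on the CV provided.",
        },
        {
            "role": "user",
            "content": "Company: <company info>\n\nCV:\n<full CV text>\n\nWrite a cover letter for this application.",
        },
        {
            "role": "assistant",
            "content": "<the successful cover letter for this CV/company pair>",
        },
    ]
}

with open("cover_letters_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```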

My 2 cents:
I would say that if you want a model to get to the essence of what a good cover letter is, you would have to fine-tune on a lot more cover letters.

Another approach could be to prompt-engineer over the cover letters to extract the essence of what makes a good cover letter (with examples), and then fine-tune your model on those outputs, so you are training not on the cover letters themselves but on information about what makes a cover letter good.
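As a rough sketch of what I mean (the model name and prompt wording here are just placeholders), you could run each cover letter through a chat completion call and collect the “what makes this letter work” analysis as your new training target:

```python
from openai import OpenAI

client = OpenAI()

def analyse_letter(letter: str) -> str:
    """Ask the model what makes this cover letter effective (sketch only)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an expert on job applications. Explain, in bullet points, "
                    "what makes the following cover letter effective: structure, tone, "
                    "and how it ties the CV to the company."
                ),
            },
            {"role": "user", "content": letter},
        ],
    )
    return response.choices[0].message.content

# These analyses, rather than the raw letters, would then become the
# assistant messages in the fine-tuning file.
```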

I would challenge the idea that you have saturated what you can do with prompt engineering.

I’ve noticed similar issues while fine-tuning. I believe my model was overfitting the training data. I was able to reduce this by increasing batch size and reducing the number of epochs from 3 to 1.
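If it helps, those settings can be passed when creating the fine-tuning job. Something along these lines worked for me; the file path, model name, and batch size below are only illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Upload the training file first (illustrative path).
training_file = client.files.create(
    file=open("cover_letters_train.jsonl", "rb"),
    purpose="fine-tune",
)

# Fewer epochs and a larger batch size reduced overfitting in my case;
# the exact values here are just examples.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # whichever base model you are fine-tuning
    hyperparameters={"n_epochs": 1, "batch_size": 8},
)
print(job.id)
```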


Welcome to the Forum!

Do you mind sharing details of what your training data set looked like, i.e. what you included in your prompt (system and user messages) and the resulting assistant message? If you can, a redacted version of your prompt would also be helpful for pinpointing the potential problem.

Hi, can you share a couple of lines from your training file and the fine-tuning parameters?

If the parameters are good, my guess would be that there is no clear task definition for the model and no logical workflow implemented for the use case.
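For instance, an explicit task definition in the system message, kept consistent across the training examples and at inference time, might look something like this (the wording is only a sketch):

```python
# Sketch of an explicit task definition; the wording is illustrative only.
system_message = {
    "role": "system",
    "content": (
        "You write cover letters. Use ONLY the work experience, skills, and education "
        "that appear in the CV provided in the user message. Do not invent details or "
        "reuse details from other applicants. Match the tone and structure of a strong "
        "cover letter for the given company."
    ),
}
```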

I think this is the most reasonable explanation