Fine-tuning GPT-3.5 Turbo gives horrid results

Good morning,

Recently I attempted to fine-tune GPT-3.5 Turbo on question/answer data that I extracted from a customer's FAQ.

I’ve formatted the data correctly using a Python script I wrote and created a fine-tuning job on that data. However, when I ask the fine-tuned model a question taken directly from the training dataset, the answers are still poor and usually incorrect, even though the question/answer pairs in the training data are quite concrete.

Here is an example of a record from my data:

{"messages": [{"role": "system", "content": "You are a digital assistant on the website of <COMPANY NAME REMOVED>. Answer the questions of users in a respectful and professional manner."}, {"role": "user", "content": "Can I order at <COMPANY NAME REMOVED> from another country and have my package delivered there?"}, {"role": "assistant", "content": "This is not yet possible. However, for customers that reside in Germany or Belgium, it is possible to have your order be prepared for pick-up at a location near the border."}]}
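
For context, here is a minimal sketch of how a record like the one above can be built from a question/answer pair and written out as JSONL (one JSON object per line). It is not the exact script used; `SYSTEM_PROMPT`, `build_record`, and `to_jsonl` are illustrative names.

```python
import json

# System prompt shared by every training record (company name redacted).
SYSTEM_PROMPT = (
    "You are a digital assistant on the website of <COMPANY NAME REMOVED>. "
    "Answer the questions of users in a respectful and professional manner."
)


def build_record(question: str, answer: str) -> dict:
    """Wrap one FAQ pair in the chat-style 'messages' structure."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def to_jsonl(pairs) -> str:
    """Serialize (question, answer) pairs as JSONL, one record per line."""
    # ensure_ascii=False keeps non-ASCII characters (e.g. Dutch text) readable.
    return "\n".join(
        json.dumps(build_record(q, a), ensure_ascii=False) for q, a in pairs
    )


if __name__ == "__main__":
    faq = [
        (
            "Can I order from another country and have my package delivered there?",
            "This is not yet possible.",
        ),
    ]
    print(to_jsonl(faq))
```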

However, when I prompt the model with exactly this question, I receive a wrong answer. The FAQ contains ~285 records, all of which are question/answer pairs.

I’m wondering whether fine-tuning is the way to go here and, if it is, how I should do it properly. Ideally, the model would only answer questions that are covered by the fine-tuning data.

Another problem is that this concerns a big company that is very well known in the Netherlands. This means that the GPT-3.5 base model probably already contains outdated information about the company.
I’ve tried to work around this by changing the company name in the fine-tuning data to a made-up name that doesn’t exist. The idea is to then swap the real company name back into the answers produced by my fine-tuned model.
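
The renaming idea can be sketched roughly as follows. This is only an illustration: `REAL_NAME` and `PLACEHOLDER` are hypothetical stand-ins (not the actual names), and `mask_company` / `unmask_company` are illustrative helpers.

```python
import re

REAL_NAME = "ExampleCorp"   # hypothetical stand-in for the real company name
PLACEHOLDER = "Zentavia"    # hypothetical made-up name used in the training data


def mask_company(text: str) -> str:
    """Replace the real name with the placeholder before training or prompting."""
    return re.sub(re.escape(REAL_NAME), PLACEHOLDER, text, flags=re.IGNORECASE)


def unmask_company(text: str) -> str:
    """Put the real name back into the model's answer before showing it."""
    return re.sub(re.escape(PLACEHOLDER), REAL_NAME, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    question = "Can I order at ExampleCorp from another country?"
    masked = mask_company(question)
    print(masked)
    print(unmask_company(masked))
```

The same masking would have to be applied to the user's question at inference time, otherwise the model sees the real name in the prompt but only the placeholder in the training data.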

Thanks in advance for any help :slight_smile:

P.S.: The data is actually in Dutch; I’ve translated it to English for this post, so any grammatical errors can be ignored. The original data is correct.