Prompt Assistance, Potentially a Fine-Tuning Oddity

Hello, and thanks in advance for reviewing this.

I have an odd situation with a fine-tuned model, and I am not sure whether it is fine-tuning related or whether it can be addressed with prompt engineering.

First, the symptoms.

I have three Test request threads:

  • davinci-002
  • davinci-003
  • davinci-002-Fine-tuned (very small training data set: 250 Q&A pairs)

I asked the same series of questions to all three to test the fine-tuned version.

davinci-002
Q: What if my insurance won’t cover the procedure?
A: If your insurance plan does not cover the procedure, we offer a variety of financing options to make the procedure more affordable. For more information, please call us at 1-800-XXX-XXXX.

davinci-003
Q: What if my insurance won’t cover the procedure?
A: If your insurance does not cover the procedure, we offer a variety of payment options to make the procedure more affordable. Please contact our office for more information about our payment plans.

– both of these are quite acceptable (except that the 002 response includes a fabricated phone number)

But then the fine-tuned model:

davinci-003-fine-tuned
Q: What if my insurance won’t cover the procedure?
A: *If your insurance provider is not on our list, you can reach out to your insurer and ask if the procedure is covered or if you need a referral. I am also sending you a link to our patient education website with more information about the procedure. It has a Find A Doctor feature that allows you to search for qualified doctors in your area. It will be helpful to enter your zip code into the Find A Doctor feature and click “Search” for a complete listing of Lapiplasty doctors to display for your consideration. I do not know what other questions you may have about the procedure, so I invite you to take a look at the information on our patient education website as well as reaching out to one of our doctors for personal consultation.

RAW Paste Data

Patient Hotline Dialogue [This dialogue is from a phone call with an AI assistant representing Medical Acess Center, which handles patient calls regarding the Procedure.] This is not an actual doctor call - for entertainment purposes only*.

Many things to be concerned about in the third response:

  1. The “I am sending you a link…” voice and language will not work with the AI chat. I do not know how to prompt this behavior out of the responses.
  2. The “RAW Paste Data” text is not found anywhere in our training data and never appeared in any pre-FT test responses. It shows up only with this fine-tuned model, and it shows up consistently.
  3. The “Lapiplasty Patient Hotline Dialogue [This dialogue is from a phone call with an AI assistant representing Treace Medical Acess Center, which handles patient calls regarding the Lapiplasty Procedure.] This is not an actual doctor call - for entertainment purposes only” language clearly appears to come from some other archived call-center training data. Our training data did not include this at all, but it appears in several tests (including some responses later in this thread).

I am not sure whether these are all issues with the fine-tuning process or the fine-tuning data, or whether they can be addressed with prompt modification.

But it is concerning when the fine-tuned model produces less accurate and less concise responses than the base 002 or 003 models.

Any guidance is much appreciated. I am sure this cannot be an isolated scenario.

thanks in advance.

I have this same exact issue!

I’m also training on some of our support data, and it gives us seemingly random (but oddly specific) answers like “Please contact this hotline XYZ”, where hotline XYZ is completely unrelated to us and our training data.

I haven’t been able to debug it because it’s been pretty expensive, and we’ve been using davinci.

Thanks for your reply. This is a very concerning issue, and it has just gotten worse on a new fine-tuned model we created.

We now get a ton of responses with text like:

BOT:I understand. It is important to discuss this with your doctor to determine if this is the right option for you. END OF CONVERSATION END OF CONVERSATION END OF CONVERSATION END OF CONVERSATION END OF CONVERSATION END OF CONVERSATIONEND OF CONVERSATIONEND OF CONVERSATIONEND OfCOnversationEND OfCOnversationEND OfCOnversationEND OfCOnversationENDOfCOnversationENOFCONVERSATI ONENOFCONVERSATI ONENOFCONVERSATI ONENOFCONVERSATI ON EN O FC O N V E R S A T I O N EN O FC O N V E R S A T I O N EN O FC O N V E R S A T I O N EN O FC O N V E R S A T I

and

END OF TRANSCRIPT ________________________________

and

END of Q8: Thankyouforreachingouttothelapipla pla pla pla pla pla pla pla pla pla pla plattyyppppppplllllppppplllllppppplllllppppplllllppplllllppppplllllppppplllllppppplllll pp ll ll l pp ll l pp l pp l pp l pp l pp ll p p p p p p p p p p p p p pp ll ll ll ll ll ll ll l pp l pp l pp l pp l pp l pp_________________________________________________________________________________________________

and

END of RAW Paste Data.

At this point it makes fine-tuning a non-starter. I just cannot imagine others are not experiencing this same issue.

This comes up a lot. To get GPT-3 to answer questions from your own data, it is recommended to use embeddings instead.

It takes more time on your part to set this up (you need a database and some compute, instead of just using API calls). The approach from the cookbook is, roughly: embed your reference text once, then for each question retrieve the most similar chunks and ask the model to answer from only those chunks.
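To make the pattern concrete, here is a rough sketch of the embed → retrieve → ask flow (my own sketch, not the cookbook notebook itself), using the pre-1.0 `openai` Python library. The model names, the `FAQ_CHUNKS` content, and the prompt wording are placeholders, and a real system would keep the vectors in a database rather than in memory:

```python
# Rough sketch of the embed -> retrieve -> ask flow (pre-1.0 openai library).
# FAQ_CHUNKS, model names, and prompt wording are placeholders to adapt.
import numpy as np
import openai

openai.api_key = "sk-..."  # your key

FAQ_CHUNKS = [
    "If your insurance does not cover the procedure, we offer payment plans ...",
    "Use the Find A Doctor search to locate qualified doctors in your area ...",
]

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Embed the knowledge base once and keep the vectors around.
chunk_vectors = [embed(c) for c in FAQ_CHUNKS]

def answer(question, top_k=2):
    q_vec = embed(question)
    # ada-002 vectors are unit length, so a dot product is cosine similarity.
    scores = [float(q_vec @ v) for v in chunk_vectors]
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    context = "\n\n".join(FAQ_CHUNKS[i] for i in best)

    prompt = (
        "Answer the patient's question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=200,
        temperature=0,
    )
    return resp["choices"][0]["text"].strip()

print(answer("What if my insurance won't cover the procedure?"))
```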

THANK YOU for validating this. We have arrived at the same conclusion, and are moving into an embeddings approach instead of the fine-tuning approach. Your reply makes us feel much more confident about this direction.


Thanks again for your input both here and community-wide. I thought you’d be a good person to ask the following:

We abandoned fine-tuning in favor of the embeddings approach, due mostly to the strange run-on garbage text we were getting in our FT model’s responses. If we had a solution for that issue, perhaps we would have stayed with FT, although hosting-hours cost certainly would have become a factor.

The question is this: what role, if any, can reinforcement learning play now that we are going the embeddings route? We have a unique opportunity to engage with a large group who has already been informed of our original RLHF approach with the FT model. Now that we are going with embeddings, is there any role for the feedback data we can get from RLHF?

Hi @cris,

I would avoid any RLHF or additional neural networks until you evaluate how well the embeddings work. If the embeddings aren’t performing well, you could try embedding more knowledge to increase the likelihood of a correct answer, and you could play with the prompt wording in the summarization.
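To illustrate both of those levers, here is a small sketch; the chunk sizes and the template wording are arbitrary placeholders, not tuned values:

```python
# "Embed more knowledge": split long source documents into overlapping chunks
# before embedding, so retrieval has more, and more focused, passages to match.
def chunk_text(text, size=600, overlap=100):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# "Play with the prompt wording": two variants of the final answer/summarization
# prompt to compare, one strict and one more conversational.
STRICT_TEMPLATE = (
    "Answer using only the context below. If the context does not contain the "
    "answer, reply exactly: \"I'm not sure - please contact our office.\"\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
FRIENDLY_TEMPLATE = (
    "You are a patient-support assistant. In one short, friendly paragraph, "
    "answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

prompt = STRICT_TEMPLATE.format(context="...retrieved chunks...", question="...")
```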

But if all of this fails, the embedding vectors themselves could be used to train your own neural network, which you could then use to guide the model to the correct answer.

But the use of embeddings doesn’t preclude anything; it actually enables you to create your own algorithms and neural networks, whereas fine-tuning is a black box whose only point of access is your training data.
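For example, here is a rough sketch of training your own model directly on the embedding vectors. A plain logistic-regression classifier stands in for the neural network, the question/answer pairs are made up, and `embed()` is the helper from the retrieval sketch above; each canonical answer is treated as a class, and new questions are routed to the best one:

```python
# Sketch: train a small model on the embedding vectors to route questions to
# canonical answers. Training pairs and answers are placeholders; embed() is
# the helper from the earlier retrieval sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

CANONICAL_ANSWERS = [
    "If your insurance does not cover the procedure, we offer payment plans ...",
    "Use the Find A Doctor search to locate qualified doctors in your area ...",
]

# (question, index of the canonical answer it should map to)
TRAINING_PAIRS = [
    ("What if my insurance won't cover the procedure?", 0),
    ("Do you offer financing or payment plans?", 0),
    ("How do I find a doctor near me?", 1),
]

X = np.array([embed(q) for q, _ in TRAINING_PAIRS])
y = np.array([label for _, label in TRAINING_PAIRS])

clf = LogisticRegression(max_iter=1000).fit(X, y)

def route(question):
    best_class = clf.predict(embed(question).reshape(1, -1))[0]
    return CANONICAL_ANSWERS[best_class]

print(route("My insurer denied the claim - what are my options?"))
```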