Is it helpful to add CoT data in fine-tuning?

Hi all. I’m building a chatbot by fine-tuning GPT-3.5-turbo. There are still many failing cases where the prompt already describes the instruction explicitly and the SFT data contains corresponding examples. What should I do to fix those bad cases? Would CoT data, consisting of a chain of thought followed by the final response text, be helpful?
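To make the question concrete, by CoT data I mean something like the sketch below (in the standard chat fine-tuning format; the content here is made up, our real data uses our own prompt and dialogues):

```python
import json

# Made-up illustration of one CoT-style SFT example: the assistant message
# contains a short reasoning section followed by the final reply that the
# customer would actually see.
example = {
    "messages": [
        {"role": "system", "content": "You are a sales agent. Follow the rules in the instructions."},
        {"role": "user", "content": "The apartment looks nice, but I'm not sure yet."},
        {
            "role": "assistant",
            "content": (
                "Reasoning: the customer shows some interest, so per the rules "
                "I should invite them to a viewing.\n"
                "Reply: Great! Would you like to schedule a visit this week to see it in person?"
            ),
        },
    ]
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

At inference time we would of course strip the “Reasoning:” part before showing the reply to the customer.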

1 Like

Welcome to the Dev Community!
As is always the case with these things, I think the correct answer here is “it depends”.

What is your use-case? Is the model “smart” enough to get it right with prompt engineering?

Are you trying to add in external knowledge? If so, you may be better off using RAG techniques (e.g. the Assistants API) instead.

1 Like

Thanks for your reply. My use case is a sales agent whose aim is to sell things like houses and insurance to customers. It has some rules to follow, e.g., invite the customer to come see the house when they show some interest, and ask for the customer’s IM account to keep in touch when they show no interest. Currently, even though I provided similar examples in the training data, the SFT model still can’t follow the instructions in the prompt. I’m a little confused and have no idea what to do to make it “smarter”…
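For reference, the rule section of our system prompt looks roughly like this (paraphrased with placeholder wording; the real prompt is longer and product-specific):

```python
# Paraphrased sketch of the rule section of our system prompt.
SALES_AGENT_RULES = """
You are a sales agent selling houses and insurance.
Rules:
1. If the customer shows interest, invite them to an in-person viewing.
2. If the customer shows no interest, politely ask for their IM account
   so you can keep in touch later.
""".strip()
```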

Use GPT-4?

2 Likes

No, my SFT is based on GPT-3.5-turbo-0125.

Well yeah, I’m saying maybe that’s the issue :thinking:

Yeah, but the GPT-4 fine-tuning API isn’t available to me right now. :joy:

You can generally get away without fine-tuning GPT-4, if your instructions are good enough.

I tried that but eventually gave up on it. The first reason is that GPT-4’s responses are too slow to meet our app’s requirements. The second is that there are still some cases where GPT-4 can’t follow our instructions… So we tried GPT-3.5 and, through SFT, now get accuracy close to GPT-4’s (about 90% good cases). But our app requires almost 95%, and we’re blocked by cases that can hardly be fixed by adding similar data or modifying our prompt (maybe the prompt could be better, but prompt tuning gives us no clear direction that will reliably lead to a better result).

I see.

The issue with gpt-3.5 CoT is that 3.5 generally struggles with reflection tasks. While fine-tuning might help the model initiate CoT more often, I don’t see how it can actually improve CoT outcomes.

Some of the issues you’ve mentioned can be mitigated to a degree:

  • GPT-4’s response is too slow: use streaming to improve the UX (rough sketch after this list). If you need complex CoT, consider some form of asynchronous communication.
  • some cases where GPT-4 can’t follow our instructions: while these cases exist, they can often be engineered around. If you want to share your prompts, some of us here might be able to take a crack at them.
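For the latency point, streaming is just a flag on the chat completions call; a minimal sketch with the Python SDK (model name and messages are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stream tokens as they arrive so the user sees the reply build up
# instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="gpt-4",  # or your fine-tuned 3.5 model
    messages=[
        {"role": "system", "content": "You are a sales agent."},
        {"role": "user", "content": "Tell me more about the apartment."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```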

However, if you’ve already got 90% accuracy with 3.5, that’s pretty good. I wouldn’t throw that away. Is it possible to identify, ahead of time, the remaining 10% of cases where it will fail?
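If you can spot those cases up front, one option is a simple router that sends only the hard conversations to GPT-4 and leaves the rest on your fine-tuned 3.5. Purely a sketch; the trigger markers and model IDs below are placeholders you’d replace based on your own failure analysis:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical markers that, in your failure analysis, correlate with the
# conversations your fine-tuned 3.5 model tends to get wrong.
HARD_CASE_MARKERS = ("discount", "contract", "cancel")

def pick_model(conversation: list[dict]) -> str:
    """Route known-hard conversations to GPT-4, everything else to the tuned 3.5."""
    last_user_turn = conversation[-1]["content"].lower()
    if any(marker in last_user_turn for marker in HARD_CASE_MARKERS):
        return "gpt-4"
    return "ft:gpt-3.5-turbo-0125:your-org::abc123"  # placeholder fine-tune ID

def reply(conversation: list[dict]) -> str:
    response = client.chat.completions.create(
        model=pick_model(conversation),
        messages=conversation,
    )
    return response.choices[0].message.content
```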

1 Like