I have some questions regarding fine-tuning.
I want to fine-tune the babbage base model on a custom dataset so that it generates descriptions of computer hardware products, hoping it will produce excellent results on this specific task.
The dataset has around 5k samples (product name in the prompt and a description in the output). After fine-tuning, I noticed that the model produces a lot of illogical text and hallucinates.
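To make the data format concrete, here is a minimal sketch of how prompt/output pairs like these could be written as JSONL for fine-tuning. The separator, the stop sequence, the function names, and the sample row are all assumptions for illustration, not the actual dataset or a required format.

```python
import json

# Assumed markers: a fixed separator ending every prompt and a fixed stop
# sequence ending every completion. Both strings are hypothetical choices.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_SEQUENCE = " END"


def to_jsonl_record(product_name, description):
    """Turn one (product name, description) pair into a single JSONL line."""
    record = {
        "prompt": product_name.strip() + PROMPT_SEPARATOR,
        # Leading space before the completion text is a common convention
        # for base models; treat it as an assumption here.
        "completion": " " + description.strip() + STOP_SEQUENCE,
    }
    return json.dumps(record, ensure_ascii=False)


def write_jsonl(samples, path):
    """Write an iterable of (product name, description) pairs to a file."""
    with open(path, "w", encoding="utf-8") as f:
        for name, desc in samples:
            f.write(to_jsonl_record(name, desc) + "\n")


if __name__ == "__main__":
    # Made-up example row, only to show the shape of one record.
    print(to_jsonl_record("Example 16GB DDR4 RAM Kit", "A two-module memory kit."))
```

Ending every completion with the same stop sequence gives the model a consistent signal for where a description should end, which can help with rambling output.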
My question is: do you think fine-tuning is the right approach to generate good descriptions? Or should I rely on semantic search and embeddings? Are there any other approaches you can think of that make more sense than fine-tuning and semantic search? From my search, everyone is recommending semantic search over fine-tuning, since fine-tuning doesn’t make the model learn anything new (if anyone can confirm!). But the issue with semantic search is that it makes the prompt much larger, which will cost a lot. I also think it is better suited to QA than to product description generation. Am I correct?
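For context on the prompt-size concern, here is a minimal sketch of what I understand the semantic search approach to be: rank stored snippets by cosine similarity to a query vector and paste the top matches into the prompt. The toy two-dimensional vectors, the function names, and the prompt template are hypothetical; real embeddings would come from an embedding model, which is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=2):
    """Return the k snippet texts most similar to the query.

    corpus is a list of (text, vector) pairs.
    """
    ranked = sorted(
        corpus,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(product_name, retrieved):
    """Assemble a prompt; every retrieved snippet adds to its length."""
    context = "\n".join("- " + text for text in retrieved)
    return (
        "Reference snippets:\n" + context
        + "\n\nWrite a description for: " + product_name + "\n"
    )


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings (assumption).
    corpus = [("spec sheet A", [1.0, 0.0]), ("spec sheet B", [0.0, 1.0])]
    print(build_prompt("Example GPU", top_k([1.0, 0.1], corpus, k=1)))
```

The prompt grows with every retrieved snippet, so the cost depends mainly on k and on how long each snippet is.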
Another question: is babbage good enough to generate good product descriptions without hallucinating? Or will it always hallucinate, regardless of the size of the data or whatever I do?