Fine-Tuning Question - Use FT or Prompt Engineering?

I’m trying to decide whether to build a fine-tuning system or a complex prompt-engineering system. I realize the answer is often both, but OK - which one should I start with?

My content area is not well covered in training - the LLM understands the domain, but it doesn’t understand how to write the best responses. I’m using RAG to populate the context with the relevant data for my prompts, so all the information is there.
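For concreteness, the RAG step looks roughly like this (a minimal sketch - `retrieve` is a placeholder for whatever vector search actually runs, not my real code):

```python
# Minimal sketch of the RAG prompt assembly described above.
# retrieve() is a stand-in for the real vector-store lookup.

def retrieve(question: str, k: int = 5) -> list[str]:
    # Placeholder: the real version queries a vector store and
    # returns the k most relevant passages for the question.
    return ["<passage 1>", "<passage 2>", "<passage 3>"][:k]

def build_prompt(question: str) -> str:
    # Stuff the retrieved passages into the context window so the
    # model has all the information it needs to answer.
    context = "\n\n".join(retrieve(question))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```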

I’m dealing with a fairly limited number of question types - call it 20. Either solution will be laborious, but if we want amazing responses, I don’t see another way to do it.

My choices at this point:

  • Prompt engineering - to get the responses we want, in a very high data-density format, we’d basically have to tell the model how to write each sentence (see the sketch after this list). It understands the domain, but the best answer to our questions has never been written down, so it has no frame of reference (I believe).

  • Fine-tuning - this seems like the more scalable method, but it would require the most up-front work. Essentially we’d have to write out the perfect response for each of the (call it) 20 question types the system is likely to encounter.
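To make the prompt-engineering option concrete, here’s roughly what I imagine one question type’s template looking like - every rule and name here is invented for illustration, but it shows the per-sentence scaffolding we’d be maintaining across ~20 of these:

```python
# Hypothetical template for one question type ("How is A versus B?").
# The prompt-engineering route means writing and maintaining one of
# these per question type, each dictating the shape of every sentence.

COMPARISON_TEMPLATE = """Using only the provided context, answer in exactly four sentences:
1. The single most important metric where A and B differ, with numbers.
2. The second most important difference, with numbers.
3. One dimension where A and B are roughly equivalent.
4. A one-sentence bottom-line comparison.

Context:
{context}

Question: {question}"""

def build_comparison_prompt(context: str, question: str) -> str:
    return COMPARISON_TEMPLATE.format(context=context, question=question)
```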

My big question with fine-tuning: let’s say there are only the following three question types:

  • How is A versus B?
  • How is A doing?
  • What are the properties of the best A’s?

If we fine-tune on all those cases, will the model perform as desired, choosing the appropriate fine-tuned response style for each incoming question?
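For concreteness, my mental model of the training data is something like this (assuming OpenAI-style chat fine-tuning, one JSON object per line of a .jsonl file; the content is invented, and a real file would need many examples per question type, not one each):

```python
# Sketch of a fine-tuning file covering all three question types.
# Invented content; real rows would carry the full retrieved context
# and the full "perfect" answers, with many examples per type.
import json

SYSTEM = "Answer in our house style."

examples = [
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Context: ...\n\nHow is A versus B?"},
        {"role": "assistant", "content": "A leads B on <metric> by ..."},
    ]},
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Context: ...\n\nHow is A doing?"},
        {"role": "assistant", "content": "A is trending <direction> ..."},
    ]},
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Context: ...\n\nWhat are the properties of the best A's?"},
        {"role": "assistant", "content": "The best A's share ..."},
    ]},
]

with open("train.jsonl", "w") as f:
    for row in examples:
        f.write(json.dumps(row) + "\n")
```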

Which path is better? My gut take is that fine-tuning requires more effort, but it also seems more scalable (find a new question type, just add it to the fine-tune).

What do you think? Thank you.