I fine-tuned the 11-06 base model; my dataset included examples with RAG-retrieved context.
Now, when I include RAG-retrieved examples of good responses in the prompt, the fine-tuned model tends to repeat the example verbatim (this still occurs at temp=1).
Is solving this primarily a matter of providing enough training examples that ignore irrelevant retrieved examples/responses, and of varying the responses?
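For what it's worth, here's the kind of training record I mean: a minimal sketch assuming the OpenAI chat fine-tuning JSONL format, where the retrieved example in the prompt is irrelevant to the question and the target response deliberately ignores it instead of echoing it (the system prompt, retrieved text, and answer are all made up for illustration):

```python
import json

# Hypothetical retrieved snippet that is NOT relevant to the user's question.
retrieved_example = "Example answer: 'Our refund window is 30 days from purchase.'"

# One fine-tuning record: the assistant target ignores the irrelevant
# retrieved example rather than copying it verbatim.
record = {
    "messages": [
        {
            "role": "system",
            "content": (
                "Use the retrieved example only if it is relevant to the "
                "question; otherwise ignore it."
            ),
        },
        {
            "role": "user",
            "content": (
                f"Retrieved example:\n{retrieved_example}\n\n"
                "Question: What are your store's opening hours?"
            ),
        },
        {
            "role": "assistant",
            "content": "We're open Monday to Saturday, 9am to 6pm.",
        },
    ]
}

# Each record becomes one line of the training JSONL file.
line = json.dumps(record)
print(line)
```

Mixing records like this (irrelevant retrieval, partially relevant retrieval, no retrieval) with varied answer phrasings is what I'm asking about.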
Also: has anyone had more success fine-tuning the already-fine-tuned model to address issues like this?