Better Results When Dataset Includes RAG examples? Fine-tune already fine tuned model?

I fine tuned 11-06 base - my dataset had examples with RAG retrieval

Now when I include RAG-obtained examples of good responses in the prompt, the fine tuned model tends to respond with what the example says verbatim (still occurs with temp=1).

Is solving this primarily a matter of providing enough dataset examples of ignoring irrelevant examples/responses, and varying responses ?

Also: anyone have more success fine tuning the already-fine-tuned model to improve issues like this?