Where I am coming from is that I am trying to rule out common issues that can normally impede the functioning of a fine-tuned model. While we should consider the impact of the learning rate multiplier, I am trying to first eliminate other potential issues.
If you have not set a temperature when using the fine-tuned model and the default value is used, then likely this is not a factor that contributes to the problem. However, to be on the safe side, you might want to consider explicitly setting the temperature with a low value (zero or close to zero).
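To illustrate, here is a minimal sketch of pinning the temperature explicitly when calling the fine-tuned model. The model ID and messages are placeholders, not your actual values:

```python
# Sketch: explicitly set a low temperature instead of relying on the default.
# The model ID below is a placeholder; substitute your own fine-tuned model ID.
request_params = {
    "model": "ft:gpt-4o-mini:your-org::abc123",  # placeholder
    "messages": [
        {"role": "system", "content": "Same instructions as in your training data."},
        {"role": "user", "content": "Your input here."},
    ],
    "temperature": 0,  # explicit, (near-)deterministic sampling
}

# With the OpenAI Python SDK you would then pass these parameters along:
# client = OpenAI()
# response = client.chat.completions.create(**request_params)
```

With `temperature` set to 0 you can at least rule sampling randomness out as the cause of the varying output.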
The question regarding the user / system message was to ensure that you are including the same instructions (whether placed in a system or user message) that you used in your training data when using the fine-tuned model. Naturally, the code would be different but the instructions should be consistent. Stated differently, is there anything you do differently when you use the model compared to what was included in the training data?
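One quick way to verify this is to compare the system message you send at inference time against the system messages in your training JSONL. A minimal sketch, using a hypothetical in-memory training example in the chat fine-tuning format:

```python
import json

# Hypothetical training example in the chat fine-tuning JSONL format.
training_lines = [
    json.dumps({"messages": [
        {"role": "system", "content": "Extract fields as JSON."},
        {"role": "user", "content": "example input"},
        {"role": "assistant", "content": "{\"field\": \"value\"}"},
    ]}),
]

def system_prompts(jsonl_lines):
    """Collect the distinct system messages across training examples."""
    return {m["content"]
            for line in jsonl_lines
            for m in json.loads(line)["messages"]
            if m["role"] == "system"}

# At inference time, your system message should match one used in training.
inference_system_prompt = "Extract fields as JSON."
assert inference_system_prompt in system_prompts(training_lines)
```

In practice you would read the lines from your actual training file instead of the in-memory list shown here.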
What surprises me is that the output format varies when you use your model. That should normally not be the case, and I am trying to understand what could be causing it.
Finally, what was your training loss?