With a very long system prompt you need a much higher temperature to avoid overfitting.
System prompts are not adhered to very well; it's better to send the instructions in a user prompt and fake the assistant returning an appropriate response.
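A minimal sketch of the fake-assistant-turn trick, assuming an OpenAI-style chat message format (the `build_messages` helper and the acknowledgment text are illustrative, not from any library). The instructions go into a user message instead of a system message, followed by a fabricated assistant reply agreeing to them, so the model tends to stay in character for the real request:

```python
def build_messages(instructions: str, user_input: str) -> list[dict]:
    """Prime the model via a fabricated assistant turn instead of a system prompt."""
    return [
        # Instructions sent as a user message rather than a system message.
        {"role": "user", "content": instructions},
        # Fabricated assistant response, written by us, confirming compliance.
        {"role": "assistant", "content": "Understood. I will follow those instructions exactly."},
        # The real user request comes last.
        {"role": "user", "content": user_input},
    ]

messages = build_messages(
    "Answer only in JSON with keys 'answer' and 'confidence'.",
    "What is the capital of France?",
)
print([m["role"] for m in messages])
```

The resulting list can be passed as the `messages` argument to any chat-completion-style API.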
Depends on the model you are using, in my experience. For gpt-3.5-turbo and the like, this sounds correct. For gpt-4, however, it responds really well to instructions passed in its system message.
The long system prompt overfitting problem is sometimes desirable, tbf. When you are generating code or want the output to stay in line with the samples you have provided, you'd prefer a lower temperature. It varies from problem to problem.
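For intuition on why a lower temperature keeps output closer to the provided samples: temperature divides the logits before the softmax, so low values concentrate probability mass on the top token while high values flatten the distribution. A toy sketch (the logit values here are made up for illustration):

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits scaled by 1/temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # hypothetical next-token scores
low = softmax_with_temperature(logits, 0.2)   # sharp: top token dominates
high = softmax_with_temperature(logits, 1.5)  # flat: more randomness
print(round(low[0], 3), round(high[0], 3))
```

At low temperature the top token's probability approaches 1, which is what you want when the output should mimic your code samples closely.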
I'm a big fan of using long prompts to get more precise content. In my experience, the effect of temperature is manageable, because in addition to long, detailed explanatory prompts, I use multiple levels of prompts to adjust the output.