Optimal Temperature Setting for LLM Generation in RAG Model

I’m currently configuring a Retrieval-Augmented Generation (RAG) model that uses a large language model (LLM), and I’m trying to determine the best temperature setting for the generation process.

I understand that the temperature setting controls how deterministic or creative the model’s responses are: it scales the logits before sampling, so lower temperatures sharpen the token distribution and make the output more focused (sometimes repetitive), while higher temperatures flatten it and introduce more variability and creativity.
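
For context, here’s roughly where the setting lives in my pipeline (a minimal sketch assuming the OpenAI Python client; the model name and `retrieve_context` are placeholders for my actual setup):

```python
from openai import OpenAI

client = OpenAI()

def retrieve_context(question: str) -> str:
    """Placeholder for the retrieval step (vector search, BM25, etc.)."""
    return "...retrieved passages..."

def rag_answer(question: str, temperature: float = 0.2) -> str:
    context = retrieve_context(question)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=temperature,  # the knob I'm asking about
    )
    return response.choices[0].message.content
```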

Could anyone share insights on:

  • What temperature setting works best for balancing coherence and creativity in LLM output?
  • Are there recommended temperature ranges for different contexts, such as when factual accuracy matters most versus when some creativity is acceptable?

I’d appreciate any guidance, including practical experiences with temperature tuning for RAG models using LLMs.
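
For what it’s worth, this is the kind of side-by-side comparison I’ve been running so far (reusing the `rag_answer` sketch from above, with a made-up example question):

```python
# Ask the same RAG question at several temperatures and eyeball the differences.
for t in (0.0, 0.3, 0.7, 1.0):
    answer = rag_answer("What does the refund policy say about digital goods?", temperature=t)
    print(f"--- temperature={t} ---\n{answer}\n")
```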

Thanks in advance!